Compare commits

...

46 Commits

Author SHA1 Message Date
Javier Goizueta
44511682bc Merge branch 'develop' into test-travis-fix 2017-11-03 15:12:29 +01:00
Andy Eschbacher
2dafc0f80e Merge branch 'master' into taylor_oshan-pysal_gwr 2017-11-02 15:38:54 -04:00
Andy Eschbacher
79b6ef59de Merge pull request #146 from TaylorOshan/pysal_gwr
geographically weighted regression (gwr)
2017-11-02 14:21:07 -04:00
Javier Goizueta
7a2a20acef Merge pull request #182 from Algunenano/pass_tests
Pass tests
2017-10-31 12:34:34 +01:00
Raul Marin
a146848a79 Travis: Use postgresql 9.5.2-3 2017-10-31 11:38:20 +01:00
Raul Marin
1dc93284f8 PG test: Make the extension tests version agnostic 2017-10-31 11:38:20 +01:00
Raul Marin
5b30783c04 PG regress: Order the tests
Avoids an issue where some tests were run before the setup
2017-10-31 11:38:20 +01:00
Raul Marin
bd2b190643 Add PIP and NOSETESTS as variables to Makefile.global
This makes it easier to change between pip/pip2 depending on the local environment
2017-10-31 11:38:20 +01:00
Raul Marin
e6c98e83db Travis: Set Ubuntu precise as distribution 2017-10-31 11:38:20 +01:00
Rafa de la Torre
ce5d1f9e86 Release 0.5.2 2017-05-12 17:26:41 +02:00
Raul Ochoa
69713ecb0a Merge pull request #172 from CartoDB/fix-global-rate-stat
Fix missing comma for dict creation
2017-03-02 14:39:47 +01:00
Raul Ochoa
d07822c7a0 Fix missing comma for dict creation 2017-02-27 12:07:41 +01:00
Mario de Frutos
154d1a674d Added CLA part in the contributing document 2017-01-25 10:45:30 +01:00
Andy Eschbacher
76f5eae928 quote col names 2017-01-12 16:02:27 -05:00
Andy Eschbacher
c2bfcc8516 adding sql tests 2017-01-11 10:36:46 -05:00
Andy Eschbacher
7725cada13 tests for gwr predict 2017-01-10 15:53:09 -05:00
Andy Eschbacher
0815db4661 move floats to numpy floats 2017-01-09 16:28:00 -05:00
Andy Eschbacher
0f24a4d35a adds framework for testing predict 2017-01-09 16:04:38 -05:00
Andy Eschbacher
6d59061a00 add optional params to predict function 2017-01-09 15:49:32 -05:00
Andy Eschbacher
25f1fc2b28 fix bug on output of predicted values with wrong id 2017-01-09 15:48:43 -05:00
Andy Eschbacher
1bf64e0d4c updates descriptions 2017-01-09 10:09:25 -05:00
Andy Eschbacher
0007082efe finishes python tests for regression 2017-01-09 10:00:44 -05:00
Andy Eschbacher
87890926ad adding back filter t-vals lost in commit squash 2017-01-09 14:12:22 +00:00
Andy Eschbacher
00567e5272 update date, correct col information and order 2017-01-06 13:50:23 -05:00
Andy Eschbacher
e39051e958 renaming for consistency 2017-01-06 12:04:41 -05:00
Andy Eschbacher
c782b812d5 stubs out more testing efforts 2017-01-06 12:03:30 -05:00
Andy Eschbacher
521a79ad7f packed with orig coords and deps in correct order 2017-01-06 12:02:50 -05:00
Andy Eschbacher
8c71820d97 adds descriptions 2017-01-06 10:44:25 -05:00
Andy Eschbacher
cdb81ea896 refactor 2017-01-06 10:37:12 -05:00
Andy Eschbacher
e2cf12aaba predict -> class framework 2017-01-06 10:36:40 -05:00
Andy Eschbacher
92fc25f6b5 placeholders for desciptions fix references 2017-01-05 16:18:08 -05:00
Andy Eschbacher
46b2b80008 adds references to docs 2017-01-05 15:26:16 -05:00
Andy Eschbacher
ba54e9b42d consolidate gwr sql functions 2017-01-05 14:24:10 -05:00
Andy Eschbacher
d95ca54cdc updating prediction to data provider 2017-01-05 14:22:40 -05:00
Andy Eschbacher
fa3eecb233 test updates 2017-01-05 14:21:46 -05:00
Andy Eschbacher
daf4d5984c first pass at tests 2017-01-05 13:02:36 -05:00
Andy Eschbacher
5735831142 adding fixture files 2017-01-05 13:01:56 -05:00
Andy Eschbacher
1be0aa8330 adding docs for gwr 2017-01-04 16:37:09 -05:00
Andy Eschbacher
b049da4155 switch order of rowid to be last in output signature 2017-01-04 16:36:46 -05:00
Andy Eschbacher
dbee19723e clean up code and loose ends on data provider changes 2017-01-04 14:40:37 -05:00
Andy Eschbacher
fd9d08dfbc Merge branch 'pysal_gwr' of github.com:TaylorOshan/crankshaft into pysal_gwr 2017-01-04 12:09:00 -05:00
Andy Eschbacher
4d75d2cf74 Merge branch 'develop' into pysal_gwr 2017-01-04 12:03:28 -05:00
Taylor Oshan
b06b783c67 update GWR output to also provide filtered (corrected) tvals 2016-12-21 15:00:45 -07:00
Taylor Oshan
f511afbbf0 fix bug in filter tvals; doesnt effect any current crankshaft functions 2016-12-21 08:50:38 -07:00
Taylor Oshan
39c2e01827 add crankshaft gwr_prediction infrastructure 2016-12-17 18:25:38 -07:00
Taylor Oshan
8fcb6a6553 Add GWR prediction base files 2016-12-16 07:30:31 -07:00
58 changed files with 7079 additions and 106 deletions


@@ -1,4 +1,6 @@
 language: c
+dist: precise
+sudo: required
 env:
   global:
@@ -42,9 +44,9 @@ before_install:
 - sudo apt-get -y remove --purge postgis-2.2
 - sudo apt-get -y autoremove
-- sudo apt-get -y install postgresql-9.5=9.5.2-3cdb2
-- sudo apt-get -y install postgresql-server-dev-9.5=9.5.2-3cdb2
-- sudo apt-get -y install postgresql-plpython-9.5=9.5.2-3cdb2
+- sudo apt-get -y install postgresql-9.5=9.5.2-3cdb3
+- sudo apt-get -y install postgresql-server-dev-9.5=9.5.2-3cdb3
+- sudo apt-get -y install postgresql-plpython-9.5=9.5.2-3cdb3
 - sudo apt-get -y install postgresql-9.5-postgis-scripts=2.2.2.0-cdb2
 - sudo apt-get -y install postgresql-9.5-postgis-2.2=2.2.2.0-cdb2


@@ -55,3 +55,7 @@ sudo make install
 # Run the tests against the installed extension.
 make test
 ```
+
+## Submitting contributions
+
+Before opening a pull request (or submitting a contribution) you will need to sign a Contributor License Agreement (CLA), [learn more here](https://carto.com/contributions).


@@ -4,3 +4,5 @@ PACKAGE = crankshaft
 EXTVERSION = $(shell grep default_version $(SELF_DIR)/src/pg/$(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
 RELEASE_VERSION ?= $(EXTVERSION)
 SED = sed
+PIP = pip
+NOSETESTS = nosetests

NEWS.md

@@ -1,4 +1,12 @@
-0.5.0 (2016-12-15)
+0.5.2 (2017-05-12)
 ------------------
+* Fixes missing comma for dict creation #172
+
+0.5.1 (2016-12-12)
+------------------
+* Fixed a problem with the upgrade file from 0.4.2 to 0.5.0 that was missing changes that should be there (as per ethervoid).
+
+0.5.0 (2016-12-12)
+------------------
 * Updated PULL_REQUEST_TEMPLATE
 * Fixed a bug that flipped the order of the numerator and denominator when calculating Moran Local Rate, because previously the code sorted the keys alphabetically.

doc/21_gwr.md (new file)

@@ -0,0 +1,128 @@
## Regression
### Predictive geographically weighted regression (GWR)
Predictive GWR generates estimates of the dependent variable at locations where it has not been observed. It first builds a geographically weighted, spatially varying regression model from nearby observations where both the dependent and independent variables are known. It then applies this model to the known independent variable values at the prediction locations to estimate the otherwise unknown dependent variable.
For predictive GWR to work, a dataset needs known independent variables, some known dependent variables, and some unknown dependent variables. The dataset also needs geometry data (e.g., points, lines, or polygons).
#### Arguments
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM regression_inputs`). This query must have the geometry column name (see the optional `geom_col` for the default), the id column name (see `id_col`), and the dependent (`dep_var`) and independent (`ind_vars`) column names. |
| dep_var | TEXT | Name of the dependent variable in the regression model |
| ind_vars | TEXT[] | Text array of independent variable column names used in the model to describe the dependent variable. |
| bw (optional) | NUMERIC | Value of bandwidth. If `NULL` then select optimal (default). |
| fixed (optional) | BOOLEAN | `True` for a distance-based kernel function, `False` for an adaptive (nearest-neighbor) kernel function. Defaults to `False`. |
| kernel (optional)| TEXT | Type of kernel function used to weight observations. One of `gaussian`, `bisquare` (default), or `exponential`. |
#### Returns
| Column Name | Type | Description |
|-------------|------|-------------|
| coeffs | JSON | JSON object with parameter estimates for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the parameter estimates. |
| stand_errs | JSON | Standard errors for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the respective standard errors. |
| t_vals | JSON | T-values for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the respective t-values. |
| predicted | NUMERIC | predicted value of the dependent variable |
| residuals | NUMERIC | residuals of the response |
| r_squared | NUMERIC | R-squared for the parameter fit |
| bandwidth | NUMERIC | bandwidth value consisting of either a distance or N nearest neighbors |
| rowid | INTEGER | row id of the original row |
#### Example Usage
```sql
SELECT
g.cartodb_id,
g.the_geom,
g.the_geom_webmercator,
(gwr.coeffs->>'pctblack')::numeric as coeff_pctblack,
(gwr.coeffs->>'pctrural')::numeric as coeff_pctrural,
(gwr.coeffs->>'pcteld')::numeric as coeff_pcteld,
(gwr.coeffs->>'pctpov')::numeric as coeff_pctpov,
gwr.residuals
FROM cdb_crankshaft.CDB_GWR_Predict('select * from g_utm'::text,
'pctbach'::text,
Array['pctblack', 'pctrural', 'pcteld', 'pctpov']) As gwr
JOIN g_utm as g
on g.cartodb_id = gwr.rowid
```
Note: See [PostgreSQL syntax for parsing JSON objects](https://www.postgresql.org/docs/9.5/static/functions-json.html).
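Outside the database, the same per-row `coeffs` JSON object can be unpacked with any standard JSON library. A minimal Python sketch with made-up coefficient values (the keys mirror the `ind_vars` passed to the function):

```python
import json

# Hypothetical coeffs payload for one output row of CDB_GWR_Predict;
# the keys are the independent variable names supplied in ind_vars.
row_coeffs = '{"pctblack": 0.31, "pctrural": -0.12, "pcteld": 0.05, "pctpov": -0.44}'

coeffs = json.loads(row_coeffs)
# Equivalent to (gwr.coeffs->>'pctpov')::numeric in the SQL example above
coeff_pctpov = coeffs["pctpov"]
print(coeff_pctpov)  # -0.44
```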
### Geographically weighted regression model estimation
This analysis generates the model coefficients for a geographically weighted, spatially varying regression. The model coefficients, along with their respective statistics, allow one to make inferences about or describe a dependent variable based on a set of independent variables. As with traditional linear regression, GWR takes a linear combination of independent variables and a known dependent variable to estimate an optimal set of coefficients. The model coefficients are spatially varying (controlled by the `bandwidth` and `fixed` parameters), so the model output is allowed to vary from geometry to geometry. This allows GWR to capture non-stationarity, that is, how local processes vary over space. In contrast, coefficients obtained from estimating a traditional linear regression model assume that processes are constant over space.
#### Arguments
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM regression_inputs`). This query must have the geometry column name (see the optional `geom_col` for the default), the id column name (see `id_col`), and the dependent and independent column names. |
| dep_var | TEXT | name of the dependent variable in the regression model |
| ind_vars | TEXT[] | Text array of independent variables used in the model to describe the dependent variable |
| bw (optional) | NUMERIC | Value of bandwidth. If `NULL` then select optimal (default). |
| fixed (optional) | BOOLEAN | `True` for a distance-based kernel function, `False` for an adaptive (nearest-neighbor) kernel function. Defaults to `False`. |
| kernel (optional) | TEXT | Type of kernel function used to weight observations. One of `gaussian`, `bisquare` (default), or `exponential`. |
#### Returns
| Column Name | Type | Description |
|-------------|------|-------------|
| coeffs | JSON | JSON object with parameter estimates for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the parameter estimates. |
| stand_errs | JSON | Standard errors for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the respective standard errors. |
| t_vals | JSON | T-values for each of the independent variables. The keys of the JSON object are the independent variable names, with values corresponding to the respective t-values. |
| predicted | NUMERIC | predicted value of the dependent variable |
| residuals | NUMERIC | residuals of the response |
| r_squared | NUMERIC | R-squared for the parameter fit |
| bandwidth | NUMERIC | bandwidth value consisting of either a distance or N nearest neighbors |
| rowid | INTEGER | row id of the original row |
#### Example Usage
```sql
SELECT
g.cartodb_id,
g.the_geom,
g.the_geom_webmercator,
(gwr.coeffs->>'pctblack')::numeric as coeff_pctblack,
(gwr.coeffs->>'pctrural')::numeric as coeff_pctrural,
(gwr.coeffs->>'pcteld')::numeric as coeff_pcteld,
(gwr.coeffs->>'pctpov')::numeric as coeff_pctpov,
gwr.residuals
FROM cdb_crankshaft.CDB_GWR('select * from g_utm'::text, 'pctbach'::text, Array['pctblack', 'pctrural', 'pcteld', 'pctpov']) As gwr
JOIN g_utm as g
on g.cartodb_id = gwr.rowid
```
Note: See [PostgreSQL syntax for parsing JSON objects](https://www.postgresql.org/docs/9.5/static/functions-json.html).
## Advanced reading
* Fotheringham, A. Stewart, Chris Brunsdon, and Martin Charlton. 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. John Wiley & Sons. <http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471496162.html>
* Brunsdon, Chris, A. Stewart Fotheringham, and Martin E. Charlton. 1996. "Geographically Weighted Regression: A Method for Exploring Spatial Nonstationarity." Geographical Analysis 28 (4): 281-98. <http://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1996.tb00936.x/abstract>
* Brunsdon, Chris, Stewart Fotheringham, and Martin Charlton. 1998. "Geographically Weighted Regression." Journal of the Royal Statistical Society: Series D (The Statistician) 47 (3): 431-43. <http://onlinelibrary.wiley.com/doi/10.1111/1467-9884.00145/abstract>
* Fotheringham, A. S., M. E. Charlton, and C. Brunsdon. 1998. "Geographically Weighted Regression: A Natural Evolution of the Expansion Method for Spatial Data Analysis." Environment and Planning A 30 (11): 1905-27. doi:10.1068/a301905. <https://www.researchgate.net/publication/23538637_Geographically_Weighted_Regression_A_Natural_Evolution_Of_The_Expansion_Method_for_Spatial_Data_Analysis>
### GWR for prediction
* Harris, P., A. S. Fotheringham, R. Crespo, and M. Charlton. 2010. "The Use of Geographically Weighted Regression for Spatial Prediction: An Evaluation of Models Using Simulated Data Sets." Mathematical Geosciences 42 (6): 657-80. doi:10.1007/s11004-010-9284-7. <https://www.researchgate.net/publication/225757830_The_Use_of_Geographically_Weighted_Regression_for_Spatial_Prediction_An_Evaluation_of_Models_Using_Simulated_Data_Sets>
### GWR in application
* Cahill, Meagan, and Gordon Mulligan. 2007. "Using Geographically Weighted Regression to Explore Local Crime Patterns." Social Science Computer Review 25 (2): 174-93. doi:10.1177/0894439307298925. <http://isites.harvard.edu/fs/docs/icb.topic923297.files/174.pdf>
* Gilbert, Angela, and Jayajit Chakraborty. 2011. "Using Geographically Weighted Regression for Environmental Justice Analysis: Cumulative Cancer Risks from Air Toxics in Florida." Social Science Research 40 (1): 273-86. doi:10.1016/j.ssresearch.2010.08.006. <http://scholarcommons.usf.edu/cgi/viewcontent.cgi?article=2985&context=etd>
* Ali, Kamar, Mark D. Partridge, and M. Rose Olfert. 2007. "Can Geographically Weighted Regressions Improve Regional Analysis and Policy Making?" International Regional Science Review 30 (3): 300-329. doi:10.1177/0160017607301609. <https://www.researchgate.net/publication/249682503_Can_Geographically_Weighted_Regressions_Improve_Regional_Analysis_and_Policy_Making>
* Lu, Binbin, Martin Charlton, and A. Stewart Fotheringham. 2011. "Geographically Weighted Regression Using a Non-Euclidean Distance Metric with a Study on London House Price Data." Procedia Environmental Sciences, Spatial Statistics 2011: Mapping Global Change, 7: 92-97. doi:10.1016/j.proenv.2011.07.017. <https://www.researchgate.net/publication/261960122_Geographically_weighted_regression_with_a_non-Euclidean_distance_metric_A_case_study_using_hedonic_house_price_data>

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -1,5 +1,5 @@
 comment = 'CartoDB Spatial Analysis extension'
-default_version = '0.5.1'
+default_version = '0.5.2'
 requires = 'plpythonu, postgis'
 superuser = true
 schema = cdb_crankshaft


@@ -0,0 +1,6 @@
"""Import all modules"""
import crankshaft.random_seeds
import crankshaft.clustering
import crankshaft.space_time_dynamics
import crankshaft.segmentation
import analysis_data_provider


@@ -0,0 +1,67 @@
"""class for fetching data"""
import plpy
import pysal_utils as pu
class AnalysisDataProvider:
def get_getis(self, w_type, params):
"""fetch data for getis ord's g"""
try:
query = pu.construct_neighbor_query(w_type, params)
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(4)
else:
return result
except plpy.SPIError, err:
plpy.error('Analysis failed: %s' % err)
def get_markov(self, w_type, params):
"""fetch data for spatial markov"""
try:
query = pu.construct_neighbor_query(w_type, params)
data = plpy.execute(query)
if len(data) == 0:
return pu.empty_zipped_array(4)
return data
except plpy.SPIError, err:
plpy.error('Analysis failed: %s' % err)
def get_moran(self, w_type, params):
"""fetch data for moran's i analyses"""
try:
query = pu.construct_neighbor_query(w_type, params)
data = plpy.execute(query)
# if there are no neighbors, exit
if len(data) == 0:
return pu.empty_zipped_array(2)
return data
except plpy.SPIError, err:
plpy.error('Analysis failed: %s' % err)
return pu.empty_zipped_array(2)
def get_nonspatial_kmeans(self, query):
"""fetch data for non-spatial kmeans"""
try:
data = plpy.execute(query)
return data
except plpy.SPIError, err:
plpy.error('Analysis failed: %s' % err)
def get_spatial_kmeans(self, params):
"""fetch data for spatial kmeans"""
query = ("SELECT "
"array_agg({id_col} ORDER BY {id_col}) as ids,"
"array_agg(ST_X({geom_col}) ORDER BY {id_col}) As xs,"
"array_agg(ST_Y({geom_col}) ORDER BY {id_col}) As ys "
"FROM ({subquery}) As a "
"WHERE {geom_col} IS NOT NULL").format(**params)
try:
data = plpy.execute(query)
return data
except plpy.SPIError, err:
plpy.error('Analysis failed: %s' % err)


@@ -0,0 +1,4 @@
"""Import all functions from for clustering"""
from moran import *
from kmeans import *
from getis import *


@@ -0,0 +1,50 @@
"""
Getis-Ord's G geostatistics (hotspot/coldspot analysis)
"""
import pysal as ps
from collections import OrderedDict
# crankshaft modules
import crankshaft.pysal_utils as pu
from crankshaft.analysis_data_provider import AnalysisDataProvider
# High level interface ---------------------------------------
class Getis:
def __init__(self, data_provider=None):
if data_provider is None:
self.data_provider = AnalysisDataProvider()
else:
self.data_provider = data_provider
def getis_ord(self, subquery, attr,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Getis-Ord's G*
Implementation building neighbors with a PostGIS database and PySAL's
Getis-Ord's G* hotspot/coldspot module.
Andy Eschbacher
"""
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors if kNN is chosen
qvals = OrderedDict([("id_col", id_col),
("attr1", attr),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
result = self.data_provider.get_getis(w_type, qvals)
attr_vals = pu.get_attributes(result)
# build PySAL weight object
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate Getis-Ord's G* z- and p-values
getis = ps.esda.getisord.G_Local(attr_vals, weight,
star=True, permutations=permutations)
return zip(getis.z_sim, getis.p_sim, getis.p_z_sim, weight.id_order)


@@ -0,0 +1,32 @@
from sklearn.cluster import KMeans
import numpy as np
from crankshaft.analysis_data_provider import AnalysisDataProvider
class Kmeans:
def __init__(self, data_provider=None):
if data_provider is None:
self.data_provider = AnalysisDataProvider()
else:
self.data_provider = data_provider
def spatial(self, query, no_clusters, no_init=20):
"""
find centers based on clusters of latitude/longitude pairs
query: SQL query that has a WGS84 geometry (the_geom)
"""
params = {"subquery": query,
"geom_col": "the_geom",
"id_col": "cartodb_id"}
data = self.data_provider.get_spatial_kmeans(params)
# Unpack query response
xs = data[0]['xs']
ys = data[0]['ys']
ids = data[0]['ids']
km = KMeans(n_clusters=no_clusters, n_init=no_init)
labels = km.fit_predict(zip(xs, ys))
return zip(ids, labels)


@@ -0,0 +1,208 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of their neighborhood
import pysal as ps
from collections import OrderedDict
from crankshaft.analysis_data_provider import AnalysisDataProvider
# crankshaft module
import crankshaft.pysal_utils as pu
# High level interface ---------------------------------------
class Moran:
def __init__(self, data_provider=None):
if data_provider is None:
self.data_provider = AnalysisDataProvider()
else:
self.data_provider = data_provider
def global_stat(self, subquery, attr_name,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I (global)
Implementation building neighbors with a PostGIS database and Moran's I
core clusters with PySAL.
Andy Eschbacher
"""
params = OrderedDict([("id_col", id_col),
("attr1", attr_name),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
result = self.data_provider.get_moran(w_type, params)
# collect attributes
attr_vals = pu.get_attributes(result)
# calculate weights
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate moran global
moran_global = ps.esda.moran.Moran(attr_vals, weight,
permutations=permutations)
return zip([moran_global.I], [moran_global.EI])
def local_stat(self, subquery, attr,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
params = OrderedDict([("id_col", id_col),
("attr1", attr),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
result = self.data_provider.get_moran(w_type, params)
attr_vals = pu.get_attributes(result)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def global_rate_stat(self, subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Rate (global)
Andy Eschbacher
"""
params = OrderedDict([("id_col", id_col),
("attr1", numerator),
("attr2", denominator),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
result = self.data_provider.get_moran(w_type, params)
# collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate moran global rate
lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
permutations=permutations)
return zip([lisa_rate.I], [lisa_rate.EI])
def local_rate_stat(self, subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Local Rate
Andy Eschbacher
"""
# geometries with values that are null are ignored
# resulting in a collection of not as near neighbors
params = OrderedDict([("id_col", id_col),
("numerator", numerator),
("denominator", denominator),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
result = self.data_provider.get_moran(w_type, params)
# collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def local_bivariate_stat(self, subquery, attr1, attr2,
permutations, geom_col, id_col,
w_type, num_ngbrs):
"""
Moran's I (local) Bivariate (untested)
"""
params = OrderedDict([("id_col", id_col),
("attr1", attr1),
("attr2", attr2),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
result = self.data_provider.get_moran(w_type, params)
# collect attributes
attr1_vals = pu.get_attributes(result, 1)
attr2_vals = pu.get_attributes(result, 2)
# create weights
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
permutations=permutations)
# find clustering of significance
lisa_sig = quad_position(lisa.q)
return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
@param coord (int): quadrant of a specific measurement
Output:
classification (one of 'HH', 'LH', 'LL', or 'HL')
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def quad_position(quads):
"""
Produce Moran's I classification based on quadrant numbers
Input:
@param quads ndarray: an array of quads classified by
1-4 (PySAL default)
Output:
@param list: an array of quads classified by 'HH', 'LL', etc.
"""
return [map_quads(q) for q in quads]
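As a standalone sketch (not part of the diff) of how these quadrant helpers behave, using the same HH=1, LH=2, LL=3, HL=4 mapping:

```python
def map_quads(coord):
    """Map a PySAL quadrant number (1-4) to its Moran's I designation."""
    return {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}.get(coord)

def quad_position(quads):
    """Classify a sequence of quadrant numbers as 'HH', 'LH', etc."""
    return [map_quads(q) for q in quads]

print(quad_position([1, 3, 4, 2, 9]))  # unknown codes fall through to None
```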


@@ -0,0 +1,2 @@
"""Import all functions for pysal_utils"""
from crankshaft.pysal_utils.pysal_utils import *


@@ -0,0 +1,211 @@
"""
Utilities module for generic PySAL functionality, mainly centered on
translating queries into numpy arrays or PySAL weights objects
"""
import numpy as np
import pysal as ps
def construct_neighbor_query(w_type, query_vals):
"""Return query (a string) used for finding neighbors
@param w_type text: type of neighbors to calculate ('knn' or 'queen')
@param query_vals dict: values used to construct the query
"""
if w_type.lower() == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
# Build weight object
def get_weight(query_res, w_type='knn', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
@param query_res dict-like: query results with attributes and neighbors
"""
# if w_type.lower() == 'knn':
# row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
# weights = {x['id']: row_normed_weights for x in query_res}
# else:
# weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
# if len(x['neighbors']) > 0
# else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
print 'len of neighbors: %d' % len(neighbors)
built_weight = ps.W(neighbors)
built_weight.transform = 'r'
return built_weight
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes involved in query.
Defaults to order in the params
@param params: dict of information used in query (column names,
table name, etc.)
Example:
OrderedDict([('numerator', 'price'),
('denominator', 'sq_meters'),
('subquery', 'SELECT * FROM interesting_data')])
Output:
"i.\"price\"::numeric As attr1, " \
"i.\"sq_meters\"::numeric As attr2, "
"""
attr_string = ""
template = "i.\"%(col)s\"::numeric As attr%(alias_num)s, "
if 'time_cols' in params:
# if markov analysis
attrs = params['time_cols']
for idx, val in enumerate(attrs):
attr_string += template % {"col": val, "alias_num": idx + 1}
else:
# if moran's analysis
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'subquery',
'num_ngbrs')]
for idx, val in enumerate(attrs):
attr_string += template % {"col": params[val],
"alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Construct where conditions when building neighbors query
Create portion of WHERE clauses for weeding out NULL-valued geometries
Input: dict of params:
{'subquery': ...,
'numerator': 'data1',
'denominator': 'data2',
'': ...}
Output:
'idx_replace."data1" IS NOT NULL AND idx_replace."data2" IS NOT NULL'
Input:
{'subquery': ...,
'time_cols': ['time1', 'time2', 'time3'],
'etc': ...}
Output: 'idx_replace."time1" IS NOT NULL AND idx_replace."time2" IS NOT
NULL AND idx_replace."time3" IS NOT NULL'
"""
attr_string = []
template = "idx_replace.\"%s\" IS NOT NULL"
if 'time_cols' in params:
# markov where clauses
attrs = params['time_cols']
# add values to template
for attr in attrs:
attr_string.append(template % attr)
else:
# moran where clauses
# get keys
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'subquery',
'num_ngbrs')]
# add values to template
for attr in attrs:
attr_string.append(template % params[attr])
if 'denominator' in attrs:
attr_string.append(
"idx_replace.\"%s\" <> 0" % params['denominator'])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
@param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE " \
"i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"%(attr_where_j)s " \
"ORDER BY " \
"j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs})" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
# SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
@param params dict: information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
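A simplified, self-contained sketch (Python 3, hypothetical column and table names) of the two-stage templating that `knn()` and `queen()` use: %-substitution first splices in the prebuilt SQL fragments, then `str.format` fills in the caller-supplied identifiers:

```python
# Stage 1: splice precomputed WHERE fragments in with %-substitution;
# the {id_col}/{subquery} placeholders survive untouched.
replacements = {"attr_where_i": 'i."price" IS NOT NULL'}
query = ('SELECT i."{id_col}" As id '
         'FROM ({subquery}) As i '
         'WHERE %(attr_where_i)s') % replacements

# Stage 2: fill in user-supplied identifiers with str.format
params = {"id_col": "cartodb_id", "subquery": "SELECT * FROM homes"}
print(query.format(**params))
```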
# to add more weight methods open a ticket or pull request
def get_attributes(query_res, attr_num=1):
"""
@param query_res: query results with attributes and neighbors
@param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res],
dtype=np.float)
def empty_zipped_array(num_nones):
"""
prepare return values for cases of empty weights objects (no neighbors)
Input:
@param num_nones int: number of columns (e.g., 4)
Output:
[(None, None, None, None)]
"""
return [tuple([None] * num_nones)]
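The shape of that fallback return value, as a standalone sketch:

```python
def empty_zipped_array(num_nones):
    # one all-None row, matching the zipped row shape callers expect
    return [tuple([None] * num_nones)]

print(empty_zipped_array(4))  # [(None, None, None, None)]
```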


@@ -0,0 +1,11 @@
"""Random seed generator used for non-deterministic functions in crankshaft"""
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)


@@ -0,0 +1 @@
from segmentation import *


@@ -0,0 +1,176 @@
"""
Segmentation creation and prediction
"""
import sklearn
import numpy as np
import plpy
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import metrics
from sklearn.cross_validation import train_test_split
# Lower level functions
#----------------------
def replace_nan_with_mean(array):
"""
Input:
@param array: an array of floats which may have null-valued entries
Output:
array with nans filled in with the mean of the dataset
"""
# returns an array of rows and column indices
indices = np.where(np.isnan(array))
# iterate through entries which have nan values
for row, col in zip(*indices):
array[row, col] = np.mean(array[~np.isnan(array[:, col]), col])
return array
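A standalone sketch of the column-mean imputation above, with a toy array to show which values fill the gaps:

```python
import numpy as np

def replace_nan_with_mean(array):
    # fill each NaN with the mean of the non-NaN values in its column
    for row, col in zip(*np.where(np.isnan(array))):
        array[row, col] = np.mean(array[~np.isnan(array[:, col]), col])
    return array

data = np.array([[1.0, np.nan],
                 [3.0, 4.0],
                 [np.nan, 8.0]])
filled = replace_nan_with_mean(data)
# column 0 gap becomes mean(1, 3) = 2.0; column 1 gap becomes mean(4, 8) = 6.0
```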
def get_data(variable, feature_columns, query):
"""
Fetch data from the database, clean, and package into
numpy arrays
Input:
@param variable: name of the target variable
@param feature_columns: list of column names
@param query: subquery that data is pulled from for the packaging
Output:
prepared data, packaged into NumPy arrays
"""
columns = ','.join(['array_agg("{col}") As "{col}"'.format(col=col) for col in feature_columns])
try:
data = plpy.execute('''SELECT array_agg("{variable}") As target, {columns} FROM ({query}) As a'''.format(
variable=variable,
columns=columns,
query=query))
except Exception, e:
plpy.error('Failed to access data to build segmentation model: %s' % e)
# extract target data from plpy object
target = np.array(data[0]['target'])
# put n feature data arrays into an n x m array of arrays
features = np.column_stack([np.array(data[0][col], dtype=float) for col in feature_columns])
return replace_nan_with_mean(target), replace_nan_with_mean(features)
# High level interface
# --------------------
def create_and_predict_segment_agg(target, features, target_features, target_ids, model_parameters):
"""
Version of create_and_predict_segment that works on arrays that come straight from the SQL call to
the function.
Input:
@param target: The 1D array of length NSamples containing the target variable we want the model to predict
@param features: The 2D array of size NSamples * NFeatures that forms the input to the model
@param target_ids: A 1D array of target_ids that will be used to associate the results of the prediction with the rows which they come from
@param model_parameters: A dictionary containing parameters for the model.
"""
clean_target = replace_nan_with_mean(target)
clean_features = replace_nan_with_mean(features)
target_features = replace_nan_with_mean(target_features)
model, accuracy = train_model(clean_target, clean_features, model_parameters, 0.2)
prediction = model.predict(target_features)
accuracy_array = [accuracy] * prediction.shape[0]
return zip(target_ids, prediction, accuracy_array)
def create_and_predict_segment(query, variable, target_query, model_params):
"""
generate a segment with machine learning
Stuart Lynn
"""
## fetch column names
try:
columns = plpy.execute('SELECT * FROM ({query}) As a LIMIT 1 '.format(query=query))[0].keys()
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
## extract column names to be used in building the segmentation model
feature_columns = set(columns) - set([variable, 'cartodb_id', 'the_geom', 'the_geom_webmercator'])
## get data from database
target, features = get_data(variable, feature_columns, query)
model, accuracy = train_model(target, features, model_params, 0.2)
cartodb_ids, result = predict_segment(model, feature_columns, target_query)
accuracy_array = [accuracy]*result.shape[0]
return zip(cartodb_ids, result, accuracy_array)
def train_model(target, features, model_params, test_split):
"""
Train the Gradient Boosting model on the provided data and calculate the accuracy of the model
Input:
@param target: 1D array of the variable that the model is to be trained to predict
@param features: 2D array, NSamples * NFeatures, to use in training the model
@param model_params: A dictionary of model parameters, the full specification can be found on the
scikit learn page for [GradientBoostingRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
@param test_split: The fraction of the data to be withheld for testing the model / calculating the accuracy
"""
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=test_split)
model = GradientBoostingRegressor(**model_params)
model.fit(features_train, target_train)
accuracy = calculate_model_accuracy(model, features, target)
return model, accuracy
def calculate_model_accuracy(model, features, target):
"""
Calculate the mean squared error of the model prediction
Input:
@param model: model trained from input features
@param features: features to make a prediction from
@param target: target to compare prediction to
Output:
mean squared error of the model prediction compared to the target
"""
prediction = model.predict(features)
return metrics.mean_squared_error(prediction, target)
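Outside the database, the train/score flow of `train_model` and `calculate_model_accuracy` can be sketched on synthetic data. Note this sketch uses the `sklearn.model_selection` import path that replaced `sklearn.cross_validation` in scikit-learn releases newer than the 0.14.1 pinned here, and the data is made up:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn import metrics

# synthetic regression problem: a noisy linear function of three features
rng = np.random.RandomState(0)
features = rng.uniform(size=(200, 3))
target = features.dot(np.array([1.0, -2.0, 0.5])) + rng.normal(scale=0.01, size=200)

features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(random_state=0)
model.fit(features_train, target_train)

# the "accuracy" returned by train_model is actually a mean squared error
# (lower is better), computed over the full data set
mse = metrics.mean_squared_error(model.predict(features), target)
```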
def predict_segment(model, features, target_query):
"""
Use the provided model to predict the values for the new feature set
Input:
@param model: The pretrained model
@param features: A list of features to use in the model prediction (list of column names)
@param target_query: The query to run to obtain the data to predict on and the cartodb_ids associated with it.
"""
batch_size = 1000
joined_features = ','.join(['"{0}"::numeric'.format(a) for a in features])
try:
cursor = plpy.cursor('SELECT Array[{joined_features}] As features FROM ({target_query}) As a'.format(
joined_features=joined_features,
target_query=target_query))
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
results = []
while True:
rows = cursor.fetch(batch_size)
if not rows:
break
batch = np.row_stack([np.array(row['features'], dtype=float) for row in rows])
# TODO: should use the global mean; a per-batch mean will cause weird effects
batch = replace_nan_with_mean(batch)
prediction = model.predict(batch)
results.append(prediction)
try:
cartodb_ids = plpy.execute('''SELECT array_agg(cartodb_id ORDER BY cartodb_id) As cartodb_ids FROM ({0}) As a'''.format(target_query))[0]['cartodb_ids']
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
return cartodb_ids, np.concatenate(results)
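The batched-prediction loop above can be exercised without a database by standing in for `plpy.cursor`. `FakeCursor` and the sum-of-features "model" below are illustrative stand-ins, not part of crankshaft:

```python
import numpy as np

class FakeCursor:
    """Illustrative stand-in for plpy.cursor: yields rows in batches."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def fetch(self, batch_size):
        batch = self.rows[self.pos:self.pos + batch_size]
        self.pos += batch_size
        return batch

rows = [{'features': [float(i), 2.0 * i]} for i in range(10)]
cursor = FakeCursor(rows)

batch_size = 4
results = []
while True:
    fetched = cursor.fetch(batch_size)
    if not fetched:
        break
    batch = np.vstack([np.array(r['features'], dtype=float) for r in fetched])
    # stand-in "model": predict the sum of the features in each row
    results.append(batch.sum(axis=1))
prediction = np.concatenate(results)
```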


@@ -0,0 +1,2 @@
"""Import all functions from clustering libraries."""
from markov import *


@@ -0,0 +1,194 @@
"""
Spatial dynamics measurements using Spatial Markov
"""
# TODO: remove all plpy dependencies
import numpy as np
import pysal as ps
import plpy
import crankshaft.pysal_utils as pu
from crankshaft.analysis_data_provider import AnalysisDataProvider
class Markov:
def __init__(self, data_provider=None):
if data_provider is None:
self.data_provider = AnalysisDataProvider()
else:
self.data_provider = data_provider
def spatial_trend(self, subquery, time_cols, num_classes=7,
w_type='knn', num_ngbrs=5, permutations=0,
geom_col='the_geom', id_col='cartodb_id'):
"""
Predict the trends of a unit based on:
1. history of its transitions to different classes (e.g., 1st
quantile -> 2nd quantile)
2. average class of its neighbors
Inputs:
@param subquery string: e.g., SELECT the_geom, cartodb_id,
interesting_time_column FROM table_name
@param time_cols list of strings: list of strings of column names
@param num_classes (optional): number of classes to break
distribution of values into. Currently uses quantile bins.
@param w_type string (optional): weight type ('knn' or 'queen')
@param num_ngbrs int (optional): number of neighbors (if knn type)
@param permutations int (optional): number of permutations for test
stats
@param geom_col string (optional): name of column which contains
the geometries
@param id_col string (optional): name of column which has the ids
of the table
Outputs:
@param trend_up float: probability that a geom will move to a higher
class
@param trend_down float: probability that a geom will move to a
lower class
@param trend float: (trend_up - trend_down) / trend_static
@param volatility float: a measure of the volatility based on
probability stddev(prob array)
"""
if len(time_cols) < 2:
plpy.error('At least two time columns must be passed')
params = {"id_col": id_col,
"time_cols": time_cols,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
query_result = self.data_provider.get_markov(w_type, params)
# build weight
weights = pu.get_weight(query_result, w_type)
weights.transform = 'r'
# prep time data
t_data = get_time_data(query_result, time_cols)
sp_markov_result = ps.Spatial_Markov(t_data,
weights,
k=num_classes,
fixed=False,
permutations=permutations)
# get lag classes
lag_classes = ps.Quantiles(
ps.lag_spatial(weights, t_data[:, -1]),
k=num_classes).yb
# look up probability distribution for each unit according to class and
# lag class
prob_dist = get_prob_dist(sp_markov_result.P,
lag_classes,
sp_markov_result.classes[:, -1])
# find the ups and downs and overall distribution of each cell
trend_up, trend_down, trend, volatility = get_prob_stats(prob_dist, sp_markov_result.classes[:, -1])
# output the results
return zip(trend, trend_up, trend_down, volatility, weights.id_order)
def get_time_data(markov_data, time_cols):
"""
Extract the time columns and bin appropriately
"""
num_attrs = len(time_cols)
return np.array([[x['attr' + str(i)] for x in markov_data]
for i in range(1, num_attrs+1)], dtype=float).transpose()
# not currently used
def rebin_data(time_data, num_time_per_bin):
"""
Convert an n x l matrix into an n x ceil(l / m) matrix, where each new
column is the average of m consecutive columns of the input:
1 2 3 4      1.5 3.5
5 6 7 8  ->  5.5 7.5
9 8 7 6      8.5 6.5
5 4 3 2      4.5 2.5
if m = 2, each row of the 4 x 4 matrix is reduced to two columns,
giving a 4 x 2 matrix.
This process effectively resamples the data at a time span m times
longer than the input data.
For cases when there is a remainder (remainder(5/3) = 2), the remaining
two columns are binned together as the last time period, while the
first three are binned together for the first period.
Input:
@param time_data n x l ndarray: measurements of an attribute at
different time intervals
@param num_time_per_bin int: number of columns to average into a new
column
Output:
n x ceil(l / m) ndarray of resampled time series
"""
if time_data.shape[1] % num_time_per_bin == 0:
# if fit is perfect, then use it
n_max = time_data.shape[1] / num_time_per_bin
else:
# fit remainders into an additional column
n_max = time_data.shape[1] / num_time_per_bin + 1
return np.array(
[time_data[:, num_time_per_bin * i:num_time_per_bin * (i+1)].mean(axis=1)
for i in range(n_max)]).T
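A standalone sketch of the column rebinning (with the ceiling division written explicitly, so it also behaves correctly under Python 3 division), reproducing the 4 x 4 example from the docstring:

```python
import numpy as np

def rebin_data(time_data, num_time_per_bin):
    # average groups of num_time_per_bin consecutive columns; any
    # remainder columns are averaged together as the final bin
    n_max = -(-time_data.shape[1] // num_time_per_bin)  # ceiling division
    return np.array(
        [time_data[:, num_time_per_bin * i:num_time_per_bin * (i + 1)].mean(axis=1)
         for i in range(n_max)]).T

time_data = np.array([[1., 2., 3., 4.],
                      [5., 6., 7., 8.],
                      [9., 8., 7., 6.],
                      [5., 4., 3., 2.]])
rebinned = rebin_data(time_data, 2)
```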
def get_prob_dist(transition_matrix, lag_indices, unit_indices):
"""
Given an array of transition matrices, look up the probability
associated with the arrangements passed
Input:
@param transition_matrix ndarray[k,k,k]:
@param lag_indices ndarray:
@param unit_indices ndarray:
Output:
Array of probability distributions
"""
return np.array([transition_matrix[(lag_indices[i], unit_indices[i])]
for i in range(len(lag_indices))])
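A toy illustration of the lookup: each unit's (lag class, own class) pair selects one row of probabilities from the stack of transition matrices. The matrices and indices below are made up for illustration:

```python
import numpy as np

# stack of three 2 x 2 transition matrices, one per lag class (made up)
transition_matrix = np.array([[[0.9, 0.1],
                               [0.3, 0.7]],
                              [[0.6, 0.4],
                               [0.4, 0.6]],
                              [[0.2, 0.8],
                               [0.1, 0.9]]])
lag_indices = np.array([0, 2])   # neighborhood class of each unit
unit_indices = np.array([1, 0])  # current class of each unit

prob_dist = np.array([transition_matrix[(lag_indices[i], unit_indices[i])]
                      for i in range(len(lag_indices))])
# unit 0 -> row [0.3, 0.7]; unit 1 -> row [0.2, 0.8]
```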
def get_prob_stats(prob_dist, unit_indices):
"""
get the statistics of the probability distributions
Outputs:
@param trend_up ndarray(float): sum of probabilities for upward
movement (relative to the unit index of that prob)
@param trend_down ndarray(float): sum of probabilities for downward
movement (relative to the unit index of that prob)
@param trend ndarray(float): difference of upward and downward
movements
@param volatility ndarray(float): standard deviation of each unit's
probability distribution
"""
num_elements = len(unit_indices)
trend_up = np.empty(num_elements, dtype=float)
trend_down = np.empty(num_elements, dtype=float)
trend = np.empty(num_elements, dtype=float)
for i in range(num_elements):
trend_up[i] = prob_dist[i, (unit_indices[i]+1):].sum()
trend_down[i] = prob_dist[i, :unit_indices[i]].sum()
if prob_dist[i, unit_indices[i]] > 0.0:
trend[i] = (trend_up[i] - trend_down[i]) / (
prob_dist[i, unit_indices[i]])
else:
# undefined when the probability of staying in the current class is zero
trend[i] = np.nan
# calculate volatility of distribution
volatility = prob_dist.std(axis=1)
return trend_up, trend_down, trend, volatility
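The trend arithmetic above can be checked by hand on a toy two-class distribution; the probabilities below are illustrative only:

```python
import numpy as np

# two units, two classes: each row is a unit's transition probability
# distribution; unit_indices gives the class each unit is currently in
prob_dist = np.array([[0.2, 0.8],
                      [0.6, 0.4]])
unit_indices = np.array([0, 1])

n = len(unit_indices)
# probability mass above / below the current class
trend_up = np.array([prob_dist[i, unit_indices[i] + 1:].sum() for i in range(n)])
trend_down = np.array([prob_dist[i, :unit_indices[i]].sum() for i in range(n)])
# normalized by the probability of staying in the current class
trend = (trend_up - trend_down) / prob_dist[np.arange(n), unit_indices]
volatility = prob_dist.std(axis=1)
```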


@@ -0,0 +1,5 @@
joblib==0.8.3
numpy==1.6.1
scipy==0.14.0
pysal==1.11.2
scikit-learn==0.14.1


@@ -0,0 +1,49 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.5.2',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping community',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
# IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
requires=['pysal', 'numpy', 'sklearn'],
test_suite='test'
)


@@ -0,0 +1 @@
[[0.004793783909323601, 0.17999999999999999, 0.49808756424021061], [-1.0701189472090842, 0.079000000000000001, 0.14228288580832316], [-0.67867750971877305, 0.42099999999999999, 0.24867110969448558], [-0.67407386707620487, 0.246, 0.25013217644612995], [-0.79495689068870035, 0.33200000000000002, 0.21331928959090596], [-0.49279481022182703, 0.058999999999999997, 0.31107878905057329], [-0.38075627530057132, 0.28399999999999997, 0.35169205342069643], [-0.86710921611314895, 0.23699999999999999, 0.19294108571294855], [-0.78618647240956485, 0.050000000000000003, 0.2158791250244505], [-0.76108527223116984, 0.064000000000000001, 0.22330306830813684], [-0.13340753531942209, 0.247, 0.44693554317763651], [-0.57584545722033043, 0.48999999999999999, 0.28235982246156488], [-0.78882694661192831, 0.433, 0.2151065788731219], [-0.38769767950046219, 0.375, 0.34911988661484239], [-0.56057819488052207, 0.41399999999999998, 0.28754255985169652], [-0.41354017495644935, 0.45500000000000002, 0.339605447117173], [-0.23993577722243081, 0.49099999999999999, 0.40519002230969337], [-0.1389080156677496, 0.40400000000000003, 0.44476141839645233], [-0.25485737510500855, 0.376, 0.39941662953554224], [-0.71218610582902353, 0.17399999999999999, 0.23817476979886087], [-0.54533105995872144, 0.13700000000000001, 0.2927629228714812], [-0.39547917847510977, 0.033000000000000002, 0.34624464252424236], [-0.43052658996257548, 0.35399999999999998, 0.33340631435564982], [-0.37296719193774736, 0.40300000000000002, 0.35458643102865428], [-0.66482612169465694, 0.31900000000000001, 0.25308085650392698], [-0.13772133540823422, 0.34699999999999998, 0.44523032843016275], [-0.6765304487868502, 0.20999999999999999, 0.24935196033890672], [-0.64518763494323472, 0.32200000000000001, 0.25940279912025543], [-0.5078622084312413, 0.41099999999999998, 0.30577498972600159], [-0.12652006733772059, 0.42899999999999999, 0.44966013262301163], [-0.32691133022814595, 0.498, 0.37186747562269029], [0.25533848511500978, 
0.42399999999999999, 0.39923083899077472], [2.7045138116476508, 0.0050000000000000001, 0.0034202212972238577], [-0.1551614486076057, 0.44400000000000001, 0.43834701985429037], [1.9524487722567723, 0.012999999999999999, 0.025442473674991528], [-1.2055816465306763, 0.017000000000000001, 0.11398941970467646], [3.478472976017831, 0.002, 0.00025213964072468009], [-1.4621715757903719, 0.002, 0.071847099325659136], [-0.84010307600180256, 0.085000000000000006, 0.20042529779230778], [5.7097646237318243, 0.0030000000000000001, 5.6566262784940591e-09], [1.5082367956567375, 0.065000000000000002, 0.065746966514827365], [-0.58337270103430816, 0.44, 0.27982121546450034], [-0.083271860457022437, 0.45100000000000001, 0.46681768733385554], [-0.46872337815000953, 0.34599999999999997, 0.31963368715684204], [0.18490279849545319, 0.23799999999999999, 0.42665263797981101], [3.470424529947997, 0.012, 0.00025981817437825683], [-0.99942612137154796, 0.032000000000000001, 0.15879415560388499], [-1.3650387953594485, 0.034000000000000002, 0.08612042845912049], [1.8617160516432014, 0.081000000000000003, 0.03132156240215267], [1.1321188945775384, 0.11600000000000001, 0.12879222611766061], [0.064116686050580601, 0.27300000000000002, 0.4744386578180424], [-0.42032194540259099, 0.29999999999999999, 0.33712514016213468], [-0.79581215423980922, 0.123, 0.21307061309098785], [-0.42792753720906046, 0.45600000000000002, 0.33435193892883741], [-1.0629378527428395, 0.051999999999999998, 0.14390506780140866], [-0.54164761752225477, 0.33700000000000002, 0.29403064095211839], [1.0934778886820793, 0.13700000000000001, 0.13709201601893539], [-0.094068785378413719, 0.38200000000000001, 0.46252725802998929], [0.13482026574801856, 0.36799999999999999, 0.44637699118865737], [-0.13976995315653129, 0.34699999999999998, 0.44442087706276601], [-0.051047663924746682, 0.32000000000000001, 0.47964376985626245], [-0.21468297736730158, 0.41699999999999998, 0.41500724761906527], [-0.20873154637330626, 0.38800000000000001, 
0.41732890604390893], [-0.32427876152583485, 0.49199999999999999, 0.37286349875557478], [-0.65254842943280977, 0.374, 0.25702372075306734], [-0.48611858196118796, 0.23300000000000001, 0.31344154643990074], [-0.14482354344529477, 0.32600000000000001, 0.44242509660469886], [-0.51052030974200002, 0.439, 0.30484349480873729], [0.56814382285283538, 0.14999999999999999, 0.28496865660103166], [0.58680919931668207, 0.161, 0.27866592887231878], [0.013390357044409013, 0.25800000000000001, 0.49465818005865647], [-0.19050728887961568, 0.41399999999999998, 0.4244558160399462], [-0.60531777422216049, 0.35199999999999998, 0.2724839368239631], [1.0899331115425805, 0.127, 0.13787130480311838], [0.17015055382651084, 0.36899999999999999, 0.43244586845546418], [-0.21738337124409801, 0.40600000000000003, 0.41395479459421991], [1.0329303331079593, 0.079000000000000001, 0.15081825117169467], [1.0218317101096221, 0.104, 0.15343027913308094]]


@@ -0,0 +1 @@
[{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]


@@ -0,0 +1 @@
[[0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 0], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 1], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 2], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 3], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 4], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 5], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 6], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 7], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 8], [0.19047619047619049, 0.16, 0.0, 0.32594478059941379, 9], [-0.23529411764705882, 0.0, 0.19047619047619047, 0.31356338348865387, 10], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 11], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 12], [0.027777777777777783, 0.11111111111111112, 0.088888888888888892, 0.30339641183779581, 13], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 14], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 15], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 16], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 17], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 18], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 19], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 20], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 21], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 22], [-0.16666666666666663, 0.18181818181818182, 0.27272727272727271, 0.20246415864836445, 23], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 24], [0.1875, 0.23999999999999999, 0.12, 
0.23731835158706122, 25], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 26], [-0.043478260869565216, 0.0, 0.041666666666666664, 0.37950991789118999, 27], [0.22222222222222221, 0.18181818181818182, 0.0, 0.31701083225750354, 28], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 29], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 30], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 31], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 32], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 33], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 34], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 35], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 36], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 37], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 38], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 39], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 40], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 41], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 42], [0.0, 0.0, 0.0, 0.40000000000000002, 43], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 44], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 45], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 46], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 47]]


@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "LL"],
[0.6152779669180425, "LL"],
[-0.14657336660125297, "LH"],
[0.6967858120189607, "LL"],
[0.07949310115714454, "HH"],
[0.4703198759258987, "HH"],
[0.4421125200498064, "HH"],
[0.5724288737143592, "LL"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "LL"],
[-0.01466729201304962, "HL"],
[0.3481559372544409, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "HH"],
[0.15971286468915544, "LL"],
[1.0543588860308968, "HH"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "HH"],
[0.08438455015300014, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "LL"],
[0.795275137550483, "HH"],
[0.18562939195219, "LL"],
[0.3010757406693439, "LL"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "LL"],
[-0.07116352791516614, "HL"],
[-0.09945240794119009, "LH"],
[0.18562939195219, "LL"],
[0.1832733440191868, "LL"],
[-0.39054253768447705, "HL"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "HH"],
[0.2584386102554792, "HH"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "LH"],
[0.051367269430983485, "LL"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "LL"],
[0.04191046803899837, "LL"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "LH"],
[0.5410013139159929, "HH"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]


@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)


@@ -0,0 +1,54 @@
import re
class MockCursor:
def __init__(self, data):
self.cursor_pos = 0
self.data = data
def fetch(self, batch_size):
batch = self.data[self.cursor_pos:self.cursor_pos + batch_size]
self.cursor_pos += batch_size
return batch
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def debug(self, msg):
self.debugs.append(msg)
def info(self, msg):
self.infos.append(msg)
def cursor(self, query):
data = self.execute(query)
return MockCursor(data)
# TODO: additional arguments
def execute(self, query):
for result in self.results:
if result[0].match(query):
return result[1]
return []
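A condensed sketch of how the mock is meant to be used: `_define_result` registers a regex and a canned result, and `execute`/`cursor` replay it. The query and rows below are made up:

```python
import re

class MockCursor:
    def __init__(self, data):
        self.cursor_pos = 0
        self.data = data

    def fetch(self, batch_size):
        batch = self.data[self.cursor_pos:self.cursor_pos + batch_size]
        self.cursor_pos += batch_size
        return batch

class MockPlPy:
    def __init__(self):
        self.results = []

    def _define_result(self, query, result):
        # queries matching this regex (case-insensitively) return the canned result
        self.results.append([re.compile(query, re.IGNORECASE | re.MULTILINE), result])

    def execute(self, query):
        for pattern, result in self.results:
            if pattern.match(query):
                return result
        return []

    def cursor(self, query):
        return MockCursor(self.execute(query))

plpy = MockPlPy()
plpy._define_result(r'select \* from points', [{'id': 1}, {'id': 2}, {'id': 3}])
rows = plpy.execute('SELECT * FROM points')
first_batch = plpy.cursor('SELECT * FROM points').fetch(2)
```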


@@ -0,0 +1,78 @@
import unittest
import numpy as np
from helper import fixture_file
from crankshaft.clustering import Getis
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
from crankshaft.analysis_data_provider import AnalysisDataProvider
# Fixture files produced as follows
#
# import pysal as ps
# import numpy as np
# import random
#
# # setup variables
# f = ps.open(ps.examples.get_path("stl_hom.dbf"))
# y = np.array(f.by_col['HR8893'])
# w_queen = ps.queen_from_shapefile(ps.examples.get_path("stl_hom.shp"))
#
# out_queen = [{"id": index + 1,
# "neighbors": [x+1 for x in w_queen.neighbors[index]],
# "value": val} for index, val in enumerate(y)]
#
# with open('neighbors_queen_getis.json', 'w') as f:
# f.write(str(out_queen))
#
# random.seed(1234)
# np.random.seed(1234)
# lgstar_queen = ps.esda.getisord.G_Local(y, w_queen, star=True,
# permutations=999)
#
# with open('getis_queen.json', 'w') as f:
# f.write(str(zip(lgstar_queen.z_sim,
# lgstar_queen.p_sim, lgstar_queen.p_z_sim)))
class FakeDataProvider(AnalysisDataProvider):
def __init__(self, mock_data):
self.mock_result = mock_data
def get_getis(self, w_type, param):
return self.mock_result
class GetisTest(unittest.TestCase):
"""Testing class for Getis-Ord's G* funtion
This test replicates the work done in PySAL documentation:
https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/autocorrelation.html#local-g-and-g
"""
def setUp(self):
# load raw data for analysis
self.neighbors_data = json.loads(
open(fixture_file('neighbors_getis.json')).read())
# load pre-computed/known values
self.getis_data = json.loads(
open(fixture_file('getis.json')).read())
def test_getis_ord(self):
"""Test Getis-Ord's G*"""
data = [{'id': d['id'],
'attr1': d['value'],
'neighbors': d['neighbors']} for d in self.neighbors_data]
random_seeds.set_random_seeds(1234)
getis = Getis(FakeDataProvider(data))
result = getis.getis_ord('subquery', 'value',
'queen', None, 999, 'the_geom',
'cartodb_id')
result = [(row[0], row[1]) for row in result]
expected = np.array(self.getis_data)[:, 0:2]
for ([res_z, res_p], [exp_z, exp_p]) in zip(result, expected):
self.assertAlmostEqual(res_z, exp_z, delta=1e-2)


@@ -0,0 +1,56 @@
import unittest
import numpy as np
from helper import fixture_file
from crankshaft.clustering import Kmeans
from crankshaft.analysis_data_provider import AnalysisDataProvider
import crankshaft.clustering as cc
from crankshaft import random_seeds
import json
from collections import OrderedDict
class FakeDataProvider(AnalysisDataProvider):
def __init__(self, mocked_result):
self.mocked_result = mocked_result
def get_spatial_kmeans(self, query):
return self.mocked_result
def get_nonspatial_kmeans(self, query, standarize):
return self.mocked_result
class KMeansTest(unittest.TestCase):
"""Testing class for k-means spatial"""
def setUp(self):
self.cluster_data = json.loads(
open(fixture_file('kmeans.json')).read())
self.params = {"subquery": "select * from table",
"no_clusters": "10"}
def test_kmeans(self):
        """Test k-means spatial clustering"""
data = [{'xs': d['xs'],
'ys': d['ys'],
'ids': d['ids']} for d in self.cluster_data]
random_seeds.set_random_seeds(1234)
kmeans = Kmeans(FakeDataProvider(data))
clusters = kmeans.spatial('subquery', 2)
labels = [a[1] for a in clusters]
c1 = [a for a in clusters if a[1] == 0]
c2 = [a for a in clusters if a[1] == 1]
self.assertEqual(len(np.unique(labels)), 2)
self.assertEqual(len(c1), 20)
self.assertEqual(len(c2), 20)
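Under the hood `Kmeans.spatial` hands the xs/ys off to scikit-learn; the mechanics it relies on are plain Lloyd's algorithm, which can be sketched directly. The helper below and its names are illustrative, not crankshaft API.

```python
import numpy as np

def lloyd_labels(points, k, iters=10, seed=1234):
    """Assign each 2-D point to the nearest of k iteratively refined centroids."""
    rng = np.random.RandomState(seed)
    points = np.asarray(points, dtype=float)
    centroids = points[rng.choice(len(points), k, replace=False)].copy()
    for _ in range(iters):
        # squared distance from every point to every centroid
        d = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():   # leave empty clusters where they are
                centroids[j] = points[labels == j].mean(axis=0)
    return labels
```

For two well-separated blobs, as in the fixture above, the assignment stabilizes within a couple of iterations regardless of which points seed the centroids.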


@@ -0,0 +1,112 @@
import unittest
import numpy as np
from helper import fixture_file
from crankshaft.clustering import Moran
from crankshaft.analysis_data_provider import AnalysisDataProvider
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
from collections import OrderedDict
class FakeDataProvider(AnalysisDataProvider):
def __init__(self, mock_data):
self.mock_result = mock_data
def get_moran(self, w_type, params):
return self.mock_result
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.params_markov = {"id_col": "cartodb_id",
"time_cols": ["_2013_dec", "_2014_jan",
"_2014_feb"],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(
open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(
open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads"""
from crankshaft.clustering import map_quads
self.assertEqual(map_quads(1), 'HH')
self.assertEqual(map_quads(2), 'LH')
self.assertEqual(map_quads(3), 'LL')
self.assertEqual(map_quads(4), 'HL')
self.assertEqual(map_quads(33), None)
self.assertEqual(map_quads('andy'), None)
def test_quad_position(self):
"""Test lisa_sig_vals"""
from crankshaft.clustering import quad_position
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_local_stat(self):
"""Test Moran's I local"""
data = [OrderedDict([('id', d['id']),
('attr1', d['value']),
('neighbors', d['neighbors'])])
for d in self.neighbors_data]
moran = Moran(FakeDataProvider(data))
random_seeds.set_random_seeds(1234)
result = moran.local_stat('subquery', 'value',
'knn', 5, 99, 'the_geom', 'cartodb_id')
result = [(row[0], row[1]) for row in result]
zipped_values = zip(result, self.moran_data)
for ([res_val, res_quad], [exp_val, exp_quad]) in zipped_values:
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [{'id': d['id'],
'attr1': d['value'],
'attr2': 1,
'neighbors': d['neighbors']} for d in self.neighbors_data]
random_seeds.set_random_seeds(1234)
moran = Moran(FakeDataProvider(data))
result = moran.local_rate_stat('subquery', 'numerator', 'denominator',
'knn', 5, 99, 'the_geom', 'cartodb_id')
result = [(row[0], row[1]) for row in result]
zipped_values = zip(result, self.moran_data)
for ([res_val, res_quad], [exp_val, exp_quad]) in zipped_values:
self.assertAlmostEqual(res_val, exp_val)
def test_moran(self):
"""Test Moran's I global"""
data = [{'id': d['id'],
'attr1': d['value'],
'neighbors': d['neighbors']} for d in self.neighbors_data]
random_seeds.set_random_seeds(1235)
moran = Moran(FakeDataProvider(data))
result = moran.global_stat('table', 'value',
'knn', 5, 99, 'the_geom',
'cartodb_id')
result_moran = result[0][0]
expected_moran = np.array([row[0] for row in self.moran_data]).mean()
self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)
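The global statistic asserted above reduces to a short closed form; here is a numpy sketch (not the PySAL call the module uses, and the weights are a dense binary matrix rather than a PySAL W object):

```python
import numpy as np

def morans_i(values, weights):
    # Global Moran's I: I = (n / S0) * z'Wz / z'z, with z the centered values
    z = values - values.mean()
    s0 = weights.sum()
    return float(len(z)) / s0 * np.dot(z, np.dot(weights, z)) / np.dot(z, z)
```

A perfectly alternating pattern on a line of rook neighbors gives strong negative autocorrelation (I = -1 in this configuration).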


@@ -0,0 +1,160 @@
import unittest
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
from collections import OrderedDict
class PysalUtilsTest(unittest.TestCase):
"""Testing class for utility functions related to PySAL integrations"""
def setUp(self):
self.params1 = OrderedDict([("id_col", "cartodb_id"),
("attr1", "andy"),
("attr2", "jay_z"),
("subquery", "SELECT * FROM a_list"),
("geom_col", "the_geom"),
("num_ngbrs", 321)])
self.params2 = OrderedDict([("id_col", "cartodb_id"),
("numerator", "price"),
("denominator", "sq_meters"),
("subquery", "SELECT * FROM pecan"),
("geom_col", "the_geom"),
("num_ngbrs", 321)])
self.params3 = OrderedDict([("id_col", "cartodb_id"),
("numerator", "sq_meters"),
("denominator", "price"),
("subquery", "SELECT * FROM pecan"),
("geom_col", "the_geom"),
("num_ngbrs", 321)])
self.params_array = {"id_col": "cartodb_id",
"time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
def test_query_attr_select(self):
"""Test query_attr_select"""
ans1 = ("i.\"andy\"::numeric As attr1, "
"i.\"jay_z\"::numeric As attr2, ")
ans2 = ("i.\"price\"::numeric As attr1, "
"i.\"sq_meters\"::numeric As attr2, ")
ans3 = ("i.\"sq_meters\"::numeric As attr1, "
"i.\"price\"::numeric As attr2, ")
ans_array = ("i.\"_2013_dec\"::numeric As attr1, "
"i.\"_2014_jan\"::numeric As attr2, "
"i.\"_2014_feb\"::numeric As attr3, ")
self.assertEqual(pu.query_attr_select(self.params1), ans1)
self.assertEqual(pu.query_attr_select(self.params2), ans2)
self.assertEqual(pu.query_attr_select(self.params3), ans3)
self.assertEqual(pu.query_attr_select(self.params_array), ans_array)
def test_query_attr_where(self):
"""Test pu.query_attr_where"""
ans1 = ("idx_replace.\"andy\" IS NOT NULL AND "
"idx_replace.\"jay_z\" IS NOT NULL")
ans_array = ("idx_replace.\"_2013_dec\" IS NOT NULL AND "
"idx_replace.\"_2014_jan\" IS NOT NULL AND "
"idx_replace.\"_2014_feb\" IS NOT NULL")
self.assertEqual(pu.query_attr_where(self.params1), ans1)
self.assertEqual(pu.query_attr_where(self.params_array), ans_array)
def test_knn(self):
"""Test knn neighbors constructor"""
ans1 = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL " \
"ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL " \
"ORDER BY i.\"cartodb_id\" ASC;"
ans_array = "SELECT i.\"cartodb_id\" As id, " \
"i.\"_2013_dec\"::numeric As attr1, " \
"i.\"_2014_jan\"::numeric As attr2, " \
"i.\"_2014_feb\"::numeric As attr3, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"_2013_dec\" IS NOT NULL AND " \
"j.\"_2014_jan\" IS NOT NULL AND " \
"j.\"_2014_feb\" IS NOT NULL " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"_2013_dec\" IS NOT NULL AND " \
"i.\"_2014_jan\" IS NOT NULL AND " \
"i.\"_2014_feb\" IS NOT NULL "\
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.knn(self.params1), ans1)
self.assertEqual(pu.knn(self.params_array), ans_array)
def test_queen(self):
"""Test queen neighbors constructor"""
ans1 = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"ST_Touches(i.\"the_geom\", " \
"j.\"the_geom\") AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL)" \
") As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL " \
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.queen(self.params1), ans1)
def test_construct_neighbor_query(self):
"""Test construct_neighbor_query"""
# Compare to raw knn query
self.assertEqual(pu.construct_neighbor_query('knn', self.params1),
pu.knn(self.params1))
def test_get_attributes(self):
"""Test get_attributes"""
        # TODO: add tests for get_attributes
        self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight"""
self.assertEqual(True, True)
def test_empty_zipped_array(self):
"""Test empty_zipped_array"""
ans2 = [(None, None)]
ans4 = [(None, None, None, None)]
self.assertEqual(pu.empty_zipped_array(2), ans2)
self.assertEqual(pu.empty_zipped_array(4), ans4)
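The pattern these assertions pin down — double-quote each column, cast to numeric, alias as attr1..attrN — is easy to state on its own. A standalone sketch of that behavior (hypothetical helper, not the `pu` API, which derives the columns from its params dict):

```python
def attr_select(columns):
    # 'price' -> 'i."price"::numeric As attr1, ' etc., trailing comma included
    return "".join('i."{0}"::numeric As attr{1}, '.format(col, i + 1)
                   for i, col in enumerate(columns))
```

Quoting the identifiers is what lets column names that start with digits or underscores (like `_2013_dec` above) survive in SQL.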


@@ -0,0 +1,64 @@
import unittest
import numpy as np
from helper import plpy, fixture_file
import crankshaft.segmentation as segmentation
import json
class SegmentationTest(unittest.TestCase):
    """Testing class for the segmentation functions"""
def setUp(self):
plpy._reset()
    def generate_random_data(self, n_samples, random_state, row_type=False):
        x1 = random_state.uniform(size=n_samples)
        x2 = random_state.uniform(size=n_samples)
        x3 = random_state.randint(0, 4, size=n_samples)
        y = x1 + x2 * x2 + x3
        cartodb_id = range(len(x1))
        if row_type:
            return [{'features': vals} for vals in zip(x1, x2, x3)], y
        else:
            return [dict(zip(['x1', 'x2', 'x3', 'target', 'cartodb_id'],
                             [x1, x2, x3, y, cartodb_id]))]
    def test_replace_nan_with_mean(self):
        # assumes crankshaft.segmentation exposes replace_nan_with_mean
        from crankshaft.segmentation import replace_nan_with_mean
        test_array = np.array([1.2, np.nan, 3.2, np.nan, np.nan])
        result = replace_nan_with_mean(test_array)
        expected = np.array([1.2, 2.2, 3.2, 2.2, 2.2])
        self.assertTrue(np.array_equal(result, expected))
def test_create_and_predict_segment(self):
n_samples = 1000
random_state_train = np.random.RandomState(13)
random_state_test = np.random.RandomState(134)
training_data = self.generate_random_data(n_samples, random_state_train)
        test_data, test_y = self.generate_random_data(n_samples,
                                                      random_state_test,
                                                      row_type=True)
        ids = [{'cartodb_ids': range(len(test_data))}]
        rows = [{'x1': 0, 'x2': 0, 'x3': 0, 'y': 0, 'cartodb_id': 0}]
        plpy._define_result(
            'select \* from \(select \* from training\) a limit 1', rows)
        plpy._define_result(
            '.*from \(select \* from training\) as a', training_data)
        plpy._define_result(
            'select array_agg\(cartodb\_id order by cartodb\_id\) as '
            'cartodb_ids from \(.*\) a', ids)
        plpy._define_result('.*select \* from test.*', test_data)
model_parameters = {'n_estimators': 1200,
'max_depth': 3,
'subsample' : 0.5,
'learning_rate': 0.01,
'min_samples_leaf': 1}
result = segmentation.create_and_predict_segment(
'select * from training',
'target',
'select * from test',
model_parameters)
prediction = [r[1] for r in result]
        accuracy = np.sqrt(np.mean(
            np.square(np.array(prediction) - np.array(test_y))))
        self.assertEqual(len(result), len(test_data))
        self.assertTrue(result[0][2] < 0.01)
        self.assertTrue(accuracy < 0.5 * np.mean(test_y))
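The `accuracy` computed above is root-mean-square error; factored out as a standalone (illustrative) helper it reads:

```python
import numpy as np

def rmse(predicted, actual):
    # root-mean-square error between two equal-length sequences
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))
```

The test then simply requires the model's RMSE to stay under half the mean of the target values.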


@@ -0,0 +1,349 @@
import unittest
import numpy as np
from helper import fixture_file
from crankshaft.space_time_dynamics import Markov
import crankshaft.space_time_dynamics as std
from crankshaft import random_seeds
from crankshaft.analysis_data_provider import AnalysisDataProvider
import json
class FakeDataProvider(AnalysisDataProvider):
def __init__(self, data):
self.mock_result = data
def get_markov(self, w_type, params):
return self.mock_result
class SpaceTimeTests(unittest.TestCase):
"""Testing class for Markov Functions."""
def setUp(self):
self.params = {"id_col": "cartodb_id",
"time_cols": ['dec_2013', 'jan_2014', 'feb_2014'],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(
open(fixture_file('neighbors_markov.json')).read())
self.markov_data = json.loads(open(fixture_file('markov.json')).read())
self.time_data = np.array([i * np.ones(10, dtype=float)
for i in range(10)]).T
self.transition_matrix = np.array([
[[0.96341463, 0.0304878, 0.00609756, 0., 0.],
[0.06040268, 0.83221477, 0.10738255, 0., 0.],
[0., 0.14, 0.74, 0.12, 0.],
[0., 0.03571429, 0.32142857, 0.57142857, 0.07142857],
[0., 0., 0., 0.16666667, 0.83333333]],
[[0.79831933, 0.16806723, 0.03361345, 0., 0.],
[0.0754717, 0.88207547, 0.04245283, 0., 0.],
[0.00537634, 0.06989247, 0.8655914, 0.05913978, 0.],
[0., 0., 0.06372549, 0.90196078, 0.03431373],
[0., 0., 0., 0.19444444, 0.80555556]],
[[0.84693878, 0.15306122, 0., 0., 0.],
[0.08133971, 0.78947368, 0.1291866, 0., 0.],
[0.00518135, 0.0984456, 0.79274611, 0.0984456, 0.00518135],
[0., 0., 0.09411765, 0.87058824, 0.03529412],
[0., 0., 0., 0.10204082, 0.89795918]],
[[0.8852459, 0.09836066, 0., 0.01639344, 0.],
[0.03875969, 0.81395349, 0.13953488, 0., 0.00775194],
[0.0049505, 0.09405941, 0.77722772, 0.11881188, 0.0049505],
[0., 0.02339181, 0.12865497, 0.75438596, 0.09356725],
[0., 0., 0., 0.09661836, 0.90338164]],
[[0.33333333, 0.66666667, 0., 0., 0.],
[0.0483871, 0.77419355, 0.16129032, 0.01612903, 0.],
[0.01149425, 0.16091954, 0.74712644, 0.08045977, 0.],
[0., 0.01036269, 0.06217617, 0.89637306, 0.03108808],
[0., 0., 0., 0.02352941, 0.97647059]]]
)
def test_spatial_markov(self):
"""Test Spatial Markov."""
data = [{'id': d['id'],
'attr1': d['y1995'],
'attr2': d['y1996'],
'attr3': d['y1997'],
'attr4': d['y1998'],
'attr5': d['y1999'],
'attr6': d['y2000'],
'attr7': d['y2001'],
'attr8': d['y2002'],
'attr9': d['y2003'],
'attr10': d['y2004'],
'attr11': d['y2005'],
'attr12': d['y2006'],
'attr13': d['y2007'],
'attr14': d['y2008'],
'attr15': d['y2009'],
'neighbors': d['neighbors']} for d in self.neighbors_data]
markov = Markov(FakeDataProvider(data))
random_seeds.set_random_seeds(1234)
result = markov.spatial_trend('subquery',
['y1995', 'y1996', 'y1997', 'y1998',
'y1999', 'y2000', 'y2001', 'y2002',
'y2003', 'y2004', 'y2005', 'y2006',
'y2007', 'y2008', 'y2009'],
5, 'knn', 5, 0, 'the_geom',
'cartodb_id')
self.assertTrue(result is not None)
result = [(row[0], row[1], row[2], row[3], row[4]) for row in result]
expected = self.markov_data
for ([res_trend, res_up, res_down, res_vol, res_id],
[exp_trend, exp_up, exp_down, exp_vol, exp_id]
) in zip(result, expected):
self.assertAlmostEqual(res_trend, exp_trend)
def test_get_time_data(self):
"""Test get_time_data"""
data = [{'attr1': d['y1995'],
'attr2': d['y1996'],
'attr3': d['y1997'],
'attr4': d['y1998'],
'attr5': d['y1999'],
'attr6': d['y2000'],
'attr7': d['y2001'],
'attr8': d['y2002'],
'attr9': d['y2003'],
'attr10': d['y2004'],
'attr11': d['y2005'],
'attr12': d['y2006'],
'attr13': d['y2007'],
'attr14': d['y2008'],
'attr15': d['y2009']} for d in self.neighbors_data]
result = std.get_time_data(data, ['y1995', 'y1996', 'y1997', 'y1998',
'y1999', 'y2000', 'y2001', 'y2002',
'y2003', 'y2004', 'y2005', 'y2006',
'y2007', 'y2008', 'y2009'])
# expected was prepared from PySAL example:
# f = ps.open(ps.examples.get_path("usjoin.csv"))
# pci = np.array([f.by_col[str(y)]
# for y in range(1995, 2010)]).transpose()
# rpci = pci / (pci.mean(axis = 0))
expected = np.array(
[[0.87654416, 0.863147, 0.85637567, 0.84811668, 0.8446154,
0.83271652, 0.83786314, 0.85012593, 0.85509656, 0.86416612,
0.87119375, 0.86302631, 0.86148267, 0.86252252, 0.86746356],
[0.9188951, 0.91757931, 0.92333258, 0.92517289, 0.92552388,
0.90746978, 0.89830489, 0.89431991, 0.88924794, 0.89815176,
0.91832091, 0.91706054, 0.90139505, 0.87897455, 0.86216858],
[0.82591007, 0.82548596, 0.81989793, 0.81503235, 0.81731522,
0.78964559, 0.80584442, 0.8084998, 0.82258551, 0.82668196,
0.82373724, 0.81814804, 0.83675961, 0.83574199, 0.84647177],
[1.09088176, 1.08537689, 1.08456418, 1.08415404, 1.09898841,
1.14506948, 1.12151133, 1.11160697, 1.10888621, 1.11399806,
1.12168029, 1.13164797, 1.12958508, 1.11371818, 1.09936775],
[1.10731446, 1.11373944, 1.13283638, 1.14472559, 1.15910025,
1.16898201, 1.17212488, 1.14752303, 1.11843284, 1.11024964,
1.11943471, 1.11736468, 1.10863242, 1.09642516, 1.07762337],
[1.42269757, 1.42118434, 1.44273502, 1.43577571, 1.44400684,
1.44184737, 1.44782832, 1.41978227, 1.39092208, 1.4059372,
1.40788646, 1.44052766, 1.45241216, 1.43306098, 1.4174431],
[1.13073885, 1.13110513, 1.11074708, 1.13364636, 1.13088149,
1.10888138, 1.11856629, 1.13062931, 1.11944984, 1.12446239,
1.11671008, 1.10880034, 1.08401709, 1.06959206, 1.07875225],
[1.04706124, 1.04516831, 1.04253372, 1.03239987, 1.02072545,
0.99854316, 0.9880258, 0.99669587, 0.99327676, 1.01400905,
1.03176742, 1.040511, 1.01749645, 0.9936394, 0.98279746],
[0.98996986, 1.00143564, 0.99491, 1.00188408, 1.00455845,
0.99127006, 0.97925917, 0.9683482, 0.95335147, 0.93694787,
0.94308213, 0.92232874, 0.91284091, 0.89689833, 0.88928858],
[0.87418391, 0.86416601, 0.84425695, 0.8404494, 0.83903044,
0.8578708, 0.86036185, 0.86107306, 0.8500772, 0.86981998,
0.86837929, 0.87204141, 0.86633032, 0.84946077, 0.83287146],
[1.14196118, 1.14660262, 1.14892712, 1.14909594, 1.14436624,
1.14450183, 1.12349752, 1.12596664, 1.12213996, 1.1119989,
1.10257792, 1.10491258, 1.11059842, 1.10509795, 1.10020097],
[0.97282463, 0.96700147, 0.96252588, 0.9653878, 0.96057687,
0.95831051, 0.94480909, 0.94804195, 0.95430286, 0.94103989,
0.92122519, 0.91010201, 0.89280392, 0.89298243, 0.89165385],
[0.94325468, 0.96436902, 0.96455242, 0.95243009, 0.94117647,
0.9480927, 0.93539182, 0.95388718, 0.94597005, 0.96918424,
0.94781281, 0.93466815, 0.94281559, 0.96520315, 0.96715441],
[0.97478408, 0.98169225, 0.98712809, 0.98474769, 0.98559897,
0.98687073, 0.99237486, 0.98209969, 0.9877653, 0.97399471,
0.96910087, 0.98416665, 0.98423613, 0.99823861, 0.99545704],
[0.85570269, 0.85575915, 0.85986132, 0.85693406, 0.8538012,
0.86191535, 0.84981451, 0.85472102, 0.84564835, 0.83998883,
0.83478547, 0.82803648, 0.8198736, 0.82265395, 0.8399404],
[0.87022047, 0.85996258, 0.85961813, 0.85689572, 0.83947136,
0.82785597, 0.86008789, 0.86776298, 0.86720209, 0.8676334,
0.89179317, 0.94202108, 0.9422231, 0.93902708, 0.94479184],
[0.90134907, 0.90407738, 0.90403991, 0.90201769, 0.90399238,
0.90906632, 0.92693339, 0.93695966, 0.94242697, 0.94338265,
0.91981796, 0.91108804, 0.90543476, 0.91737138, 0.94793657],
[1.1977611, 1.18222564, 1.18439158, 1.18267865, 1.19286723,
1.20172869, 1.21328691, 1.22624778, 1.22397075, 1.23857042,
1.24419893, 1.23929384, 1.23418676, 1.23626739, 1.26754398],
[1.24919678, 1.25754773, 1.26991161, 1.28020651, 1.30625667,
1.34790023, 1.34399863, 1.32575181, 1.30795492, 1.30544841,
1.30303302, 1.32107766, 1.32936244, 1.33001241, 1.33288462],
[1.06768004, 1.03799276, 1.03637303, 1.02768449, 1.03296093,
1.05059016, 1.03405057, 1.02747623, 1.03162734, 0.9961416,
0.97356208, 0.94241549, 0.92754547, 0.92549227, 0.92138102],
[1.09475614, 1.11526796, 1.11654299, 1.13103948, 1.13143264,
1.13889622, 1.12442212, 1.13367018, 1.13982256, 1.14029944,
1.11979401, 1.10905389, 1.10577769, 1.11166825, 1.09985155],
[0.76530058, 0.76612841, 0.76542451, 0.76722683, 0.76014284,
0.74480073, 0.76098396, 0.76156903, 0.76651952, 0.76533288,
0.78205934, 0.76842416, 0.77487118, 0.77768683, 0.78801192],
[0.98391336, 0.98075816, 0.98295341, 0.97386015, 0.96913803,
0.97370819, 0.96419154, 0.97209861, 0.97441313, 0.96356162,
0.94745352, 0.93965462, 0.93069645, 0.94020973, 0.94358232],
[0.83561828, 0.82298088, 0.81738502, 0.81748588, 0.80904801,
0.80071489, 0.83358256, 0.83451613, 0.85175032, 0.85954307,
0.86790024, 0.87170334, 0.87863799, 0.87497981, 0.87888675],
[0.98845573, 1.02092428, 0.99665283, 0.99141823, 0.99386619,
0.98733195, 0.99644997, 0.99669587, 1.02559097, 1.01116651,
0.99988024, 0.97906749, 0.99323123, 1.00204939, 0.99602148],
[1.14930913, 1.15241949, 1.14300962, 1.14265542, 1.13984683,
1.08312397, 1.05192626, 1.04230892, 1.05577278, 1.08569751,
1.12443486, 1.08891079, 1.08603695, 1.05997314, 1.02160943],
[1.11368269, 1.1057147, 1.11893431, 1.13778669, 1.1432272,
1.18257029, 1.16226243, 1.16009196, 1.14467789, 1.14820235,
1.12386598, 1.12680236, 1.12357937, 1.1159258, 1.12570828],
[1.30379431, 1.30752186, 1.31206366, 1.31532267, 1.30625667,
1.31210239, 1.29989156, 1.29203193, 1.27183516, 1.26830786,
1.2617743, 1.28656675, 1.29734097, 1.29390205, 1.29345446],
[0.83953719, 0.82701448, 0.82006005, 0.81188876, 0.80294864,
0.78772975, 0.82848011, 0.8259679, 0.82435705, 0.83108634,
0.84373784, 0.83891093, 0.84349247, 0.85637272, 0.86539395],
[1.23450087, 1.2426022, 1.23537935, 1.23581293, 1.24522626,
1.2256767, 1.21126648, 1.19377804, 1.18355337, 1.19674434,
1.21536573, 1.23653297, 1.27962009, 1.27968392, 1.25907738],
[0.9769662, 0.97400719, 0.98035944, 0.97581531, 0.95543282,
0.96480308, 0.94686376, 0.93679073, 0.92540049, 0.92988835,
0.93442917, 0.92100464, 0.91475304, 0.90249622, 0.9021363],
[0.84986886, 0.8986851, 0.84295997, 0.87280534, 0.85659368,
0.88937573, 0.894401, 0.90448993, 0.95495898, 0.92698333,
0.94745352, 0.92562488, 0.96635366, 1.02520312, 1.0394296],
[1.01922808, 1.00258203, 1.00974428, 1.00303417, 0.99765073,
1.00759019, 0.99192968, 0.99747298, 0.99550759, 0.97583768,
0.9610168, 0.94779638, 0.93759089, 0.93353431, 0.94121705],
[0.86367411, 0.85558932, 0.85544346, 0.85103025, 0.84336613,
0.83434854, 0.85813595, 0.84667961, 0.84374558, 0.85951183,
0.87194227, 0.89455097, 0.88283929, 0.90349491, 0.90600675],
[1.00947534, 1.00411055, 1.00698819, 0.99513687, 0.99291086,
1.00581626, 0.98850522, 0.99291168, 0.98983209, 0.97511924,
0.96134615, 0.96382634, 0.95011401, 0.9434686, 0.94637765],
[1.05712571, 1.05459419, 1.05753012, 1.04880786, 1.05103857,
1.04800023, 1.03024941, 1.04200483, 1.0402554, 1.03296979,
1.02191682, 1.02476275, 1.02347523, 1.02517684, 1.04359571],
[1.07084189, 1.06669497, 1.07937623, 1.07387988, 1.0794043,
1.0531801, 1.07452771, 1.09383478, 1.1052447, 1.10322136,
1.09167939, 1.08772756, 1.08859544, 1.09177338, 1.1096083],
[0.86719222, 0.86628896, 0.86675156, 0.86425632, 0.86511809,
0.86287327, 0.85169796, 0.85411285, 0.84886336, 0.84517414,
0.84843858, 0.84488343, 0.83374329, 0.82812044, 0.82878599],
[0.88389211, 0.92288667, 0.90282398, 0.91229186, 0.92023286,
0.92652175, 0.94278865, 0.93682452, 0.98655146, 0.992237,
0.9798497, 0.93869677, 0.96947771, 1.00362626, 0.98102351],
[0.97082064, 0.95320233, 0.94534081, 0.94215593, 0.93967,
0.93092109, 0.92662519, 0.93412152, 0.93501274, 0.92879506,
0.92110542, 0.91035556, 0.90430364, 0.89994694, 0.90073864],
[0.95861858, 0.95774543, 0.98254811, 0.98919472, 0.98684824,
0.98882205, 0.97662234, 0.95601578, 0.94905385, 0.94934888,
0.97152609, 0.97163004, 0.9700702, 0.97158948, 0.95884908],
[0.83980439, 0.84726737, 0.85747, 0.85467221, 0.8556751,
0.84818516, 0.85265681, 0.84502402, 0.82645665, 0.81743586,
0.83550406, 0.83338919, 0.83511679, 0.82136617, 0.80921874],
[0.95118156, 0.9466212, 0.94688098, 0.9508583, 0.9512441,
0.95440787, 0.96364363, 0.96804412, 0.97136214, 0.97583768,
0.95571724, 0.96895368, 0.97001634, 0.97082733, 0.98782366],
[1.08910044, 1.08248968, 1.08492895, 1.08656923, 1.09454249,
1.10558188, 1.1214086, 1.12292577, 1.13021031, 1.13342735,
1.14686068, 1.14502975, 1.14474747, 1.14084037, 1.16142926],
[1.06336033, 1.07365823, 1.08691496, 1.09764846, 1.11669863,
1.11856702, 1.09764283, 1.08815849, 1.08044313, 1.09278827,
1.07003204, 1.08398066, 1.09831768, 1.09298232, 1.09176125],
[0.79772065, 0.78829196, 0.78581151, 0.77615922, 0.77035744,
0.77751194, 0.79902974, 0.81437881, 0.80788828, 0.79603865,
0.78966436, 0.79949807, 0.80172182, 0.82168155, 0.85587911],
[1.0052447, 1.00007696, 1.00475899, 1.00613942, 1.00639561,
1.00162979, 0.99860739, 1.00814981, 1.00574316, 0.99030032,
0.97682565, 0.97292596, 0.96519561, 0.96173403, 0.95890284],
[0.95808419, 0.9382568, 0.9654441, 0.95561201, 0.96987289,
0.96608031, 0.99727185, 1.00781194, 1.03484236, 1.05333619,
1.0983263, 1.1704974, 1.17025154, 1.18730553, 1.14242645]])
self.assertTrue(np.allclose(result, expected))
self.assertTrue(type(result) == type(expected))
self.assertTrue(result.shape == expected.shape)
def test_rebin_data(self):
"""Test rebin_data"""
# sample in double the time (even case since 10 % 2 = 0):
# (0+1)/2, (2+3)/2, (4+5)/2, (6+7)/2, (8+9)/2
# = 0.5, 2.5, 4.5, 6.5, 8.5
ans_even = np.array([(i + 0.5) * np.ones(10, dtype=float)
for i in range(0, 10, 2)]).T
self.assertTrue(
np.array_equal(std.rebin_data(self.time_data, 2), ans_even))
# sample in triple the time (uneven since 10 % 3 = 1):
# (0+1+2)/3, (3+4+5)/3, (6+7+8)/3, (9)/1
# = 1, 4, 7, 9
ans_odd = np.array([i * np.ones(10, dtype=float)
for i in (1, 4, 7, 9)]).T
self.assertTrue(
np.array_equal(std.rebin_data(self.time_data, 3), ans_odd))
def test_get_prob_dist(self):
"""Test get_prob_dist"""
lag_indices = np.array([1, 2, 3, 4])
unit_indices = np.array([1, 3, 2, 4])
answer = np.array([
[0.0754717, 0.88207547, 0.04245283, 0., 0.],
[0., 0., 0.09411765, 0.87058824, 0.03529412],
[0.0049505, 0.09405941, 0.77722772, 0.11881188, 0.0049505],
[0., 0., 0., 0.02352941, 0.97647059]
])
result = std.get_prob_dist(self.transition_matrix,
lag_indices, unit_indices)
self.assertTrue(np.array_equal(result, answer))
def test_get_prob_stats(self):
"""Test get_prob_stats"""
probs = np.array([
[0.0754717, 0.88207547, 0.04245283, 0., 0.],
[0., 0., 0.09411765, 0.87058824, 0.03529412],
[0.0049505, 0.09405941, 0.77722772, 0.11881188, 0.0049505],
[0., 0., 0., 0.02352941, 0.97647059]
])
unit_indices = np.array([1, 3, 2, 4])
answer_up = np.array([0.04245283, 0.03529412, 0.12376238, 0.])
answer_down = np.array([0.0754717, 0.09411765, 0.0990099, 0.02352941])
answer_trend = np.array([-0.03301887 / 0.88207547,
-0.05882353 / 0.87058824,
0.02475248 / 0.77722772,
-0.02352941 / 0.97647059])
answer_volatility = np.array([0.34221495, 0.33705421,
0.29226542, 0.38834223])
result = std.get_prob_stats(probs, unit_indices)
result_up = result[0]
result_down = result[1]
result_trend = result[2]
result_volatility = result[3]
self.assertTrue(np.allclose(result_up, answer_up))
self.assertTrue(np.allclose(result_down, answer_down))
self.assertTrue(np.allclose(result_trend, answer_trend))
self.assertTrue(np.allclose(result_volatility, answer_volatility))
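The rebinning arithmetic spelled out in the comments of `test_rebin_data` can be sketched independently: average consecutive time columns in groups of `bin_size`, letting a short final group average on its own (a hypothetical stand-in for `std.rebin_data`, not the crankshaft implementation):

```python
import numpy as np

def rebin(time_array, bin_size):
    # time_array: (units, time_steps); average each group of bin_size columns.
    # A trailing group shorter than bin_size is averaged over what remains.
    steps = time_array.shape[1]
    binned = [time_array[:, i:i + bin_size].mean(axis=1)
              for i in range(0, steps, bin_size)]
    return np.column_stack(binned)
```

On the fixture above (columns 0..9), bin size 2 yields 0.5, 2.5, 4.5, 6.5, 8.5 and bin size 3 yields 1, 4, 7, 9, matching the expected arrays in the test.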


@@ -25,7 +25,7 @@ $(DATA): $(SOURCES_DATA)
$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > $@
TEST_DIR = test
-REGRESS = $(notdir $(basename $(wildcard $(TEST_DIR)/sql/*test.sql)))
+REGRESS = $(sort $(notdir $(basename $(wildcard $(TEST_DIR)/sql/*test.sql))))
REGRESS_OPTS = --inputdir='$(TEST_DIR)' --outputdir='$(TEST_DIR)'
PG_CONFIG = pg_config


@@ -1,5 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
-default_version = '0.5.1'
+default_version = '0.5.2'
requires = 'plpythonu, postgis'
superuser = true
schema = cdb_crankshaft


@@ -1,11 +1,36 @@
CREATE OR REPLACE FUNCTION
CDB_GWR(subquery text, dep_var text, ind_vars text[],
-    bw numeric default null, fixed boolean default False, kernel text default 'bisquare')
-RETURNS table(coeffs JSON, stand_errs JSON, t_vals JSON, predicted numeric, residuals numeric, r_squared numeric, rowid bigint, bandwidth numeric)
+    bw numeric default null, fixed boolean default False,
+    kernel text default 'bisquare', geom_col text default 'the_geom',
+    id_col text default 'cartodb_id')
+RETURNS table(coeffs JSON, stand_errs JSON, t_vals JSON,
+              filtered_t_vals JSON, predicted numeric,
+              residuals numeric, r_squared numeric, bandwidth numeric,
+              rowid bigint)
AS $$
-from crankshaft.regression import gwr_cs
+from crankshaft.regression import GWR
-return gwr_cs.gwr(subquery, dep_var, ind_vars, bw, fixed, kernel)
+gwr = GWR()
+return gwr.gwr(subquery, dep_var, ind_vars, bw, fixed, kernel, geom_col, id_col)
$$ LANGUAGE plpythonu;
+CREATE OR REPLACE FUNCTION
+CDB_GWR_Predict(subquery text, dep_var text, ind_vars text[],
+    bw numeric default null, fixed boolean default False,
+    kernel text default 'bisquare',
+    geom_col text default 'the_geom',
+    id_col text default 'cartodb_id')
+RETURNS table(coeffs JSON, stand_errs JSON, t_vals JSON,
+              r_squared numeric, predicted numeric, rowid bigint)
+AS $$
+from crankshaft.regression import GWR
+gwr = GWR()
+return gwr.gwr_predict(subquery, dep_var, ind_vars, bw, fixed, kernel, geom_col, id_col)
+$$ LANGUAGE plpythonu;


@@ -1,6 +1,6 @@
-- Install dependencies
CREATE EXTENSION plpythonu;
-CREATE EXTENSION postgis VERSION '2.2.2';
+CREATE EXTENSION postgis;
-- Create role publicuser if it does not exist
DO
$$


@@ -0,0 +1,12 @@
-- test of Geographically Weighted Regression (GWR)
SET client_min_messages TO WARNING;
\set ECHO none
rowid|coeff_pctrural|std_errs_pctrural|t_vals_pctrural|predicted|residuals|r_squared|bandwidth
13001|-0.0852|0.0220|-3.8678|8.8071|-0.6071|0.5218|90.0
13027|-0.0719|0.0221|-3.2506|9.9673|-0.8673|0.5443|90.0
13027|-0.0719|0.0221|-3.2506|9.9673|-0.8673|0.5443|90.0
13039|-0.0959|0.0241|-3.9755|13.4802|0.0198|0.6269|90.0
13231|-0.1383|0.0181|-7.6634|8.5520|0.7480|0.6337|90.0
13293|-0.1207|0.0184|-6.5553|12.9930|-3.9930|0.6446|90.0
13321|-0.0720|0.0204|-3.5337|8.2738|-1.9738|0.5573|90.0
(7 rows)

src/pg/test/fixtures/gwr_georgia.sql

@@ -0,0 +1,199 @@
SET client_min_messages TO WARNING;
\set ECHO none
--
-- PostgreSQL database dump
--
-- Data from:
-- https://github.com/TaylorOshan/pysal/blob/1d6af33bda46b1d623f70912c56155064463383f/pysal/examples/georgia/GData_utm.csv
CREATE TABLE g_utm_testing (
cartodb_id bigint,
the_geom geometry(Geometry, 2239),
pctblack numeric,
pctpov numeric,
pctbach numeric,
pctrural numeric,
x numeric,
y numeric,
areakey int
);
COPY g_utm_testing (cartodb_id, the_geom, pctblack, pctpov, pctbach, pctrural, x, y, areakey) FROM stdin;
122 0101000020BF080000CDCCCCCC2AEB2A410000000043F74A41 34.4500000000000028 27.3000000000000007 8.59999999999999964 72.5999999999999943 882069.400000000023 3534470 13271
9 0101000020BF0800009A999999786823410000000080684D41 0.349999999999999978 14.5999999999999996 8 96.5 635964.300000000047 3854592 13083
30 0101000020BF0800009A9999990ACF294100000080E5174D41 9.89000000000000057 16.5 9.5 87.2000000000000028 845701.300000000047 3813323 13119
121 0101000020BF08000000000000B67D2F4100000080AA6F4B41 14.0299999999999994 12.6999999999999993 7.59999999999999964 89.0999999999999943 1031899 3596117 13103
139 0101000020BF080000CDCCCCCCFF4B2B410000008038A54A41 25.4600000000000009 22.5 11.0999999999999996 64.5999999999999943 894463.900000000023 3492465 13069
78 0101000020BF08000066666666FB632741000000802BF44B41 34.0300000000000011 16.3000000000000007 10 63.6000000000000014 766461.699999999953 3663959 13171
103 0101000020BF0800009A999999BEB62741000000809A594B41 58.7199999999999989 29.1999999999999993 10.0999999999999996 65.5999999999999943 777055.300000000047 3584821 13193
104 0101000020BF0800009A999999E52C27410000008039874B41 43.2100000000000009 29.5 7.09999999999999964 100 759410.800000000047 3608179 13269
160 0101000020BF0800009A999999A3DC2541000000004D544A41 27.4800000000000004 22.1000000000000014 8.19999999999999929 100 716369.800000000047 3451034 13201
99 0101000020BF08000033333333DCC3294100000080D56E4B41 22.3599999999999994 18.3000000000000007 10.3000000000000007 57.8999999999999986 844270.099999999977 3595691 13023
16 0101000020BF08000033333333C4532741000000004B164D41 0.28999999999999998 12.8000000000000007 8.59999999999999964 100 764386.099999999977 3812502 13085
34 0101000020BF080000CDCCCCCCFE622441000000000FB94C41 14.3000000000000007 16.3000000000000007 6.79999999999999982 66.5 668031.400000000023 3764766 13233
37 0101000020BF08000066666666423825410000008006AC4C41 3.93999999999999995 8.80000000000000071 7.59999999999999964 93.7000000000000028 695329.199999999953 3758093 13223
47 0101000020BF08000066666666B195254100000080D0774C41 7.62999999999999989 6.59999999999999964 12 26.6999999999999993 707288.699999999953 3731361 13097
108 0101000020BF080000333333339AFA264100000000173D4B41 34.0900000000000034 19.8999999999999986 8 100 752973.099999999977 3570222 13249
75 0101000020BF0800009A9999992EEC29410000008048F74B41 42.3900000000000006 17.5 13.3000000000000007 42.7000000000000028 849431.300000000047 3665553 13009
83 0101000020BF08000066666666C39D2D4100000080E3C54B41 41.509999999999998 27.8000000000000007 7.70000000000000018 53.7999999999999972 970465.699999999953 3640263 13165
91 0101000020BF080000333333339939254100000000ABA74B41 25.4899999999999984 13.6999999999999993 13.5999999999999996 95.7999999999999972 695500.599999999977 3624790 13145
131 0101000020BF080000CDCCCCCC2AEB2A410000000043F74A41 34.4500000000000028 27.3000000000000007 8.59999999999999964 72.5999999999999943 882069.400000000023 3534470 13271
136 0101000020BF0800009A99999959562A4100000000D8DB4A41 31.3299999999999983 22 7.59999999999999964 47 863020.800000000047 3520432 13017
140 0101000020BF080000CDCCCCCCFF4B2B410000008038A54A41 25.4600000000000009 22.5 11.0999999999999996 64.5999999999999943 894463.900000000023 3492465 13069
45 0101000020BF080000333333338034294100000000B35D4C41 34.740000000000002 15 11 73 825920.099999999977 3717990 13211
58 0101000020BF080000000000003EBC27410000008062744C41 8.02999999999999936 6.20000000000000018 18.1000000000000014 59.2000000000000028 777759 3729605 13247
66 0101000020BF0800009A999999401F2D410000000063364C41 41.9600000000000009 18.1999999999999993 17.3000000000000007 9.90000000000000036 954272.300000000047 3697862 13245
67 0101000020BF080000CDCCCCCCA09B244100000080601E4C41 13.3800000000000008 19.1000000000000014 5.70000000000000018 100 675280.400000000023 3685569 13149
74 0101000020BF080000CDCCCCCC6387254100000000842F4C41 22.5899999999999999 11.4000000000000004 13.3000000000000007 76.0999999999999943 705457.900000000023 3694344 13077
112 0101000020BF0800009A999999C7662D410000008033294B41 29.1900000000000013 21.8999999999999986 6.5 79.2999999999999972 963427.800000000047 3560039 13267
117 0101000020BF0800009A9999993E6A2841000000004E314B41 48.9799999999999969 32.8999999999999986 9.5 72.5999999999999943 800031.300000000047 3564188 13093
127 0101000020BF080000CDCCCCCC209628410000008067FC4A41 40.6599999999999966 29 10 48.3999999999999986 805648.400000000023 3537103 13081
166 0101000020BF08000066666666E4462841000000800E1B4A41 37.9299999999999997 22.6000000000000014 13.4000000000000004 55.2000000000000028 795506.199999999953 3421725 13275
48 0101000020BF080000000000003EBC27410000008062744C41 8.02999999999999936 6.20000000000000018 18.1000000000000014 59.2000000000000028 777759 3729605 13247
3 0101000020BF08000033333333A0B6274100000080AD704D41 0.100000000000000006 18.3000000000000007 10.0999999999999996 100 777040.099999999977 3858779 13291
1 0101000020BF080000000000008B2A294100000080727C4D41 0.349999999999999978 13.5999999999999996 11.5999999999999996 100 824645.5 3864805 13241
2 0101000020BF080000666666663B5A284100000000C08B4D41 0 14 11.4000000000000004 100 797981.699999999953 3872640 13281
4 0101000020BF0800009A9999996F8F264100000000F67F4D41 0.0299999999999999989 17.1999999999999993 7.79999999999999982 100 739255.800000000047 3866604 13111
18 0101000020BF08000033333333C5ED234100000000C0184D41 8.60999999999999943 14.5999999999999996 5.90000000000000036 77.4000000000000057 653026.599999999977 3813760 13055
6 0101000020BF080000CDCCCCCC56F6244100000000D5694D41 4.05999999999999961 11.0999999999999996 12 70 686891.400000000023 3855274 13313
5 0101000020BF0800009A999999F499254100000000B6674D41 0.260000000000000009 11.3000000000000007 5.5 89 707834.300000000047 3854188 13213
7 0101000020BF080000CDCCCCCCCF7224410000000097774D41 0.910000000000000031 12 8.09999999999999964 43.6000000000000014 670055.900000000023 3862318 13047
8 0101000020BF080000CDCCCCCC6C1B2441000000803B504D41 3.72999999999999998 12.8000000000000007 8.40000000000000036 44.7999999999999972 658870.400000000023 3842167 13295
10 0101000020BF0800009A9999993C5C26410000008064554D41 0.260000000000000009 16.6000000000000014 8.59999999999999964 100 732702.300000000047 3844809 13123
11 0101000020BF08000033333333CAFD284100000080DD4B4D41 5.41999999999999993 11.5999999999999996 12 88.5 818917.099999999977 3839931 13137
12 0101000020BF08000033333333D3512841000000001F4E4D41 2.58999999999999986 12.5 13.5999999999999996 100 796905.599999999977 3841086 13311
13 0101000020BF08000000000000F093274100000080363D4D41 1.40999999999999992 15.3000000000000007 11.0999999999999996 78.5999999999999943 772600 3832429 13187
14 0101000020BF080000CDCCCCCCCBB2294100000080C1324D41 11.8100000000000005 17 13.0999999999999996 64.5 842085.900000000023 3827075 13257
15 0101000020BF080000333333333A382541000000801B294D41 3.7799999999999998 11.0999999999999996 9.19999999999999929 79.7000000000000028 695325.099999999977 3822135 13129
17 0101000020BF080000CDCCCCCCE235244100000000B0E94C41 13.5600000000000005 13.5999999999999996 13.6999999999999993 36.1000000000000014 662257.400000000023 3789664 13115
19 0101000020BF0800009A9999990ACF294100000080E5174D41 9.89000000000000057 16.5 9.5 87.2000000000000028 845701.300000000047 3813323 13119
20 0101000020BF080000666666662D65264100000000EE164D41 1.47999999999999998 12.8000000000000007 9 100 733846.699999999953 3812828 13227
21 0101000020BF080000CDCCCCCCBB922A4100000080FF114D41 20.4100000000000001 14.1999999999999993 9.09999999999999964 73.7999999999999972 870749.900000000023 3810303 13147
25 0101000020BF08000000000000075525410000000000F14C41 9.21000000000000085 10.6999999999999993 9 75.2000000000000028 699011.5 3793408 13015
22 0101000020BF08000000000000673E28410000000068044D41 8.48000000000000043 10.5999999999999996 15.4000000000000004 81.0999999999999943 794419.5 3803344 13139
23 0101000020BF0800009A999999EA00294100000000C00C4D41 3.49000000000000021 15.0999999999999996 6.40000000000000036 100 819317.300000000047 3807616 13011
24 0101000020BF080000CDCCCCCC41682641000000005FF24C41 1.77000000000000002 6.09999999999999964 18.3999999999999986 57.7999999999999972 734240.900000000023 3794110 13057
44 0101000020BF0800009A9999999B52244100000000E7894C41 6.46999999999999975 14.4000000000000004 7.5 67.7999999999999972 665933.800000000047 3740622 13143
26 0101000020BF0800009A999999AA5B27410000008066E84C41 0 6.79999999999999982 15.5999999999999996 93.7000000000000028 765397.300000000047 3789005 13117
27 0101000020BF080000666666666AD72A410000008068E14C41 29.9899999999999984 19.6999999999999993 8 70 879541.199999999953 3785425 13105
28 0101000020BF0800003333333312E528410000008086DE4C41 9.58000000000000007 14.0999999999999996 9 78 815753.099999999977 3783949 13157
29 0101000020BF0800009A999999FDE52941000000805EE14C41 8.32000000000000028 15.6999999999999993 9.69999999999999929 100 848638.800000000047 3785405 13195
31 0101000020BF080000CDCCCCCC4064264100000000807B4C41 49.9200000000000017 18.3999999999999986 31.6000000000000014 4.20000000000000018 733728.400000000023 3733248 13121
32 0101000020BF08000033333333359427410000000029B84C41 5.11000000000000032 4 29.6000000000000014 13.5999999999999996 772634.599999999977 3764306 13135
33 0101000020BF0800003333333346872841000000808BC24C41 11.4399999999999995 14.6999999999999993 9.19999999999999929 64.5999999999999943 803747.099999999977 3769623 13013
35 0101000020BF0800009A99999977582A410000008074A94C41 24.7399999999999984 16.1999999999999993 12.8000000000000007 100 863291.800000000047 3756777 13221
36 0101000020BF0800009A9999994D1D26410000008041AA4C41 9.83999999999999986 5.59999999999999964 33 5.79999999999999982 724646.800000000047 3757187 13067
38 0101000020BF08000033333333F9672941000000806CB54C41 26.2300000000000004 27 37.5 17.6000000000000014 832508.599999999977 3762905 13059
46 0101000020BF08000033333333F5B624410000000071544C41 15.4600000000000009 14.4000000000000004 12 68.5 678778.599999999977 3713250 13045
39 0101000020BF08000000000000B9322B4100000080C49B4C41 45.9399999999999977 22.6000000000000014 10.4000000000000004 59.6000000000000014 891228.5 3749769 13317
40 0101000020BF08000000000000C90E2C410000000039A14C41 38.1899999999999977 17.8000000000000007 8.19999999999999929 100 919396.5 3752562 13181
41 0101000020BF080000CDCCCCCC1F5A294100000080FB9D4C41 7.37000000000000011 7.90000000000000036 28.3999999999999986 95.2000000000000028 830735.900000000023 3750903 13219
42 0101000020BF080000CDCCCCCC7F2B2741000000806A7F4C41 42.2299999999999969 9.90000000000000036 32.7000000000000028 2.5 759231.900000000023 3735253 13089
43 0101000020BF0800009A999999006D284100000080F18D4C41 18.370000000000001 13.1999999999999993 9.40000000000000036 61.2000000000000028 800384.300000000047 3742691 13297
49 0101000020BF0800009A999999321C2A41000000002D664C41 49.8900000000000006 25.1000000000000014 8.80000000000000071 75.7000000000000028 855577.300000000047 3722330 13133
50 0101000020BF080000CDCCCCCC31FD2A4100000080BA5C4C41 61.3599999999999994 31.8999999999999986 5.59999999999999964 100 884376.900000000023 3717493 13265
86 0101000020BF080000000000004E6B294100000000DEA54B41 45.9299999999999997 26 4.79999999999999982 100 832935 3623868 13289
51 0101000020BF080000CDCCCCCC7FC32C4100000000BA654C41 10.9299999999999997 6.59999999999999964 23.8999999999999986 30.6000000000000014 942527.900000000023 3722100 13073
87 0101000020BF08000066666666B4C72C4100000000AD974B41 32.5799999999999983 25.6999999999999993 9.09999999999999964 64.2000000000000028 943066.199999999953 3616602 13107
52 0101000020BF080000CDCCCCCC5F352841000000001B614C41 22.3500000000000014 14.4000000000000004 9.5 76 793263.900000000023 3719734 13217
53 0101000020BF080000CDCCCCCCC5012C4100000000885A4C41 36.3800000000000026 21.6000000000000014 10.4000000000000004 65.9000000000000057 917730.900000000023 3716368 13189
55 0101000020BF080000CDCCCCCCC04C274100000000023A4C41 10.2400000000000002 6.09999999999999964 10.6999999999999993 76 763488.400000000023 3699716 13151
54 0101000020BF080000333333338E8A2B4100000000533A4C41 60.2299999999999969 32.6000000000000014 4.20000000000000018 100 902471.099999999977 3699878 13301
56 0101000020BF0800009A99999985C026410000000077514C41 23.8200000000000003 8.59999999999999964 14.6999999999999993 4.40000000000000036 745538.800000000047 3711726 13063
57 0101000020BF0800009A999999401F2D410000000063364C41 41.9600000000000009 18.1999999999999993 17.3000000000000007 9.90000000000000036 954272.300000000047 3697862 13245
59 0101000020BF080000666666669952264100000000C23B4C41 5.12999999999999989 2.60000000000000009 25.8000000000000007 53.8999999999999986 731468.699999999953 3700612 13113
60 0101000020BF08000033333333A2A2284100000000FA304C41 34.7999999999999972 17.3999999999999986 10.8000000000000007 100 807249.099999999977 3695092 13159
61 0101000020BF080000000000003EBC27410000008062744C41 8.02999999999999936 6.20000000000000018 18.1000000000000014 59.2000000000000028 777759 3729605 13247
146 0101000020BF080000CDCCCCCCA951274100000080EFA84A41 50.1499999999999986 24.3999999999999986 17 10 764116.900000000023 3494367 13095
142 0101000020BF0800009A9999994B1B2A41000000803AC04A41 30.5 27.1999999999999993 8.30000000000000071 63.3999999999999986 855461.800000000047 3506293 13155
62 0101000020BF08000033333333D2A32941000000004B314C41 32.7899999999999991 16.3999999999999986 11.6999999999999993 66.5 840169.099999999977 3695254 13237
63 0101000020BF080000CDCCCCCC6387254100000000842F4C41 22.5899999999999999 11.4000000000000004 13.3000000000000007 76.0999999999999943 705457.900000000023 3694344 13077
64 0101000020BF0800009A9999995DA82A4100000080C2264C41 79.6400000000000006 30.1000000000000014 6.79999999999999982 100 873518.800000000047 3689861 13141
65 0101000020BF08000066666666E0E02741000000004C1C4C41 35.4799999999999969 15.5999999999999996 7.20000000000000018 73.4000000000000057 782448.199999999953 3684504 13035
68 0101000020BF0800009A999999C5B82B4100000000BC1E4C41 12.6899999999999995 16.8000000000000007 5.29999999999999982 100 908386.800000000047 3685752 13125
69 0101000020BF0800003333333398332C410000000038FC4B41 55.9200000000000017 31.3000000000000007 6.20000000000000018 100 924108.099999999977 3668080 13163
114 0101000020BF0800009A999999BFF52D4100000080393F4B41 33.8800000000000026 25.3999999999999986 8.59999999999999964 100 981727.800000000047 3571315 13109
70 0101000020BF0800009A99999993A62D4100000000B1024C41 52.1899999999999977 30.3000000000000007 9.59999999999999964 72.2999999999999972 971593.800000000047 3671394 13033
71 0101000020BF08000066666666C809274100000080521D4C41 29.0799999999999983 15.5999999999999996 11.0999999999999996 53.6000000000000014 754916.199999999953 3685029 13255
72 0101000020BF0800009A999999E04D2B410000008023D64B41 51.8599999999999994 21.6000000000000014 9.80000000000000071 67.0999999999999943 894704.300000000047 3648583 13303
73 0101000020BF0800009A999999BFD4254100000080F9EC4B41 44.6199999999999974 22.3999999999999986 6.70000000000000018 82.2999999999999972 715359.800000000047 3660275 13199
76 0101000020BF0800009A999999F1D4244100000000EFEC4B41 30.0300000000000011 16.3000000000000007 13.5999999999999996 44 682616.800000000047 3660254 13285
77 0101000020BF08000066666666851E284100000000A0ED4B41 31.7800000000000011 13.8000000000000007 12.9000000000000004 75.0999999999999943 790338.699999999953 3660608 13207
79 0101000020BF08000033333333EE10294100000080B7EC4B41 25.6000000000000014 10.8000000000000007 12 81.9000000000000057 821367.099999999977 3660143 13169
80 0101000020BF0800009A999999E9B52641000000804CF74B41 20.0399999999999991 13.4000000000000004 9.30000000000000071 100 744180.800000000047 3665561 13231
92 0101000020BF080000CDCCCCCCBC002B4100000080DD754B41 33.3200000000000003 20.5 12 52.8999999999999986 884830.400000000023 3599291 13175
89 0101000020BF08000066666666B9BA2B410000000039A74B41 33.8900000000000006 22.1999999999999993 4.90000000000000036 100 908636.699999999953 3624562 13167
81 0101000020BF08000000000000F8A32E41000000001FC94B41 44.6899999999999977 22.8999999999999986 8.59999999999999964 79.2999999999999972 1004028 3641918 13251
82 0101000020BF080000CDCCCCCC59352A410000008041C14B41 41.990000000000002 15.3000000000000007 8.80000000000000071 100 858796.900000000023 3637891 13319
84 0101000020BF080000666666664AF4264100000000CCC34B41 27.7800000000000011 14.6999999999999993 9 65.2999999999999972 752165.199999999953 3639192 13293
85 0101000020BF080000CDCCCCCC11B62841000000007ABE4B41 41.6799999999999997 19.1999999999999993 17 16.1000000000000014 809736.900000000023 3636468 13021
88 0101000020BF080000CDCCCCCC13682641000000007DA44B41 62.3400000000000034 24.8999999999999986 7.09999999999999964 97.9000000000000057 734217.900000000023 3623162 13263
90 0101000020BF0800000000000071E8274100000080D7A44B41 30.6600000000000001 14 5.70000000000000018 100 783416.5 3623343 13079
98 0101000020BF080000CDCCCCCC516D2D4100000000FD744B41 30.9400000000000013 24.1000000000000014 9.90000000000000036 52.1000000000000014 964264.900000000023 3598842 13043
95 0101000020BF080000CDCCCCCCDA5A28410000008001894B41 47.5300000000000011 24 15.1999999999999993 61.2999999999999972 798061.400000000023 3609091 13225
93 0101000020BF0800009A999999E52C27410000008039874B41 43.2100000000000009 29.5 7.09999999999999964 100 759410.800000000047 3608179 13269
120 0101000020BF0800009A99999931762541000000802C1B4B41 63.4600000000000009 31.3999999999999986 8 100 703256.800000000047 3552857 13259
94 0101000020BF080000CDCCCCCCAE5C2E410000008036784B41 25.9499999999999993 27.5 19.8999999999999986 63.2000000000000028 994903.400000000023 3600493 13031
96 0101000020BF080000CDCCCCCC4DD8284100000080CC644B41 21.8000000000000007 10.5999999999999996 16 20.8999999999999986 814118.900000000023 3590553 13153
134 0101000020BF080000000000002F8327410000008050DB4A41 19.2199999999999989 12.5999999999999996 13.6999999999999993 78.2000000000000028 770455.5 3520161 13177
97 0101000020BF08000000000000B67D2F4100000080AA6F4B41 14.0299999999999994 12.6999999999999993 7.59999999999999964 89.0999999999999943 1031899 3596117 13103
100 0101000020BF080000666666664363254100000000CA734B41 37.9500000000000028 18.6000000000000014 16.6000000000000014 3.20000000000000018 700833.699999999953 3598228 13215
101 0101000020BF080000333333334B0C2C4100000000D16D4B41 33.1000000000000014 27.1000000000000014 6.29999999999999982 53.2999999999999972 919077.599999999977 3595170 13283
102 0101000020BF0800009A999999995D264100000080C4584B41 41.3200000000000003 28.1999999999999993 4.59999999999999964 100 732876.800000000047 3584393 13197
105 0101000020BF080000666666667E83254100000000844B4B41 30.9400000000000013 10.4000000000000004 20.1999999999999993 13.6999999999999993 704959.199999999953 3577608 13053
119 0101000020BF0800009A9999997053264100000000460B4B41 50.2000000000000028 22.5 5.5 100 731576.300000000047 3544716 13307
106 0101000020BF0800009A999999BEB62741000000809A594B41 58.7199999999999989 29.1999999999999993 10.0999999999999996 65.5999999999999943 777055.300000000047 3584821 13193
107 0101000020BF080000CDCCCCCC46422A4100000080863C4B41 27.6400000000000006 21.8000000000000007 8 70.7000000000000028 860451.400000000023 3569933 13091
109 0101000020BF080000333333333772294100000080AB374B41 32.4600000000000009 24.3000000000000007 10.6999999999999993 56.5 833819.599999999977 3567447 13235
110 0101000020BF080000CDCCCCCC6E1A2C4100000080AC394B41 28.2699999999999996 24.5 10.0999999999999996 98.5999999999999943 920887.400000000023 3568473 13209
111 0101000020BF080000CDCCCCCC4CBD2C4100000000F1374B41 23.379999999999999 24 11.4000000000000004 35.7000000000000028 941734.400000000023 3567586 13279
113 0101000020BF0800009A999999B06D2B4100000000BC2F4B41 30.0599999999999987 30.3000000000000007 8.59999999999999964 100 898776.300000000047 3563384 13309
133 0101000020BF08000033333333EABD25410000008045DA4A41 58.1700000000000017 35.8999999999999986 6 53.5 712437.099999999977 3519627 13243
115 0101000020BF080000000000007A2B304100000080C5224B41 38.0200000000000031 17.1999999999999993 18.6000000000000014 5.09999999999999964 1059706 3556747 13051
145 0101000020BF080000CDCCCCCC04692C410000008061B94A41 15.4199999999999999 24.1000000000000014 6.59999999999999964 61.7000000000000028 930946.400000000023 3502787 13005
116 0101000020BF0800000000000052392F4100000000531F4B41 14.8499999999999996 13.1999999999999993 11.8000000000000007 80.5999999999999943 1023145 3554982 13029
118 0101000020BF08000033333333824C27410000000004194B41 46.5300000000000011 24.8000000000000007 15.9000000000000004 45.3999999999999986 763457.099999999977 3551752 13261
123 0101000020BF08000000000000ACF72E4100000080A4FC4A41 39.1499999999999986 17.1999999999999993 13.4000000000000004 32.8999999999999986 1014742 3537225 13179
124 0101000020BF080000CDCCCCCC2AEB2A410000000043F74A41 34.4500000000000028 27.3000000000000007 8.59999999999999964 72.5999999999999943 882069.400000000023 3534470 13271
125 0101000020BF080000000000002F9729410000008039FF4A41 31.7600000000000016 28.6000000000000014 7.59999999999999964 100 838551.5 3538547 13315
128 0101000020BF080000CDCCCCCC3BF22B41000000803AF04A41 15.3599999999999994 18.8000000000000007 8.30000000000000071 65.0999999999999943 915741.900000000023 3530869 13161
126 0101000020BF0800000000000000A82E4100000000C5D64A41 21.75 23.6999999999999993 5.20000000000000018 100 1004544 3517834 13183
132 0101000020BF0800009A999999EFAC26410000000026E04A41 59.8999999999999986 29.1000000000000014 9.19999999999999929 50.2999999999999972 743031.800000000047 3522636 13273
129 0101000020BF08000033333333A9BA2C410000000072DE4A41 20.7600000000000016 19.8999999999999986 8.19999999999999929 75.5999999999999943 941396.599999999977 3521764 13001
130 0101000020BF080000CDCCCCCC36F62441000000000EE34A41 49.9299999999999997 33 7.29999999999999982 100 686875.400000000023 3524124 13239
135 0101000020BF0800009A999999A11D2E4100000080D9A84A41 19.4499999999999993 21.1999999999999993 9.59999999999999964 59.8999999999999986 986832.800000000047 3494323 13305
137 0101000020BF080000333333338F0129410000008017D14A41 40.6599999999999966 31.3000000000000007 7.20000000000000018 44.5 819399.599999999977 3514927 13287
138 0101000020BF08000033333333F471284100000000309B4A41 30.7100000000000009 26.1999999999999993 6.29999999999999982 71.0999999999999943 801018.099999999977 3487328 13321
141 0101000020BF080000CDCCCCCCFF4B2B410000008038A54A41 25.4600000000000009 22.5 11.0999999999999996 64.5999999999999943 894463.900000000023 3492465 13069
143 0101000020BF08000000000000C8722F4100000080FBB44A41 43.3400000000000034 22.3000000000000007 8.69999999999999929 100 1030500 3500535 13191
144 0101000020BF080000CDCCCCCCE33B25410000008099AA4A41 60.759999999999998 35.7000000000000028 11.1999999999999993 100 695793.900000000023 3495219 13061
147 0101000020BF080000666666660A1E26410000008096A54A41 58.8900000000000006 31.8000000000000007 10.0999999999999996 100 724741.199999999953 3492653 13037
148 0101000020BF0800009A9999998461294100000080F19B4A41 26.6799999999999997 22.8999999999999986 14 51.1000000000000014 831682.300000000047 3487715 13277
152 0101000020BF080000333333330E78254100000000C8734A41 44.0900000000000034 31.3999999999999986 9.40000000000000036 52.7999999999999972 703495.099999999977 3467152 13099
149 0101000020BF0800009A9999997B192D4100000000DE904A41 11.6899999999999995 21.3000000000000007 6.29999999999999982 74.4000000000000057 953533.800000000047 3482044 13229
150 0101000020BF08000000000000663B2F41000000806B7B4A41 25.5700000000000003 14.3000000000000007 19.8999999999999986 20.3000000000000007 1023411 3471063 13127
151 0101000020BF08000033333333DBA22C4100000080C94B4A41 25.879999999999999 21.1000000000000014 10.4000000000000004 54.2000000000000028 938349.599999999977 3446675 13299
153 0101000020BF0800009A999999173E2A410000008044724A41 11.6199999999999992 19.3000000000000007 7.5 66.2000000000000028 859915.800000000047 3466377 13019
154 0101000020BF0800000000000082542B4100000000167D4A41 26.8599999999999994 26 6.40000000000000036 100 895553 3471916 13003
155 0101000020BF080000333333336DBF264100000080A6824A41 51.6700000000000017 24.8000000000000007 9.40000000000000036 100 745398.599999999977 3474765 13007
156 0101000020BF080000333333333D62274100000000F5594A41 47.9099999999999966 28.6999999999999993 7.79999999999999982 56.2000000000000028 766238.599999999977 3453930 13205
157 0101000020BF080000CDCCCCCCB1E22D4100000080546D4A41 4.58000000000000007 18.1999999999999993 5.79999999999999982 100 979288.900000000023 3463849 13025
162 0101000020BF080000333333331CB62B4100000000FA274A41 27.2899999999999991 26.3999999999999986 6.70000000000000018 58.6000000000000014 908046.099999999977 3428340 13065
158 0101000020BF0800003333333310A129410000008057504A41 29.9400000000000013 22.3999999999999986 6.5 62 839816.099999999977 3449007 13075
159 0101000020BF0800009A999999E7AD284100000000FD5D4A41 24.1600000000000001 22.8000000000000007 10 59.3999999999999986 808691.800000000047 3455994 13071
163 0101000020BF0800009A99999998AA2A4100000080B63E4A41 26.5799999999999983 25.8999999999999986 5.40000000000000036 100 873804.300000000047 3439981 13173
161 0101000020BF08000000000000C0C62E4100000080B63A4A41 20.1900000000000013 11.5 13.5 47.1000000000000014 1008480 3437933 13039
172 0101000020BF080000000000005C43294100000000E31A4A41 41.4699999999999989 25.8999999999999986 9.09999999999999964 65.5999999999999943 827822 3421638 13027
164 0101000020BF0800009A99999968602D4100000080A0304A41 27.0500000000000007 18.3000000000000007 6.40000000000000036 100 962612.300000000047 3432769 13049
165 0101000020BF080000000000005C43294100000000E31A4A41 41.4699999999999989 25.8999999999999986 9.09999999999999964 65.5999999999999943 827822 3421638 13027
167 0101000020BF0800003333333304592741000000803C1B4A41 31.5 22.3000000000000007 7.70000000000000018 55.3999999999999986 765058.099999999977 3421817 13131
168 0101000020BF080000CDCCCCCCA85B264100000000341B4A41 39.4699999999999989 23.3000000000000007 11.6999999999999993 58 732628.400000000023 3421800 13087
169 0101000020BF08000033333333DF7F254100000000991B4A41 32.740000000000002 29.1000000000000014 7.79999999999999982 69.4000000000000057 704495.599999999977 3422002 13253
170 0101000020BF080000333333331A642A410000008058164A41 31.879999999999999 19.8999999999999986 16.3000000000000007 47.6000000000000014 864781.099999999977 3419313 13185
171 0101000020BF080000000000001C5D2B4100000000DEF24941 11.4800000000000004 14.5999999999999996 4.70000000000000018 100 896654 3401148 13101
\.
--
-- PostgreSQL database dump complete
--

View File

@@ -1,6 +1,6 @@
-- Install dependencies
CREATE EXTENSION plpythonu;
CREATE EXTENSION postgis VERSION '2.2.2';
CREATE EXTENSION postgis;
-- Create role publicuser if it does not exist
DO

View File

@@ -0,0 +1,31 @@
-- test of Geographically Weighted Regression (GWR)
SET client_min_messages TO WARNING;
\set ECHO none
\pset format unaligned
\i test/fixtures/gwr_georgia.sql
SELECT
rowid,
round((coeffs->>'pctrural')::numeric, 4) As coeff_pctrural,
round((stand_errs->>'pctrural')::numeric, 4) As std_errs_pctrural,
round((t_vals->>'pctrural')::numeric, 4) As t_vals_pctrural,
round(predicted, 4) As predicted,
round(residuals, 4) As residuals,
round(r_squared, 4) As r_squared,
bandwidth As bandwidth
FROM
cdb_crankshaft.CDB_GWR('SELECT * FROM g_utm_testing', 'pctbach',
Array['pctrural', 'pctpov', 'pctblack']::text[],
90.0,
False,
'bisquare',
'the_geom',
'areakey')
WHERE rowid in (13001, 13027, 13039, 13231, 13321, 13293)
ORDER BY rowid ASC;
-- comparison data from known calculated values in
-- https://github.com/TaylorOshan/pysal/blob/1d6af33bda46b1d623f70912c56155064463383f/pysal/examples/georgia/georgia_BS_NN_listwise.csv
-- Note: values output from this analysis were within 1% of the values in that table; the small discrepancies are possibly due to projection differences.
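The 1% agreement mentioned above amounts to a relative-difference check; a minimal sketch of that comparison (the helper name is hypothetical, not part of the test suite):

```python
def within_tolerance(observed, expected, rel_tol=0.01):
    """True when observed agrees with expected to within rel_tol (1%)."""
    return abs(observed - expected) <= rel_tol * abs(expected)
```

For example, a coefficient of 0.0501 against a reference value of 0.05 passes at 1%, while 0.06 does not.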

View File

@@ -2,11 +2,11 @@ include ../../Makefile.global
# Install the package locally for development
install:
pip install --upgrade ./crankshaft
$(PIP) install --upgrade ./crankshaft
# Test develpment install
test:
nosetests crankshaft/test/
$(NOSETESTS) crankshaft/test/
release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
mkdir -p ../../release/python/$(EXTVERSION)
@@ -14,4 +14,4 @@ release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
$(SED) -i -r 's/version='"'"'[0-9]+\.[0-9]+\.[0-9]+'"'"'/version='"'"'$(EXTVERSION)'"'"'/g' ../../release/python/$(EXTVERSION)/$(PACKAGE)/setup.py
deploy:
pip install $(RUN_OPTIONS) --upgrade ../../release/python/$(RELEASE_VERSION)/$(PACKAGE)
$(PIP) install $(RUN_OPTIONS) --upgrade ../../release/python/$(RELEASE_VERSION)/$(PACKAGE)

View File

@@ -74,3 +74,12 @@ class AnalysisDataProvider:
return query_result
except plpy.SPIError, err:
plpy.error('Analysis failed: %s' % err)
def get_gwr_predict(self, params):
"""fetch data for gwr predict"""
query = pu.gwr_predict_query(params)
try:
query_result = plpy.execute(query)
return query_result
except plpy.SPIError, err:
plpy.error('Analysis failed: %s' % err)

View File

@@ -88,7 +88,7 @@ class Moran:
"""
params = OrderedDict([("id_col", id_col),
("attr1", numerator),
("attr2", denominator)
("attr2", denominator),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
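The comma fix above matters because of how Python parses two adjacent parenthesized expressions; a minimal sketch of the corrected construction (the values here are placeholders):

```python
from collections import OrderedDict

# With the trailing comma, each pair is a separate tuple in the list.
params = OrderedDict([("id_col", "cartodb_id"),
                      ("attr1", "numerator"),
                      ("attr2", "denominator"),
                      ("geom_col", "the_geom")])

# Without the comma, ("attr2", "denominator")("geom_col", "the_geom")
# is parsed as calling a tuple, which raises TypeError at runtime.
```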

View File

@@ -202,6 +202,32 @@ def gwr_query(params):
GWR query
"""
replacements = {"ind_vars_select": query_attr_select(params,
table_ref=None),
"ind_vars_where": query_attr_where(params,
table_ref=None)}
query = '''
SELECT
array_agg(ST_X(ST_Centroid("{geom_col}"))) As x,
array_agg(ST_Y(ST_Centroid("{geom_col}"))) As y,
array_agg("{dep_var}") As dep_var,
%(ind_vars_select)s
array_agg("{id_col}") As rowid
FROM ({subquery}) As q
WHERE
"{dep_var}" IS NOT NULL AND
%(ind_vars_where)s
''' % replacements
return query.format(**params).strip()
def gwr_predict_query(params):
"""
GWR predict query
"""
replacements = {"ind_vars_select": query_attr_select(params,
table_ref=None),
"ind_vars_where": query_attr_where(params,
@@ -216,12 +242,10 @@ def gwr_query(params):
array_agg({id_col}) As rowid
FROM ({subquery}) As q
WHERE
{dep_var} IS NOT NULL AND
%(ind_vars_where)s
''' % replacements
return query.format(**params).strip()
# to add more weight methods open a ticket or pull request
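The query builders above fill their templates in two passes: `%(...)s` placeholders are substituted first with pre-built SQL fragments (from `query_attr_select` / `query_attr_where`), then the remaining `{name}` placeholders are filled from the params dict via `str.format`. A minimal sketch of that pattern, with illustrative fragment and parameter values:

```python
# Stage 1: %-interpolation inserts SQL fragments built elsewhere.
replacements = {"ind_vars_select": 'array_agg("pctrural") As pctrural,'}
template = '''
    SELECT
      array_agg("{dep_var}") As dep_var,
      %(ind_vars_select)s
      array_agg("{id_col}") As rowid
    FROM ({subquery}) As q
''' % replacements

# Stage 2: str.format fills the remaining {name} placeholders.
query = template.format(dep_var="pctbach",
                        id_col="areakey",
                        subquery="SELECT * FROM g_utm_testing").strip()
```

Keeping the two stages separate means the fragments can themselves contain `{`-free SQL without colliding with the later `format` call.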

View File

@@ -1,2 +1,3 @@
from crankshaft.regression.gwr import *
from crankshaft.regression.glm import *
from crankshaft.regression.gwr_cs import *

View File

@@ -149,7 +149,9 @@ class GWR(GLM):
Initialize class
"""
GLM.__init__(self, y, X, family, constant=constant)
self.constant = constant
self.sigma2_v1 = sigma2_v1
self.coords = coords
self.bw = bw
self.kernel = kernel
self.fixed = fixed
@@ -159,34 +161,25 @@ class GWR(GLM):
self.offset = offset * 1.0
self.fit_params = {}
self.W = self._build_W(fixed, kernel, coords, bw)
self.points = None
self.exog_scale = None
self.exog_resid = None
self.P = None
def _build_W(self, fixed, kernel, coords, bw, points=None):
if points is not None:
all_coords = np.vstack([coords, points])
else: all_coords = coords
if fixed:
try:
W = fk[kernel](all_coords, bw)
if points is not None:
W = self._shed(W, coords, points, bw, fk[kernel])
W = fk[kernel](coords, bw, points)
except:
raise TypeError('Unsupported kernel function ', kernel)
else:
W = ak[kernel](all_coords, bw)
#if points is not None:
#W = self._shed(W, coords, points, bw, fk[kernel])
#except:
#raise TypeError('Unsupported kernel function ', kernel)
try:
W = ak[kernel](coords, bw, points)
except:
raise TypeError('Unsupported kernel function ', kernel)
return W
def _shed(self, W, coords, points, bw, function):
W_coords = function(coords, bw)
W_points = function(points, bw)
def fit(self, ini_params=None, tol=1.0e-5, max_iter=20, solve='iwls'):
"""
Method that fits a model with a particular estimation routine.
@@ -212,17 +205,18 @@ class GWR(GLM):
self.fit_params['max_iter'] = max_iter
self.fit_params['solve']= solve
if solve.lower() == 'iwls':
params = np.zeros((self.n, self.k))
predy = np.zeros((self.n, 1))
v = np.zeros((self.n, 1))
w = np.zeros((self.n, 1))
m = self.W.shape[0]
params = np.zeros((m, self.k))
predy = np.zeros((m, 1))
v = np.zeros((m, 1))
w = np.zeros((m, 1))
z = np.zeros((self.n, self.n))
S = np.zeros((self.n, self.n))
R = np.zeros((self.n, self.n))
CCT = np.zeros((self.n, self.k))
#f = np.zeros((self.n, self.n))
p = np.zeros((self.n, 1))
for i in range(self.n):
CCT = np.zeros((m, self.k))
#f = np.zeros((n, n))
p = np.zeros((m, 1))
for i in range(m):
wi = self.W[i].reshape((-1,1))
rslt = iwls(self.y, self.X, self.family, self.offset,
ini_params, tol, max_iter, wi=wi)
@@ -237,10 +231,57 @@ class GWR(GLM):
#don't need unless f is explicitly passed for
#prediction of non-sampled points
#cf = rslt[5] - np.dot(rslt[5], f)
#CCT[i] = np.diag(np.dot(cf, cf.T/rslt[3]))
CCT[i] = np.diag(np.dot(rslt[5], rslt[5].T))
S = S * (1.0/z)
return GWRResults(self, params, predy, S, CCT, w)
def predict(self, points, P, exog_scale=None, exog_resid=None, fit_params={}):
"""
Method that predicts values of the dependent variable at un-sampled
locations
Parameters
----------
points : array-like
n*2, collection of n sets of (x,y) coordinates of the
prediction locations
P : array
n*k, independent variables used to make prediction;
excluding the constant
exog_scale : scalar
estimated scale using sampled locations; default is None
which estimates a model using points from "coords"
exog_resid : array-like
estimated residuals using sampled locations; default is None
which estimates a model using points from "coords"; if
given it must be n*1 where n is the length of coords
fit_params : dict
key-value pairs of parameters that will be passed into fit method to define estimation
routine; see fit method for more details
"""
if (exog_scale is None) & (exog_resid is None):
train_gwr = self.fit(**fit_params)
self.exog_scale = train_gwr.scale
self.exog_resid = train_gwr.resid_response
elif (exog_scale is not None) & (exog_resid is not None):
self.exog_scale = exog_scale
self.exog_resid = exog_resid
else:
raise InputError('exog_scale and exog_resid must both either be '
'None or specified')
self.points = points
if self.constant:
P = np.hstack([np.ones((len(P),1)), P])
self.P = P
else:
self.P = P
self.W = self._build_W(self.fixed, self.kernel, self.coords, self.bw, points)
gwr = self.fit(**fit_params)
return gwr
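At each unsampled location, prediction repeats a locally weighted least-squares fit with kernel weights centered on that location. A self-contained numpy sketch of that core step under a bisquare kernel (the function names are illustrative, not the pysal API):

```python
import numpy as np

def bisquare(d, bw):
    """Bisquare kernel: weights decay with distance, zero beyond bw."""
    w = (1.0 - (d / bw) ** 2) ** 2
    w[d >= bw] = 0.0
    return w

def gwr_predict_point(coords, X, y, point, x0, bw):
    """One locally weighted least-squares fit centered on `point`,
    returning the prediction for the design row x0 at that point."""
    d = np.sqrt(((coords - point) ** 2).sum(axis=1))
    w = bisquare(d, bw)           # one weight per sampled observation
    XtW = X.T * w                 # weight each column of X.T
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return x0 @ beta
```

With exactly linear data the local fit recovers the global coefficients, which makes the sketch easy to check by hand.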
@cache_readonly
def df_model(self):
raise NotImplementedError('Only computed for fitted model in GWRResults')
@@ -424,7 +465,7 @@ class GWRResults(GLMResults):
self.w = w
self.predy = predy
self.S = S
self.CCT = self.cov_params(CCT)
self.CCT = self.cov_params(CCT, model.exog_scale)
self._cache = {}
@cache_readonly
@@ -433,7 +474,7 @@ class GWRResults(GLMResults):
return np.dot(u, u.T)
@cache_readonly
def scale(self):
def scale(self, scale=None):
if isinstance(self.family, Gaussian):
if self.model.sigma2_v1:
scale = self.sigma2_v1
@@ -443,7 +484,7 @@ class GWRResults(GLMResults):
scale = 1.0
return scale
def cov_params(self, cov):
def cov_params(self, cov, exog_scale=None):
"""
Returns scaled covariance parameters
Parameters
@@ -456,7 +497,10 @@ class GWRResults(GLMResults):
Scaled covariance parameters
"""
return cov*self.scale
if exog_scale is not None:
return cov*exog_scale
else:
return cov*self.scale
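The branch added above can be sketched in isolation: scale the raw covariance entries by the model's own scale, or, when predicting at unsampled locations, by the scale estimated on the sampled ("training") data. A minimal numpy sketch with toy numbers, not the pysal implementation:

```python
import numpy as np

def cov_params(cov, scale, exog_scale=None):
    # When predicting at unsampled points, reuse the training-data scale
    if exog_scale is not None:
        return cov * exog_scale
    # Otherwise scale by the fitted model's own scale estimate
    return cov * scale

cov = np.array([[2.0, 0.5], [0.5, 1.0]])
print(cov_params(cov, scale=3.0))                  # scaled by model scale
print(cov_params(cov, 3.0, exog_scale=2.0))        # scaled by training scale
```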
@cache_readonly
def tr_S(self):
@@ -477,9 +521,13 @@ class GWRResults(GLMResults):
"""
weighted mean of y
"""
if self.model.points is not None:
n = len(self.model.points)
else:
n = self.n
off = self.offset.reshape((-1,1))
arr_ybar = np.zeros(shape=(self.n,1))
for i in range(self.n):
for i in range(n):
w_i= np.reshape(np.array(self.W[i]), (-1, 1))
sum_yw = np.sum(self.y.reshape((-1,1)) * w_i)
arr_ybar[i] = 1.0 * sum_yw / np.sum(w_i*off)
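The weighted-mean loop above computes, for each location i, ybar_i = sum(w_i * y) / sum(w_i * offset), with w_i the i-th row of the kernel weight matrix W. A self-contained sketch with a toy 3x3 weight matrix and unit offset (illustrative values only):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0]).reshape((-1, 1))
W = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
ybar = np.zeros((3, 1))
for i in range(3):
    w_i = W[i].reshape((-1, 1))          # kernel weights centered on location i
    ybar[i] = np.sum(y * w_i) / np.sum(w_i)
print(ybar.flatten())
```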
@@ -496,8 +544,12 @@ class GWRResults(GLMResults):
relationships.
"""
TSS = np.zeros(shape=(self.n,1))
for i in range(self.n):
if self.model.points is not None:
n = len(self.model.points)
else:
n = self.n
TSS = np.zeros(shape=(n,1))
for i in range(n):
TSS[i] = np.sum(np.reshape(np.array(self.W[i]), (-1,1)) *
(self.y.reshape((-1,1)) - self.y_bar[i])**2)
return TSS
@@ -512,11 +564,16 @@ class GWRResults(GLMResults):
Geographically weighted regression: the analysis of spatially varying
relationships.
"""
resid_response = self.resid_response.reshape((-1,1))
RSS = np.zeros(shape=(self.n,1))
for i in range(self.n):
if self.model.points is not None:
n = len(self.model.points)
resid = self.model.exog_resid.reshape((-1,1))
else:
n = self.n
resid = self.resid_response.reshape((-1,1))
RSS = np.zeros(shape=(n,1))
for i in range(n):
RSS[i] = np.sum(np.reshape(np.array(self.W[i]), (-1,1))
* resid_response**2)
* resid**2)
return RSS
@cache_readonly
@@ -694,8 +751,8 @@ class GWRResults(GLMResults):
"""
alpha = np.abs(alpha)/2.0
n = self.n
critical = stats.t.ppf(1-critical, n-1)
subset = (self.tvalues < alpha) & (self.tvalues > -1.0*alpha)
critical = t.ppf(1-alpha, n-1)
subset = (self.tvalues < critical) & (self.tvalues > -1.0*critical)
tvalues = self.tvalues.copy()
tvalues[subset] = 0
return tvalues
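The corrected filter above zeroes out t-values whose magnitude falls below the critical t at the (corrected) alpha level. A sketch of just the masking step; the critical value would come from `scipy.stats.t.ppf(1 - alpha, n - 1)` and is hard-coded here to keep the example dependency-free:

```python
import numpy as np

tvalues = np.array([10.9, -4.7, -1.2, 0.3])
critical = 1.96  # stand-in for t.ppf(1 - alpha, n - 1)
filtered = tvalues.copy()
# Non-significant estimates (|t| < critical) are set to zero
filtered[(tvalues < critical) & (tvalues > -1.0 * critical)] = 0
print(filtered)
```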
@@ -772,6 +829,16 @@ class GWRResults(GLMResults):
def pvalues(self):
raise NotImplementedError('Not implemented for GWR')
@cache_readonly
def predictions(self):
P = self.model.P
if P is None:
raise NotImplementedError('predictions only available if predict '
'method called on GWR model')
else:
predictions = np.sum(P*self.params, axis=1).reshape((-1,1))
return predictions
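The `predictions` property pairs each row of the prediction design matrix P with that location's parameter estimates: a row-wise dot product `sum(P_i * beta_i)`. A toy sketch (values illustrative):

```python
import numpy as np

P = np.array([[1.0, 2.0],        # first column is the intercept
              [1.0, 3.0]])
params = np.array([[0.5, 1.0],   # location-specific coefficients
                   [0.5, 2.0]])
predictions = np.sum(P * params, axis=1).reshape((-1, 1))
print(predictions.flatten())     # row-wise dot products
```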
class FBGWR(GWR):
"""
Parameters


@@ -10,32 +10,32 @@ import numpy as np
#adaptive specifications should be parameterized with nn-1 to match original gwr
#implementation. That is, pysal counts self neighbors with knn automatically.
def fix_gauss(points, bw):
w = _Kernel(points, function='gwr_gaussian', bandwidth=bw,
truncate=False)
def fix_gauss(coords, bw, points=None):
w = _Kernel(coords, function='gwr_gaussian', bandwidth=bw,
truncate=False, points=points)
return w.kernel
def adapt_gauss(points, nn):
w = _Kernel(points, fixed=False, k=nn-1, function='gwr_gaussian',
truncate=False)
def adapt_gauss(coords, nn, points=None):
w = _Kernel(coords, fixed=False, k=nn-1, function='gwr_gaussian',
truncate=False, points=points)
return w.kernel
def fix_bisquare(points, bw):
w = _Kernel(points, function='bisquare', bandwidth=bw)
def fix_bisquare(coords, bw, points=None):
w = _Kernel(coords, function='bisquare', bandwidth=bw, points=points)
return w.kernel
def adapt_bisquare(points, nn):
w = _Kernel(points, fixed=False, k=nn-1, function='bisquare')
def adapt_bisquare(coords, nn, points=None):
w = _Kernel(coords, fixed=False, k=nn-1, function='bisquare', points=points)
return w.kernel
def fix_exp(points, bw):
w = _Kernel(points, function='exponential', bandwidth=bw,
truncate=False)
def fix_exp(coords, bw, points=None):
w = _Kernel(coords, function='exponential', bandwidth=bw,
truncate=False, points=points)
return w.kernel
def adapt_exp(points, nn):
w = _Kernel(points, fixed=False, k=nn-1, function='exponential',
truncate=False)
def adapt_exp(coords, nn, points=None):
w = _Kernel(coords, fixed=False, k=nn-1, function='exponential',
truncate=False, points=points)
return w.kernel
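The wrappers above all build a `_Kernel` over a pairwise distance matrix and apply a decay function. A pure-numpy stand-in for the fixed-bandwidth Gaussian (`gwr_gaussian`) case, replacing the module's `scipy.spatial.distance.cdist` with a broadcasting equivalent (toy coordinates, not the `_Kernel` class itself):

```python
import numpy as np

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# Pairwise Euclidean distances, equivalent to cdist(coords, coords)
diff = coords[:, None, :] - coords[None, :, :]
dmat = np.sqrt((diff ** 2).sum(axis=-1))
bw = 1.0
# Gaussian kernel: weight decays smoothly with distance / bandwidth
W = np.exp(-0.5 * (dmat / bw) ** 2)
print(np.round(W, 4))
```

With `points` supplied (the prediction case added in this diff), the distance matrix would instead be `cdist(points, coords)`, giving an m x n weight matrix over the n calibration locations.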
from scipy.spatial.distance import cdist
@@ -45,19 +45,22 @@ class _Kernel(object):
"""
def __init__(self, data, bandwidth=None, fixed=True, k=None,
function='triangular', eps=1.0000001, ids=None, truncate=True): #Added truncate flag
function='triangular', eps=1.0000001, ids=None, truncate=True,
points=None): #Added truncate flag
if issubclass(type(data), scipy.spatial.KDTree):
self.kdt = data
self.data = self.kdt.data
self.data = data.data
data = self.data
else:
self.data = data
self.kdt = KDTree(self.data)
if k is not None:
self.k = int(k) + 1
else:
self.k = k
self.dmat = cdist(self.data, self.data)
if points is None:
self.dmat = cdist(self.data, self.data)
else:
self.points = points
self.dmat = cdist(self.points, self.data)
self.function = function.lower()
self.fixed = fixed
self.eps = eps
@@ -74,12 +77,9 @@ class _Kernel(object):
self.kernel = self._kernel_funcs(self.dmat/self.bandwidth)
if self.trunc:
mask = np.repeat(self.bandwidth, len(self.bandwidth), axis=1)
kernel_mask = self._kernel_funcs(1.0/mask)
mask = np.repeat(self.bandwidth, len(self.data), axis=1)
self.kernel[(self.dmat >= mask)] = 0
def _set_bw(self):
if self.k is not None:
dmat = np.sort(self.dmat)[:,:self.k]


@@ -4,9 +4,10 @@ GWR is tested against results from GWR4
import unittest
import pickle as pk
from pysal.contrib.gwr.gwr import GWR, FBGWR
from pysal.contrib.gwr.diagnostics import get_AICc, get_AIC, get_BIC, get_CV
from pysal.contrib.glm.family import Gaussian, Poisson, Binomial
from crankshaft.regression.gwr.gwr import GWR, FBGWR
from crankshaft.regression.gwr.sel_bw import Sel_BW
from crankshaft.regression.gwr.diagnostics import get_AICc, get_AIC, get_BIC, get_CV
from crankshaft.regression.glm.family import Gaussian, Poisson, Binomial
import numpy as np
import pysal
@@ -243,6 +244,73 @@ class TestGWRGaussian(unittest.TestCase):
np.testing.assert_allclose(rslt.resid_response, self.FB['u'], atol=1e-05)
np.testing.assert_almost_equal(rslt.resid_ss, 6339.3497144025841)
def test_Prediction(self):
coords = np.array(self.coords)
index = np.arange(len(self.y))
#train = index[0:-10]
test = index[-10:]
#y_train = self.y[train]
#X_train = self.X[train]
#coords_train = list(coords[train])
#y_test = self.y[test]
X_test = self.X[test]
coords_test = list(coords[test])
model = GWR(self.coords, self.y, self.X, 93, family=Gaussian(),
fixed=False, kernel='bisquare')
results = model.predict(coords_test, X_test)
params = np.array([22.77198, -0.10254, -0.215093, -0.01405,
19.10531, -0.094177, -0.232529, 0.071913,
19.743421, -0.080447, -0.30893, 0.083206,
17.505759, -0.078919, -0.187955, 0.051719,
27.747402, -0.165335, -0.208553, 0.004067,
26.210627, -0.138398, -0.360514, 0.072199,
18.034833, -0.077047, -0.260556, 0.084319,
28.452802, -0.163408, -0.14097, -0.063076,
22.353095, -0.103046, -0.226654, 0.002992,
18.220508, -0.074034, -0.309812, 0.108636]).reshape((10,4))
np.testing.assert_allclose(params, results.params, rtol=1e-03)
bse = np.array([2.080166, 0.021462, 0.102954, 0.049627,
2.536355, 0.022111, 0.123857, 0.051917,
1.967813, 0.019716, 0.102562, 0.054918,
2.463219, 0.021745, 0.110297, 0.044189,
1.556056, 0.019513, 0.12764, 0.040315,
1.664108, 0.020114, 0.131208, 0.041613,
2.5835, 0.021481, 0.113158, 0.047243,
1.709483, 0.019752, 0.116944, 0.043636,
1.958233, 0.020947, 0.09974, 0.049821,
2.276849, 0.020122, 0.107867, 0.047842]).reshape((10,4))
np.testing.assert_allclose(bse, results.bse, rtol=1e-03)
tvalues = np.array([10.947193, -4.777659, -2.089223, -0.283103,
7.532584, -4.259179, -1.877395, 1.385161,
10.033179, -4.080362, -3.012133, 1.515096,
7.106862, -3.629311, -1.704079, 1.17042,
17.831878, -8.473156, -1.633924, 0.100891,
15.750552, -6.880725, -2.74765, 1.734978,
6.980774, -3.586757, -2.302575, 1.784818,
16.644095, -8.273001, -1.205451, -1.445501,
11.414933, -4.919384, -2.272458, 0.060064,
8.00251, -3.679274, -2.872176, 2.270738]).reshape((10,4))
np.testing.assert_allclose(tvalues, results.tvalues, rtol=1e-03)
localR2 = np.array([[ 0.53068693],
[ 0.59582647],
[ 0.59700925],
[ 0.45769954],
[ 0.54634509],
[ 0.5494828 ],
[ 0.55159604],
[ 0.55634237],
[ 0.53903842],
[ 0.55884954]])
np.testing.assert_allclose(localR2, results.localR2, rtol=1e-05)
class TestGWRPoisson(unittest.TestCase):
def setUp(self):
data = pysal.open(pysal.examples.get_path('Tokyomortality.csv'), mode='Ur')


@@ -2,20 +2,19 @@
Geographically weighted regression
"""
import numpy as np
from gwr.base.gwr import GWR
from gwr.base.gwr import GWR as PySAL_GWR
from gwr.base.sel_bw import Sel_BW
import plpy
import crankshaft.pysal_utils as pu
import json
from crankshaft.analysis_data_provider import AnalysisDataProvider
import plpy
class GWR:
def __init__(self, analysis_provider=None):
if analysis_provider:
self.analysis_provider = analysis_provider
def __init__(self, data_provider=None):
if data_provider:
self.data_provider = data_provider
else:
self.analysis_provider = AnalysisDataProvider()
self.data_provider = AnalysisDataProvider()
def gwr(self, subquery, dep_var, ind_vars,
bw=None, fixed=False, kernel='bisquare',
@@ -35,20 +34,24 @@ class GWR:
'dep_var': dep_var,
'ind_vars': ind_vars}
# retrieve data
query_result = self.analysis_data_provider.get_gwr(params)
# get data from data provider
query_result = self.data_provider.get_gwr(params)
# exit if data to analyze is empty
if len(query_result) == 0:
plpy.error('No data passed to analysis or independent variables '
'are all null-valued')
# unique ids and variable names list
rowid = np.array(query_result[0]['rowid'], dtype=np.int)
# TODO: should x, y be centroids? point on surface?
# lat, long coordinates
x = np.array(query_result[0]['x'], dtype=float)
y = np.array(query_result[0]['y'], dtype=float)
# x, y are centroids of input geometries
x = np.array(query_result[0]['x'], dtype=np.float)
y = np.array(query_result[0]['y'], dtype=np.float)
coords = zip(x, y)
# extract dependent variable
Y = np.array(query_result[0]['dep_var'], dtype=float).reshape((-1, 1))
Y = np.array(query_result[0]['dep_var'], dtype=np.float).reshape((-1, 1))
n = Y.shape[0]
k = len(ind_vars)
@@ -58,26 +61,27 @@ class GWR:
for attr in range(0, k):
attr_name = 'attr' + str(attr + 1)
X[:, attr] = np.array(
query_result[0][attr_name], dtype=float).flatten()
query_result[0][attr_name], dtype=np.float).flatten()
# add intercept variable name
ind_vars.insert(0, 'intercept')
# calculate bandwidth if none is supplied
plpy.notice(str(bw))
if bw is None:
bw = Sel_BW(coords, Y, X,
fixed=fixed, kernel=kernel).search()
plpy.notice(str(bw))
model = GWR(coords, Y, X, bw,
fixed=fixed, kernel=kernel).fit()
model = PySAL_GWR(coords, Y, X, bw,
fixed=fixed, kernel=kernel).fit()
# containers for outputs
coeffs = []
stand_errs = []
t_vals = []
filtered_t_vals = []
# extracted model information
c_alpha = model.adj_alpha
filtered_t = model.filter_tvals(c_alpha[1])
predicted = model.predy.flatten()
residuals = model.resid_response
r_squared = model.localR2.flatten()
@@ -91,6 +95,108 @@ class GWR:
for k, var in enumerate(ind_vars)}))
t_vals.append(json.dumps({var: model.tvalues[idx, k]
for k, var in enumerate(ind_vars)}))
filtered_t_vals.append(
json.dumps({var: filtered_t[idx, k]
for k, var in enumerate(ind_vars)}))
return zip(coeffs, stand_errs, t_vals, filtered_t_vals,
predicted, residuals, r_squared, bw, rowid)
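Each per-row output above is serialized as a JSON object keyed by variable name before being zipped into the response. A minimal sketch of that packing step with toy coefficients (names and values illustrative):

```python
import json
import numpy as np

ind_vars = ['intercept', 'pctpov']
params = np.array([[22.77, -0.10],
                   [19.10, -0.09]])   # one row of betas per location
coeffs = [json.dumps({var: params[idx, k] for k, var in enumerate(ind_vars)})
          for idx in range(params.shape[0])]
print(coeffs[0])  # JSON object mapping variable name -> local estimate
```

`np.float64` subclasses Python's `float`, so `json.dumps` serializes the numpy values directly.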
def gwr_predict(self, subquery, dep_var, ind_vars,
bw=None, fixed=False, kernel='bisquare',
geom_col='the_geom', id_col='cartodb_id'):
"""
subquery: 'select * from demographics'
dep_var: 'pctbachelor'
ind_vars: ['intercept', 'pctpov', 'pctrural', 'pctblack']
bw: value of bandwidth, if None then select optimal
fixed: False (kNN) or True ('distance')
kernel: 'bisquare' (default), or 'exponential', 'gaussian'
"""
params = {'geom_col': geom_col,
'id_col': id_col,
'subquery': subquery,
'dep_var': dep_var,
'ind_vars': ind_vars}
# get data from data provider
query_result = self.data_provider.get_gwr_predict(params)
# exit if data to analyze is empty
if len(query_result) == 0:
plpy.error('No data passed to analysis or independent variables '
'are all null-valued')
# unique ids and variable names list
rowid = np.array(query_result[0]['rowid'], dtype=np.int)
x = np.array(query_result[0]['x'], dtype=np.float)
y = np.array(query_result[0]['y'], dtype=np.float)
coords = np.array(zip(x, y), dtype=np.float)
# extract dependent variable
Y = np.array(query_result[0]['dep_var']).reshape((-1, 1))
n = Y.shape[0]
k = len(ind_vars)
X = np.empty((n, k), dtype=np.float)
for attr in range(0, k):
attr_name = 'attr' + str(attr + 1)
X[:, attr] = np.array(
query_result[0][attr_name], dtype=np.float).flatten()
# add intercept variable name
ind_vars.insert(0, 'intercept')
# split data into "training" and "test" for predictions
# create index to split based on null y values
train = np.where(Y != np.array(None))[0]
test = np.where(Y == np.array(None))[0]
# report error if there is no data to predict
if len(test) < 1:
plpy.error('No rows flagged for prediction: verify that rows '
'denoting prediction locations have a dependent '
'variable value of `null`')
# split dependent variable (only need training which is non-Null's)
Y_train = Y[train].reshape((-1, 1))
Y_train = Y_train.astype(np.float)
# split coords
coords_train = coords[train]
coords_test = coords[test]
# split explanatory variables
X_train = X[train]
X_test = X[test]
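The split above uses null dependent-variable values as the flag for prediction locations: non-null rows train the model, null rows are predicted. A self-contained sketch of the index construction (object dtype keeps the `None` values, mirroring what the data provider returns):

```python
import numpy as np

Y = np.array([1.2, None, 3.4, None], dtype=object).reshape((-1, 1))
# Rows with a dependent-variable value form the training set;
# rows with None mark the locations to predict
train = np.where(Y != np.array(None))[0]
test = np.where(Y == np.array(None))[0]
print(train, test)
```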
# calculate bandwidth if none is supplied
if bw is None:
bw = Sel_BW(coords_train, Y_train, X_train,
fixed=fixed, kernel=kernel).search()
# estimate model and predict at new locations
model = PySAL_GWR(coords_train, Y_train, X_train,
bw, fixed=fixed,
kernel=kernel).predict(coords_test, X_test)
coeffs = []
stand_errs = []
t_vals = []
r_squared = model.localR2.flatten()
predicted = model.predy.flatten()
m = len(model.predy)
for idx in xrange(m):
coeffs.append(json.dumps({var: model.params[idx, k]
for k, var in enumerate(ind_vars)}))
stand_errs.append(json.dumps({var: model.bse[idx, k]
for k, var in enumerate(ind_vars)}))
t_vals.append(json.dumps({var: model.tvalues[idx, k]
for k, var in enumerate(ind_vars)}))
return zip(coeffs, stand_errs, t_vals,
predicted, residuals, r_squared, rowid, bw)
r_squared, predicted, rowid[test])

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@@ -0,0 +1,130 @@
import unittest
import json
import numpy as np
from crankshaft import random_seeds
from helper import fixture_file
from crankshaft.regression import GWR
from crankshaft.analysis_data_provider import AnalysisDataProvider
class FakeDataProvider(AnalysisDataProvider):
def __init__(self, mocked_result):
self.mocked_result = mocked_result
def get_gwr(self, params):
return self.mocked_result
def get_gwr_predict(self, params):
return self.mocked_result
class GWRTest(unittest.TestCase):
"""Testing class for geographically weighted regression (gwr)"""
def setUp(self):
"""
fixture packed from canonical GWR georgia dataset using the
following query:
SELECT array_agg(x) As x,
array_agg(y) As y,
array_agg(pctbach) As dep_var,
array_agg(pctrural) As attr1,
array_agg(pctpov) As attr2,
array_agg(pctblack) As attr3,
array_agg(areakey) As rowid
FROM g_utm
WHERE pctbach is not NULL AND
pctrural IS NOT NULL AND
pctpov IS NOT NULL AND
pctblack IS NOT NULL
"""
import copy
# data packed from https://github.com/TaylorOshan/pysal/blob/1d6af33bda46b1d623f70912c56155064463383f/pysal/examples/georgia/GData_utm.csv
self.data = json.loads(
open(fixture_file('gwr_packed_data.json')).read())
# data packed from https://github.com/TaylorOshan/pysal/blob/a44c5541e2e0d10a99ff05edc1b7f81b70f5a82f/pysal/examples/georgia/georgia_BS_NN_listwise.csv
self.knowns = json.loads(
open(fixture_file('gwr_packed_knowns.json')).read())
# data for GWR prediction
self.data_predict = copy.deepcopy(self.data)
self.ids_of_unknowns = [13083, 13009, 13281, 13115, 13247, 13169]
self.idx_ids_of_unknowns = [self.data_predict[0]['rowid'].index(idx)
for idx in self.ids_of_unknowns]
for idx in self.idx_ids_of_unknowns:
self.data_predict[0]['dep_var'][idx] = None
self.predicted_knowns = {13009: 10.879,
13083: 4.5259,
13115: 9.4022,
13169: 6.0793,
13247: 8.1608,
13281: 13.886}
# params, with ind_vars in same ordering as query above
self.params = {'subquery': 'select * from table',
'dep_var': 'pctbach',
'ind_vars': ['pctrural', 'pctpov', 'pctblack'],
'bw': 90.000,
'fixed': False,
'geom_col': 'the_geom',
'id_col': 'areakey'}
def test_gwr(self):
"""
"""
gwr = GWR(FakeDataProvider(self.data))
gwr_resp = gwr.gwr(self.params['subquery'],
self.params['dep_var'],
self.params['ind_vars'],
bw=self.params['bw'],
fixed=self.params['fixed'])
# unpack response
coeffs, stand_errs, t_vals, t_vals_filtered, predicteds, \
residuals, r_squareds, bws, rowids = zip(*gwr_resp)
# prepare for comparison
coeff_known_pctpov = self.knowns['est_pctpov']
tval_known_pctblack = self.knowns['t_pctrural']
pctpov_se = self.knowns['se_pctpov']
ids = self.knowns['area_key']
resp_idx = None
# test pctpov coefficient estimates
for idx, val in enumerate(coeff_known_pctpov):
resp_idx = rowids.index(ids[idx])
self.assertAlmostEquals(val,
json.loads(coeffs[resp_idx])['pctpov'],
places=4)
# test pctrural tvals
for idx, val in enumerate(tval_known_pctblack):
resp_idx = rowids.index(ids[idx])
self.assertAlmostEquals(val,
json.loads(t_vals[resp_idx])['pctrural'],
places=4)
def test_gwr_predict(self):
"""Testing for GWR_Predict"""
gwr = GWR(FakeDataProvider(self.data_predict))
gwr_resp = gwr.gwr_predict(self.params['subquery'],
self.params['dep_var'],
self.params['ind_vars'],
bw=self.params['bw'],
fixed=self.params['fixed'])
# unpack response
coeffs, stand_errs, t_vals, \
r_squareds, predicteds, rowid = zip(*gwr_resp)
threshold = 0.01
for i, idx in enumerate(self.idx_ids_of_unknowns):
known_val = self.predicted_knowns[rowid[i]]
predicted_val = predicteds[i]
test_val = abs(known_val - predicted_val) / known_val
self.assertTrue(test_val < threshold)