Compare commits

...

49 Commits

Author SHA1 Message Date
Luis Bosque
01fc2c1dd1 Release 0.0.4 2016-06-20 10:13:33 +02:00
Luis Bosque
f5fb4499db Set final dependencies versions 2016-06-20 09:44:52 +02:00
Raul Ochoa
ad5cffbf0d Merge pull request #64 from CartoDB/remove-cartodb-extension-dep
Removes cartodb-extension-dep
2016-06-16 19:19:25 +02:00
Raul Ochoa
1db938c450 Removes cartodb-extension-dep 2016-06-16 19:07:42 +02:00
Luis Bosque
3480a0d252 Allow passing options to pip install 2016-06-16 16:56:16 +02:00
Raul Ochoa
237aa1c581 Declare scipy as dep 2016-06-16 16:34:45 +02:00
Raul Ochoa
1e19f468eb Declare numpy dep 2016-06-16 16:23:43 +02:00
Raul Ochoa
8b5e910234 Release 0.0.3 2016-06-16 14:16:32 +02:00
Raul Ochoa
d08a2b6d2d Remove _cdb_crankshaft_activate_py activation call from kmeans function 2016-06-16 14:12:28 +02:00
Raul Ochoa
a222341863 Merge pull request #59 from CartoDB/kmeans
KMeans clustering and weighted centroid analysis
2016-06-16 12:39:13 +02:00
Rafa de la Torre
284d8ede44 Merge pull request #62 from CartoDB/60-remove-venv-activation
Remove virtualenv activation #60
2016-06-15 17:30:01 +02:00
Rafa de la Torre
a8943bae98 Remove reference to clean-environments #60 2016-06-14 18:27:35 +02:00
Rafa de la Torre
75531b671e Remove virtualenv references from READMEs #60 2016-06-14 18:24:43 +02:00
Rafa de la Torre
0acae8240f Remove virtualenv stuff from Makefiles #60 2016-06-14 18:23:30 +02:00
Rafa de la Torre
7b98415da3 Remove virtualenv activation #60 2016-06-14 18:06:23 +02:00
Raul Ochoa
fd1862167c Remove trailing space 2016-06-13 13:06:21 +02:00
Raul Ochoa
c870f68c77 Revert "Declare scipy as dep"
This reverts commit 1e8bc12e0a.
2016-06-13 13:05:50 +02:00
Raul Ochoa
1e8bc12e0a Declare scipy as dep 2016-06-13 12:17:46 +02:00
Raul Ochoa
b33ba2d294 Do not use names for the aggregate params 2016-06-10 18:24:43 +02:00
Raul Ochoa
889cd5c579 Fix scikit-learn dep name 2016-06-10 17:47:46 +02:00
Stuart Lynn
1a4944b960 adding sklearn as a dep 2016-06-10 13:16:16 +00:00
Stuart Lynn
9d3de5a8ef adding not null filter for geom on kmeans 2016-06-10 13:12:55 +00:00
Stuart Lynn
7f3b23f67a reworking CDB_WeightedMean to be an aggregate function 2016-06-10 13:06:49 +00:00
Raul Ochoa
cc4579461d Merge branch 'develop' into kmeans 2016-06-09 11:28:39 +02:00
Raul Ochoa
e95c40c2f9 Ignore idea based configurations 2016-06-09 11:27:15 +02:00
Ubuntu
4e86965f03 KMeans clustering and weighted centroid analysis 2016-06-07 19:58:32 +00:00
Andy Eschbacher
21dd956c15 Merge pull request #56 from CartoDB/add-issue-templates
adding template for code reviews
2016-06-03 08:25:21 -04:00
Andy Eschbacher
a0a026c2a1 Merge pull request #57 from CartoDB/adding-hot-and-cold-spot-function
adding hot/cold/outlier getters
2016-06-01 15:23:19 -04:00
Andy Eschbacher
c04e15ef81 rename some variables 2016-06-01 15:07:16 -04:00
Andy Eschbacher
ae1bb703a7 descriptions for all functions 2016-06-01 15:06:58 -04:00
Andy Eschbacher
0e3970f52c adds docs for areasofinterestlocal 2016-06-01 13:46:12 -04:00
Andy Eschbacher
59c520da16 renaming output value to vals 2016-06-01 13:40:32 -04:00
Andy Eschbacher
90c3e21c0d renaming output id to rowid 2016-06-01 12:43:33 -04:00
Andy Eschbacher
3013998e1b make functions more flexible to case of weight type 2016-06-01 12:19:08 -04:00
Andy Eschbacher
2b8adb744d switched signature to put more common options in the front 2016-06-01 12:09:43 -04:00
Andy Eschbacher
434cfcd1e9 Merge branch 'develop' into adding-hot-and-cold-spot-function 2016-06-01 11:38:46 -04:00
Andy Eschbacher
b5c6b42081 update names to align with CamelCase convention 2016-06-01 11:33:02 -04:00
Andy Eschbacher
1b0d1cc82c updating function names 2016-05-24 17:51:29 -04:00
Andy Eschbacher
fe22464b75 Merge pull request #22 from CartoDB/update-docs
Update docs format
2016-05-23 09:51:44 -04:00
Andy Eschbacher
ca5175f15b adding reference to subquery argument requirement 2016-05-20 16:26:43 -04:00
Andy Eschbacher
4e870e4393 adapt other test for new settings in fixtures 2016-05-20 15:15:41 -04:00
Andy Eschbacher
b05ad98ed9 adding tests for hot/cold/outlier for normal moran functions 2016-05-20 15:15:19 -04:00
Andy Eschbacher
bc8055a12b adds pset format unaligned to reduce output for tests 2016-05-20 14:55:16 -04:00
Andy Eschbacher
c3913459d9 adding tests for hotspot, coldspot, and outlier functions 2016-05-20 14:54:42 -04:00
Andy Eschbacher
f571e59a95 adding hot/cold/outlier getters 2016-05-20 14:05:59 -04:00
Andy Eschbacher
0400b1a880 adding template for code reviews 2016-05-20 13:23:56 -04:00
Javier Goizueta
cc4a35ebd9 Fix instructions to update/install the extension 2016-05-20 11:47:12 +02:00
Andy Eschbacher
00579cd838 adding template 2016-03-23 17:10:08 -04:00
Andy Eschbacher
3f20275d3d adopting new format (wip) 2016-03-23 17:09:52 -04:00
75 changed files with 4150 additions and 448 deletions

7
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,7 @@
- [ ] All declared geometries are `geometry(Geometry, 4326)` for general geoms, or `geometry(Point, 4326)`
- [ ] Existing functions in crankshaft python library called from the extension are kept at least from version N to version N+1 (to avoid breakage during upgrades).
- [ ] Docs for public-facing functions are written
- [ ] New functions follow the naming conventions: `CDB_NameOfFunction`. Where internal functions begin with an underscore `_`.
- [ ] If appropriate, new functions accepts an arbitrary query as an input (see [Crankshaft Issue #6](https://github.com/CartoDB/crankshaft/issues/6) for more information)

1
.gitignore vendored
View File

@@ -1,3 +1,4 @@
envs/
*.pyc
.DS_Store
.idea/

View File

@@ -45,8 +45,8 @@ source envs/dev/bin/activate
Update extension in a working database with:
* `ALTER EXTENSION crankshaft VERSION TO 'current';`
`ALTER EXTENSION crankshaft VERSION TO 'dev';`
* `ALTER EXTENSION crankshaft UPDATE TO 'current';`
`ALTER EXTENSION crankshaft UPDATE TO 'dev';`
Note: we keep the current development version install as 'dev' always;
we update through the 'current' alias to allow changing the extension
@@ -58,7 +58,9 @@ should be dropped manually before the update.
If the extension has not previously been installed in a database,
it can be installed directly with:
* `CREATE EXTENSION crankshaft WITH VERSION 'dev';`
* `CREATE EXTENSION IF NOT EXISTS plpythonu;`
`CREATE EXTENSION IF NOT EXISTS postgis;`
`CREATE EXTENSION crankshaft WITH VERSION 'dev';`
Note: the development extension uses the development python virtual
environment automatically.

View File

@@ -11,7 +11,6 @@ PYP_DIR = src/py
# Generate and install developmet versions of the extension
# and python package.
# The extension is named 'dev' with a 'current' alias for easily upgrading.
# The Python package is installed in a virtual environment envs/dev/
# Requires sudo.
install: ## Generate and install development version of the extension; requires sudo.
$(MAKE) -C $(PYP_DIR) install
@@ -29,7 +28,6 @@ release: ## Generate a new release of the extension. Only for telease manager
$(MAKE) -C $(PYP_DIR) release
# Install the current release.
# The Python package is installed in a virtual environment envs/X.Y.Z/
# Requires sudo.
# Use the RELEASE_VERSION environment variable to deploy a specific version:
# sudo make deploy RELEASE_VERSION=1.0.0
@@ -52,11 +50,7 @@ clean-release: ## clean up current release
rm -rf release/python/$(RELEASE_VERSION)
rm -f release/$(RELEASE_VERSION)--*.sql
# Cleanup all virtual environments
clean-environments: ## clean up all virtual environments
rm -rf envs/*
clean-all: clean-dev clean-release clean-environments
clean-all: clean-dev clean-release
help:
@IFS=$$'\n' ; \

10
NEWS.md
View File

@@ -1,3 +1,13 @@
0.0.4 (2016-06-20)
------------------
* Remove cartodb extension dependency from tests
* Declare all correct dependencies with correct versions in setup.py
0.0.3 (2016-06-16)
------------------
* Adds new functions: kmeans, weighted centroids.
* Replaces moran functions with new areas of interest naming.
0.0.2 (2016-03-16)
------------------
* New versioning approach using per-version Python virtual environments

View File

@@ -9,11 +9,10 @@ CartoDB Spatial Analysis extension for PostgreSQL.
* - *src/pg* contains the PostgreSQL extension source code
* - *src/py* Python module source code
* *release* reseleased versions
* *env* base directory for Python virtual environments
## Requirements
* pip, virtualenv, PostgreSQL
* pip, PostgreSQL
* python-scipy system package (see [src/py/README.md](https://github.com/CartoDB/crankshaft/blob/master/src/py/README.md))
# Working Process -- Quickstart Guide

View File

@@ -1,71 +1,185 @@
### Moran's I
## Areas of Interest Functions
#### What is Moran's I and why is it significant for CartoDB?
### CDB_AreasOfInterestLocal(subquery text, column_name text)
Moran's I is a geostatistical calculation which gives a measure of the global
clustering and presence of outliers within the geographies in a map. Here global
means over all of the geographies in a dataset. Imagine mapping the incidence
rates of cancer in neighborhoods of a city. If there were areas covering several
neighborhoods with abnormally low rates of cancer, those areas are positively
spatially correlated with one another and would be considered a cluster. If
there was a single neighborhood with a high rate but with all neighbors on
average having a low rate, it would be considered a spatial outlier.
This function classifies your data as being part of a cluster, as an outlier, or not part of a pattern based the significance of a classification. The classification happens through an autocorrelation statistic called Local Moran's I.
While Moran's I gives a global snapshot, there are local indicators for
clustering called Local Indicators of Spatial Autocorrelation. Clustering is a
process related to autocorrelation -- i.e., a process that compares a
geography's attribute to the attribute in neighbor geographies.
#### Arguments
For the example of cancer rates in neighborhoods, since these neighborhoods have
a high value for rate of cancer, and all of their neighbors do as well, they are
designated as "High High" or simply **HH**. For areas with multiple neighborhoods
with low rates of cancer, they are designated as "Low Low" or **LL**. HH and LL
naturally fit into the concept of clustering and are in the correlated
variables.
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM interesting_table`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments |
| column_name | TEXT | Name of column (e.g., should be `'interesting_value'` instead of `interesting_value` without single quotes) used for the analysis. |
| weight type (optional) | TEXT | Type of weight to use when finding neighbors. Currently available options are 'knn' (default) and 'queen'. Read more about weight types in [PySAL's weights documentation](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html). |
| num_ngbrs (optional) | INT | Number of neighbors if using k-nearest neighbors weight type. Defaults to 5. |
| permutations (optional) | INT | Number of permutations to check against a random arrangement of the values in `column_name`. This influences the accuracy of the output field `significance`. Defaults to 99. |
| geom_col (optional) | TEXT | The column name for the geometries. Defaults to `'the_geom'` |
| id_col (optional) | TEXT | The column name for the unique ID of each geometry/value pair. Defaults to `'cartodb_id'`. |
"Anticorrelated" geogs are in **LH** and **HL** regions -- that is, regions
where a geog has a high value and it's neighbors, on average, have a low value
(or vice versa). An example of this is a "gated community" or placement of a
city housing project in a rich region. These deliberate developments have
opposite median income as compared to the neighbors around them. They have a
high (or low) value while their neighbors have a low (or high) value. They exist
typically as islands, and in rare circumstances can extend as chains dividing
**LL** or **HH**.
#### Returns
Strong policies such as rent stabilization (probably) tend to prevent the
clustering of high rent areas as they integrate middle class incomes. Luxury
apartment buildings, which are a kind of gated community, probably tend to skew
an area's median income upwards while housing projects have the opposite effect.
What are the nuggets in the analysis?
A table with the following columns.
Two functions are available to compute Moran I statistics:
| Column Name | Type | Description |
|-------------|------|-------------|
| moran | NUMERIC | Value of Moran's I (spatial autocorrelation measure) for the geometry with id of `rowid` |
| quads | TEXT | Classification of geometry. Result is one of 'HH' (a high value with neighbors high on average), 'LL' (opposite of 'HH'), 'HL' (a high value surrounded by lows on average), and 'LH' (opposite of 'HL'). Null values are returned when nulls exist in the original data. |
| significance | NUMERIC | The statistical significance (from 0 to 1) of a cluster or outlier classification. Lower numbers are more significant. |
| rowid | INT | Row id of the values which correspond to the input rows. |
| vals | NUMERIC | Values from `'column_name'`. |
* `cdb_moran_local` computes Moran I measures, quad classification and
significance values from numerial values associated to geometry entities
in an input table. The geometries should be contiguous polygons When
then `queen` `w_type` is used.
* `cdb_moran_local_rate` computes the same statistics using a ratio between
numerator and denominator columns of a table.
The parameters for `cdb_moran_local` are:
#### Example Usage
* `table` name of the table that contains the data values
* `attr` name of the column
* `signficance` significance threshold for the quads values
* `num_ngbrs` number of neighbors to consider (default: 5)
* `permutations` number of random permutations for calculation of
pseudo-p values (default: 99)
* `geom_column` number of the geometry column (default: "the_geom")
* `id_col` PK column of the table (default: "cartodb_id")
* `w_type` Weight types: can be "knn" for k-nearest neighbor weights
or "queen" for contiguity based weights.
```sql
SELECT
c.the_geom,
aoi.quads,
aoi.significance,
c.num_cyclists_per_total_population
FROM CDB_GetAreasOfInterestLocal('SELECT * FROM commute_data'
'num_cyclists_per_total_population') As aoi
JOIN commute_data As c
ON c.cartodb_id = aoi.rowid;
```
The function returns a table with the following columns:
### CDB_AreasOfInterestGlobal(subquery text, column_name text)
* `moran` Moran's value
* `quads` quad classification ('HH', 'LL', 'HL', 'LH' or 'Not significant')
* `significance` significance value
* `ids` id of the corresponding record in the input table
This function identifies the extent to which geometries cluster (the groupings of geometries with similarly high or low values relative to the mean) or form outliers (areas where geometries have values opposite of their neighbors). The output of this function gives values between -1 and 1 as well as a significance of that classification. Values close to 0 mean that there is little to no distribution of values as compared to what one would see in a randomly distributed collection of geometries and values.
Function `cdb_moran_local_rate` only differs in that the `attr` input
parameter is substituted by `numerator` and `denominator`.
#### Arguments
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM interesting_table`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments |
| column_name | TEXT | Name of column (e.g., should be `'interesting_value'` instead of `interesting_value` without single quotes) used for the analysis. |
| weight type (optional) | TEXT | Type of weight to use when finding neighbors. Currently available options are 'knn' (default) and 'queen'. Read more about weight types in [PySAL's weights documentation](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html). |
| num_ngbrs (optional) | INT | Number of neighbors if using k-nearest neighbors weight type. Defaults to 5. |
| permutations (optional) | INT | Number of permutations to check against a random arrangement of the values in `column_name`. This influences the accuracy of the output field `significance`. Defaults to 99. |
| geom_col (optional) | TEXT | The column name for the geometries. Defaults to `'the_geom'` |
| id_col (optional) | TEXT | The column name for the unique ID of each geometry/value pair. Defaults to `'cartodb_id'`. |
#### Returns
A table with the following columns.
| Column Name | Type | Description |
|-------------|------|-------------|
| moran | NUMERIC | Value of Moran's I (spatial autocorrelation measure) for the entire dataset. Values closer to one indicate cluster, closer to -1 mean more outliers, and near zero indicates a random distribution of data. |
| significance | NUMERIC | The statistical significance of the `moran` measure. |
#### Examples
```sql
SELECT *
FROM CDB_AreasOfInterestGlobal('SELECT * FROM commute_data', 'num_cyclists_per_total_population')
```
### CDB_AreasOfInterestLocalRate(subquery text, numerator_column text, denominator_column text)
Just like `CDB_AreasOfInterestLocal`, this function classifies your data as being part of a cluster, as an outlier, or not part of a pattern based the significance of a classification. This function differs in that it calculates the classifications based on input `numerator` and `denominator` columns for finding the areas where there are clusters and outliers for the resulting rate of those two values.
#### Arguments
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM interesting_table`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments |
| numerator | TEXT | Name of the numerator for forming a rate to be used in analysis. |
| denominator | TEXT | Name of the denominator for forming a rate to be used in analysis. |
| weight type (optional) | TEXT | Type of weight to use when finding neighbors. Currently available options are 'knn' (default) and 'queen'. Read more about weight types in [PySAL's weights documentation](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html). |
| num_ngbrs (optional) | INT | Number of neighbors if using k-nearest neighbors weight type. Defaults to 5. |
| permutations (optional) | INT | Number of permutations to check against a random arrangement of the values in `column_name`. This influences the accuracy of the output field `significance`. Defaults to 99. |
| geom_col (optional) | TEXT | The column name for the geometries. Defaults to `'the_geom'` |
| id_col (optional) | TEXT | The column name for the unique ID of each geometry/value pair. Defaults to `'cartodb_id'`. |
#### Returns
A table with the following columns.
| Column Name | Type | Description |
|-------------|------|-------------|
| moran | NUMERIC | Value of Moran's I (spatial autocorrelation measure) for the geometry with id of `rowid` |
| quads | TEXT | Classification of geometry. Result is one of 'HH' (a high value with neighbors high on average), 'LL' (opposite of 'HH'), 'HL' (a high value surrounded by lows on average), and 'LH' (opposite of 'HL'). Null values are returned when nulls exist in the original data. |
| significance | NUMERIC | The statistical significance (from 0 to 1) of a cluster or outlier classification. Lower numbers are more significant. |
| rowid | INT | Row id of the values which correspond to the input rows. |
| vals | NUMERIC | Values from `'column_name'`. |
#### Example Usage
```sql
SELECT
c.the_geom,
aoi.quads,
aoi.significance,
c.cyclists_per_total_population
FROM CDB_GetAreasOfInterestLocalRate('SELECT * FROM commute_data'
'num_cyclists',
'total_population') As aoi
JOIN commute_data As c
ON c.cartodb_id = aoi.rowid;
```
### CDB_AreasOfInterestGlobalRate(subquery text, column_name text)
This function identifies the extent to which geometries cluster (the groupings of geometries with similarly high or low values relative to the mean) or form outliers (areas where geometries have values opposite of their neighbors). The output of this function gives values between -1 and 1 as well as a significance of that classification. Values close to 0 mean that there is little to no distribution of values as compared to what one would see in a randomly distributed collection of geometries and values.
#### Arguments
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM interesting_table`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments |
| numerator | TEXT | Name of the numerator for forming a rate to be used in analysis. |
| denominator | TEXT | Name of the denominator for forming a rate to be used in analysis. |
| weight type (optional) | TEXT | Type of weight to use when finding neighbors. Currently available options are 'knn' (default) and 'queen'. Read more about weight types in [PySAL's weights documentation](https://pysal.readthedocs.io/en/v1.11.0/users/tutorials/weights.html). |
| num_ngbrs (optional) | INT | Number of neighbors if using k-nearest neighbors weight type. Defaults to 5. |
| permutations (optional) | INT | Number of permutations to check against a random arrangement of the values in `column_name`. This influences the accuracy of the output field `significance`. Defaults to 99. |
| geom_col (optional) | TEXT | The column name for the geometries. Defaults to `'the_geom'` |
| id_col (optional) | TEXT | The column name for the unique ID of each geometry/value pair. Defaults to `'cartodb_id'`. |
#### Returns
A table with the following columns.
| Column Name | Type | Description |
|-------------|------|-------------|
| moran | NUMERIC | Value of Moran's I (spatial autocorrelation measure) for the entire dataset. Values closer to one indicate cluster, closer to -1 mean more outliers, and near zero indicates a random distribution of data. |
| significance | NUMERIC | The statistical significance of the `moran` measure. |
#### Examples
```sql
SELECT *
FROM CDB_AreasOfInterestGlobalRate('SELECT * FROM commute_data',
'num_cyclists',
'total_population')
```
## Hotspot, Coldspot, and Outlier Functions
These functions are convenience functions for extracting only information that you are interested in exposing based on the outputs of the `CDB_AreasOfInterest` functions. For instance, you can use `CDB_GetSpatialHotspots` to output only the classifications of `HH` and `HL`.
### Non-rate functions
#### CDB_GetSpatialHotspots
This function's inputs and outputs exactly mirror `CDB_AreasOfInterestLocal` except that the outputs are filtered to be only 'HH' and 'HL' (areas of high values). For more information about this function's use, see `CDB_AreasOfInterestLocal`.
#### CDB_GetSpatialColdspots
This function's inputs and outputs exactly mirror `CDB_AreasOfInterestLocal` except that the outputs are filtered to be only 'LL' and 'LH' (areas of low values). For more information about this function's use, see `CDB_AreasOfInterestLocal`.
#### CDB_GetSpatialOutliers
This function's inputs and outputs exactly mirror `CDB_AreasOfInterestLocal` except that the outputs are filtered to be only 'HL' and 'LH' (areas where highs or lows are surrounded by opposite values on average). For more information about this function's use, see `CDB_AreasOfInterestLocal`.
### Rate functions
#### CDB_GetSpatialHotspotsRate
This function's inputs and outputs exactly mirror `CDB_AreasOfInterestLocalRate` except that the outputs are filtered to be only 'HH' and 'HL' (areas of high values). For more information about this function's use, see `CDB_AreasOfInterestLocalRate`.
#### CDB_GetSpatialColdspotsRate
This function's inputs and outputs exactly mirror `CDB_AreasOfInterestLocalRate` except that the outputs are filtered to be only 'LL' and 'LH' (areas of low values). For more information about this function's use, see `CDB_AreasOfInterestLocalRate`.
#### CDB_GetSpatialOutliersRate
This function's inputs and outputs exactly mirror `CDB_AreasOfInterestLocalRate` except that the outputs are filtered to be only 'HL' and 'LH' (areas where highs or lows are surrounded by opposite values on average). For more information about this function's use, see `CDB_AreasOfInterestLocalRate`.

62
doc/11_kmeans.md Normal file
View File

@@ -0,0 +1,62 @@
## K-Means Functions
### CDB_KMeans(subquery text, no_clusters INTEGER)
This function attempts to find n clusters within the input data. It will return a table to CartoDB ids and
the number of the cluster each point in the input was assigend to.
#### Arguments
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM interesting_table`). This query must have the geometry column name `the_geom` and id column name `cartodb_id` unless otherwise specified in the input arguments |
| no\_clusters | INTEGER | The number of clusters to try and find |
#### Returns
A table with the following columns.
| Column Name | Type | Description |
|-------------|------|-------------|
| cartodb\_id | INTEGER | The CartoDB id of the row in the input table.|
| cluster\_no | INTEGER | The cluster that this point belongs to. |
#### Example Usage
```sql
SELECT
customers.*,
km.cluster_no
FROM cdb_crankshaft.CDB_Kmeans('SELECT * from customers' , 6) km, customers_3
WHERE customers.cartodb_id = km.cartodb_id
```
### CDB_WeightedMean(subquery text, weight_column text, category_column text)
Function that computes the weighted centroid of a number of clusters by some weight column.
### Arguments
| Name | Type | Description |
|------|------|-------------|
| subquery | TEXT | SQL query that exposes the data to be analyzed (e.g., `SELECT * FROM interesting_table`). This query must have the geometry column and the columns specified as the weight and category columns|
| weight\_column | TEXT | The name of the column to use as a weight |
| category\_column | TEXT | The name of the column to use as a category |
### Returns
A table with the following columns.
| Column Name | Type | Description |
|-------------|------|-------------|
| the\_geom | GEOMETRY | A point for the weighted cluster center |
| class | INTEGER | The cluster class |
### Example Usage
```sql
SELECT ST_TRANSFORM(the_geom, 3857) as the_geom_webmercator, class
FROM cdb_weighted_mean('SELECT *, customer_value FROM customers','customer_value','cluster_no')
```

24
doc/docs_template.md Normal file
View File

@@ -0,0 +1,24 @@
## Name
## Synopsis
## Description
Availability: v...
## Examples
```SQL
-- example of the function in use
SELECT cdb_awesome_function(the_geom, 'total_pop')
FROM table_name
```
## API Usage
_asdf_
## See Also
_Other function pages_

View File

@@ -0,0 +1,413 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- [MANUALLY] DROP FUNCTIONS REMOVED SINCE 0.0.2 version
DROP FUNCTION IF EXISTS cdb_moran_local(TEXT, TEXT, float, INT, INT, TEXT, TEXT, TEXT);
DROP FUNCTION IF EXISTS cdb_moran_local_rate(TEXT, TEXT, TEXT, FLOAT, INT, INT, TEXT, TEXT, TEXT);
DROP FUNCTION IF EXISTS _cdb_crankshaft_virtualenvs_path();
DROP FUNCTION IF EXISTS _cdb_crankshaft_activate_py();
-- [END MANUALLY] DROP FUNCTIONS REMOVED SINCE 0.0.2 version
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.3'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instence
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I Global Measure (public-facing)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, significance NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspots(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspots(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliers(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Global Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran FLOAT, significance FLOAT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliersRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION CDB_KMeans(query text, no_clusters integer,no_init integer default 20)
RETURNS table (cartodb_id integer, cluster_no integer) as $$
from crankshaft.clustering import kmeans
return kmeans(query,no_clusters,no_init)
$$ language plpythonu;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanS(state Numeric[],the_geom GEOMETRY(Point, 4326), weight NUMERIC)
RETURNS Numeric[] AS
$$
DECLARE
newX NUMERIC;
newY NUMERIC;
newW NUMERIC;
BEGIN
IF weight IS NULL OR the_geom IS NULL THEN
newX = state[1];
newY = state[2];
newW = state[3];
ELSE
newX = state[1] + ST_X(the_geom)*weight;
newY = state[2] + ST_Y(the_geom)*weight;
newW = state[3] + weight;
END IF;
RETURN Array[newX,newY,newW];
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanF(state Numeric[])
RETURNS GEOMETRY AS
$$
BEGIN
IF state[3] = 0 THEN
RETURN ST_SetSRID(ST_MakePoint(state[1],state[2]), 4326);
ELSE
RETURN ST_SETSRID(ST_MakePoint(state[1]/state[3], state[2]/state[3]),4326);
END IF;
END
$$ LANGUAGE plpgsql;
CREATE AGGREGATE CDB_WeightedMean(geometry(Point, 4326), NUMERIC)(
SFUNC = CDB_WeightedMeanS,
FINALFUNC = CDB_WeightedMeanF,
STYPE = Numeric[],
INITCOND = "{0.0,0.0,0.0}"
);
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Aereal weighting is a very simple form of aereal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to defina the schema the target table
-- belongs to, which is necessary if its not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Aereal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed arround the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @params max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the funciton accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

View File

@@ -0,0 +1,209 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- [MANUALLY] DROP FUNCTIONS INTRODUCED IN 0.0.3 version
DROP FUNCTION IF EXISTS CDB_AreasOfInterestGlobal(TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS _CDB_AreasOfInterestLocal(TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_AreasOfInterestLocal(TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_GetSpatialHotspots(TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_GetSpatialColdspots(TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_GetSpatialOutliers(TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_AreasOfInterestGlobalRate(TEXT,TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_AreasOfInterestLocalRate(TEXT,TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS _CDB_AreasOfInterestLocalRate(TEXT,TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_GetSpatialHotspotsRate(TEXT,TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_GetSpatialColdspotsRate(TEXT,TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_GetSpatialOutliersRate(TEXT,TEXT,TEXT,TEXT,INT,INT,TEXT,TEXT);
DROP FUNCTION IF EXISTS CDB_KMeans(text,integer,integer);
DROP AGGREGATE IF EXISTS CDB_WeightedMean(geometry(Point, 4326), NUMERIC);
DROP FUNCTION IF EXISTS CDB_WeightedMeanS(Numeric[], GEOMETRY(Point, 4326), NUMERIC);
DROP FUNCTION IF EXISTS CDB_WeightedMeanF(Numeric[]);
-- [END MANUALLY] DROP FUNCTIONS INTRODUCED IN 0.0.3 version
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.2'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instence
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
RETURNS text
AS $$
BEGIN
-- RETURN '/opt/virtualenvs/crankshaft';
RETURN '/home/ubuntu/crankshaft/envs';
END;
$$ language plpgsql IMMUTABLE STRICT;
-- Use the crankshaft python module
CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
RETURNS VOID
AS $$
import os
# plpy.notice('%',str(os.environ))
# activate virtualenv
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
default_venv_path = os.path.join(base_path, crankshaft_version)
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
activate_path = venv_path + '/bin/activate_this.py'
exec(open(activate_path).read(), dict(__file__=activate_path))
$$ LANGUAGE plpythonu;
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I
CREATE OR REPLACE FUNCTION
cdb_moran_local (
t TEXT,
attr TEXT,
significance float DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate
CREATE OR REPLACE FUNCTION
cdb_moran_local_rate(t TEXT,
numerator TEXT,
denominator TEXT,
significance FLOAT DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Aereal weighting is a very simple form of aereal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to defina the schema the target table
-- belongs to, which is necessary if its not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Aereal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed arround the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @params max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the funciton accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

View File

@@ -0,0 +1,8 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.4'::text;
$$ language 'sql' STABLE STRICT;

View File

@@ -0,0 +1,403 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.3'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instence
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I Global Measure (public-facing)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, significance NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspots(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspots(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliers(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Global Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran FLOAT, significance FLOAT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliersRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION CDB_KMeans(query text, no_clusters integer,no_init integer default 20)
RETURNS table (cartodb_id integer, cluster_no integer) as $$
from crankshaft.clustering import kmeans
return kmeans(query,no_clusters,no_init)
$$ language plpythonu;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanS(state Numeric[],the_geom GEOMETRY(Point, 4326), weight NUMERIC)
RETURNS Numeric[] AS
$$
DECLARE
newX NUMERIC;
newY NUMERIC;
newW NUMERIC;
BEGIN
IF weight IS NULL OR the_geom IS NULL THEN
newX = state[1];
newY = state[2];
newW = state[3];
ELSE
newX = state[1] + ST_X(the_geom)*weight;
newY = state[2] + ST_Y(the_geom)*weight;
newW = state[3] + weight;
END IF;
RETURN Array[newX,newY,newW];
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanF(state Numeric[])
RETURNS GEOMETRY AS
$$
BEGIN
IF state[3] = 0 THEN
RETURN ST_SetSRID(ST_MakePoint(state[1],state[2]), 4326);
ELSE
RETURN ST_SETSRID(ST_MakePoint(state[1]/state[3], state[2]/state[3]),4326);
END IF;
END
$$ LANGUAGE plpgsql;
CREATE AGGREGATE CDB_WeightedMean(geometry(Point, 4326), NUMERIC)(
SFUNC = CDB_WeightedMeanS,
FINALFUNC = CDB_WeightedMeanF,
STYPE = Numeric[],
INITCOND = "{0.0,0.0,0.0}"
);
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Aereal weighting is a very simple form of aereal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to defina the schema the target table
-- belongs to, which is necessary if its not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Aereal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed arround the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @params max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the funciton accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

View File

@@ -0,0 +1,8 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.3'::text;
$$ language 'sql' STABLE STRICT;

View File

@@ -0,0 +1,403 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.4'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instence
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I Global Measure (public-facing)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, significance NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspots(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspots(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliers(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Global Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran FLOAT, significance FLOAT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliersRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION CDB_KMeans(query text, no_clusters integer,no_init integer default 20)
RETURNS table (cartodb_id integer, cluster_no integer) as $$
from crankshaft.clustering import kmeans
return kmeans(query,no_clusters,no_init)
$$ language plpythonu;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanS(state Numeric[],the_geom GEOMETRY(Point, 4326), weight NUMERIC)
RETURNS Numeric[] AS
$$
DECLARE
newX NUMERIC;
newY NUMERIC;
newW NUMERIC;
BEGIN
IF weight IS NULL OR the_geom IS NULL THEN
newX = state[1];
newY = state[2];
newW = state[3];
ELSE
newX = state[1] + ST_X(the_geom)*weight;
newY = state[2] + ST_Y(the_geom)*weight;
newW = state[3] + weight;
END IF;
RETURN Array[newX,newY,newW];
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanF(state Numeric[])
RETURNS GEOMETRY AS
$$
BEGIN
IF state[3] = 0 THEN
RETURN ST_SetSRID(ST_MakePoint(state[1],state[2]), 4326);
ELSE
RETURN ST_SETSRID(ST_MakePoint(state[1]/state[3], state[2]/state[3]),4326);
END IF;
END
$$ LANGUAGE plpgsql;
CREATE AGGREGATE CDB_WeightedMean(geometry(Point, 4326), NUMERIC)(
SFUNC = CDB_WeightedMeanS,
FINALFUNC = CDB_WeightedMeanF,
STYPE = Numeric[],
INITCOND = "{0.0,0.0,0.0}"
);
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Aereal weighting is a very simple form of aereal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to defina the schema the target table
-- belongs to, which is necessary if its not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Aereal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed arround the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @params max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the funciton accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

View File

@@ -1,5 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
default_version = '0.0.2'
requires = 'plpythonu, postgis, cartodb'
default_version = '0.0.4'
requires = 'plpythonu, postgis'
superuser = true
schema = cdb_crankshaft

View File

@@ -0,0 +1,2 @@
import random_seeds
import clustering

View File

@@ -0,0 +1,2 @@
from moran import *
from kmeans import *

View File

@@ -0,0 +1,18 @@
from sklearn.cluster import KMeans
import plpy
def kmeans(query, no_clusters, no_init=20):
data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
array_agg(ST_X(the_geom) order by cartodb_id) xs,
array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
where the_geom is not null
'''.format(query=query))
xs = data[0]['xs']
ys = data[0]['ys']
ids = data[0]['ids']
km = KMeans(n_clusters= no_clusters, n_init=no_init)
labels = km.fit_predict(zip(xs,ys))
return zip(ids,labels)

View File

@@ -0,0 +1,260 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of the their neighborhood
import pysal as ps
import plpy
# crankshaft module
import crankshaft.pysal_utils as pu
# High level interface ---------------------------------------
def moran(subquery, attr_name,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I (global)
Implementation building neighbors with a PostGIS database and Moran's I
core clusters with PySAL.
Andy Eschbacher
"""
qvals = {"id_col": id_col,
"attr1": attr_name,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
attr_vals = pu.get_attributes(result)
## calculate weights
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global
moran_global = ps.esda.moran.Moran(attr_vals, weight,
permutations=permutations)
return zip([moran_global.I], [moran_global.EI])
def moran_local(subquery, attr,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = {"id_col": id_col,
"attr1": attr,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(5)
attr_vals = pu.get_attributes(result)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Rate (global)
Andy Eschbacher
"""
qvals = {"id_col": id_col,
"attr1": numerator,
"attr2": denominator,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global rate
lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
permutations=permutations)
return zip([lisa_rate.I], [lisa_rate.EI])
def moran_local_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Local Rate
Andy Eschbacher
"""
# geometries with values that are null are ignored
# resulting in a collection of not as near neighbors
query = pu.construct_neighbor_query(w_type,
{"id_col": id_col,
"numerator": numerator,
"denominator": denominator,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs})
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(5)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
permutations=permutations)
# find units of significance
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_local_bv(subquery, attr1, attr2,
permutations, geom_col, id_col, w_type, num_ngbrs):
"""
Moran's I (local) Bivariate (untested)
"""
plpy.notice('** Constructing query')
qvals = {"num_ngbrs": num_ngbrs,
"attr1": attr1,
"attr2": attr2,
"subquery": subquery,
"geom_col": geom_col,
"id_col": id_col}
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(4)
except plpy.SPIError:
plpy.error("Error: areas of interest query failed, " \
"check input parameters")
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(4)
## collect attributes
attr1_vals = pu.get_attributes(result, 1)
attr2_vals = pu.get_attributes(result, 2)
# create weights
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
permutations=permutations)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = quad_position(lisa.q)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
@param coord (int): quadrant of a specific measurement
Output:
classification (one of 'HH', 'LH', 'LL', or 'HL')
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def quad_position(quads):
"""
Produce Moran's I classification based of n
Input:
@param quads ndarray: an array of quads classified by
1-4 (PySAL default)
Output:
@param list: an array of quads classied by 'HH', 'LL', etc.
"""
return [map_quads(q) for q in quads]

View File

@@ -0,0 +1 @@
from pysal_utils import *

View File

@@ -0,0 +1,152 @@
"""
Utilities module for generic PySAL functionality, mainly centered on translating queries into numpy arrays or PySAL weights objects
"""
import numpy as np
import pysal as ps
def construct_neighbor_query(w_type, query_vals):
"""Return query (a string) used for finding neighbors
@param w_type text: type of neighbors to calculate ('knn' or 'queen')
@param query_vals dict: values used to construct the query
"""
if w_type.lower() == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
## Build weight object
def get_weight(query_res, w_type='knn', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
@param query_res: query results with attributes and neighbors
"""
if w_type.lower() == 'knn':
row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
weights = {x['id']: row_normed_weights for x in query_res}
else:
weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
if len(x['neighbors']) > 0
else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
return ps.W(neighbors, weights)
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes inolved in query.
@param params: dict of information used in query (column names,
table name, etc.)
"""
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')]
template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
attr_string = ""
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": val, "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Create portion of WHERE clauses for weeding out NULL-valued geometries
"""
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')])
attr_string = []
for attr in attrs:
attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
if len(attrs) == 2:
attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
@param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE " \
"i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"%(attr_where_j)s " \
"ORDER BY " \
"j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs})" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
@param params dict: information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_attributes(query_res, attr_num=1):
"""
@param query_res: query results with attributes and neighbors
@param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
def empty_zipped_array(num_nones):
"""
prepare return values for cases of empty weights objects (no neighbors)
Input:
@param num_nones int: number of columns (e.g., 4)
Output:
[(None, None, None, None)]
"""
return [tuple([None] * num_nones)]

View File

@@ -0,0 +1,10 @@
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)

View File

@@ -0,0 +1,48 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.3',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping comunity',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
install_requires=['pysal==1.9.1', 'scikit-learn==0.17.1'],
requires=['pysal', 'numpy', 'sklearn'],
test_suite='test'
)

View File

@@ -0,0 +1 @@
[{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]

View File

@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "LL"],
[0.6152779669180425, "LL"],
[-0.14657336660125297, "LH"],
[0.6967858120189607, "LL"],
[0.07949310115714454, "HH"],
[0.4703198759258987, "HH"],
[0.4421125200498064, "HH"],
[0.5724288737143592, "LL"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "LL"],
[-0.01466729201304962, "HL"],
[0.3481559372544409, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "HH"],
[0.15971286468915544, "LL"],
[1.0543588860308968, "HH"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "HH"],
[0.08438455015300014, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "LL"],
[0.795275137550483, "HH"],
[0.18562939195219, "LL"],
[0.3010757406693439, "LL"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "LL"],
[-0.07116352791516614, "HL"],
[-0.09945240794119009, "LH"],
[0.18562939195219, "LL"],
[0.1832733440191868, "LL"],
[-0.39054253768447705, "HL"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "HH"],
[0.2584386102554792, "HH"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "LH"],
[0.051367269430983485, "LL"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "LL"],
[0.04191046803899837, "LL"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "LH"],
[0.5410013139159929, "HH"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]

View File

@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

View File

@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)

View File

@@ -0,0 +1,34 @@
import re
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
self.results = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []

View File

@@ -0,0 +1,38 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import numpy as np
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class KMeansTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
self.params = {"subquery": "select * from table",
"no_clusters": "10"
}
def test_kmeans(self):
data = self.cluster_data
plpy._define_result('select' ,data)
clusters = cc.kmeans('subquery', 2)
labels = [a[1] for a in clusters]
c1 = [a for a in clusters if a[1]==0]
c2 = [a for a in clusters if a[1]==1]
self.assertEqual(len(np.unique(labels)),2)
self.assertEqual(len(c1),20)
self.assertEqual(len(c2),20)

View File

@@ -0,0 +1,83 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads"""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_quad_position(self):
"""Test lisa_sig_vals"""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('subquery', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None? ', result == None
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
def test_moran(self):
"""Test Moran's I global"""
data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1235)
result = cc.moran('table', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None?', result == None
result_moran = result[0][0]
expected_moran = np.array([row[0] for row in self.moran_data]).mean()
self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)

View File

@@ -0,0 +1,107 @@
import unittest
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
class PysalUtilsTest(unittest.TestCase):
"""Testing class for utility functions related to PySAL integrations"""
def setUp(self):
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
def test_query_attr_select(self):
"""Test query_attr_select"""
ans = "i.\"{attr1}\"::numeric As attr1, " \
"i.\"{attr2}\"::numeric As attr2, "
self.assertEqual(pu.query_attr_select(self.params), ans)
def test_query_attr_where(self):
"""Test pu.query_attr_where"""
ans = "idx_replace.\"{attr1}\" IS NOT NULL AND " \
"idx_replace.\"{attr2}\" IS NOT NULL AND " \
"idx_replace.\"{attr2}\" <> 0"
self.assertEqual(pu.query_attr_where(self.params), ans)
def test_knn(self):
"""Test knn neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0 " \
"ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.knn(self.params), ans)
def test_queen(self):
"""Test queen neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"ST_Touches(i.\"the_geom\", " \
"j.\"the_geom\") AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0)" \
") As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.queen(self.params), ans)
def test_construct_neighbor_query(self):
"""Test construct_neighbor_query"""
# Compare to raw knn query
self.assertEqual(pu.construct_neighbor_query('knn', self.params),
pu.knn(self.params))
def test_get_attributes(self):
"""Test get_attributes"""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight"""
self.assertEqual(True, True)
def test_empty_zipped_array(self):
"""Test empty_zipped_array"""
ans2 = [(None, None)]
ans4 = [(None, None, None, None)]
self.assertEqual(pu.empty_zipped_array(2), ans2)
self.assertEqual(pu.empty_zipped_array(4), ans4)

View File

@@ -0,0 +1,2 @@
import random_seeds
import clustering

View File

@@ -0,0 +1,2 @@
from moran import *
from kmeans import *

View File

@@ -0,0 +1,18 @@
from sklearn.cluster import KMeans
import plpy
def kmeans(query, no_clusters, no_init=20):
data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
array_agg(ST_X(the_geom) order by cartodb_id) xs,
array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
where the_geom is not null
'''.format(query=query))
xs = data[0]['xs']
ys = data[0]['ys']
ids = data[0]['ids']
km = KMeans(n_clusters= no_clusters, n_init=no_init)
labels = km.fit_predict(zip(xs,ys))
return zip(ids,labels)

View File

@@ -0,0 +1,260 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of the their neighborhood
import pysal as ps
import plpy
# crankshaft module
import crankshaft.pysal_utils as pu
# High level interface ---------------------------------------
def moran(subquery, attr_name,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I (global)
Implementation building neighbors with a PostGIS database and Moran's I
core clusters with PySAL.
Andy Eschbacher
"""
qvals = {"id_col": id_col,
"attr1": attr_name,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
attr_vals = pu.get_attributes(result)
## calculate weights
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global
moran_global = ps.esda.moran.Moran(attr_vals, weight,
permutations=permutations)
return zip([moran_global.I], [moran_global.EI])
def moran_local(subquery, attr,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = {"id_col": id_col,
"attr1": attr,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(5)
attr_vals = pu.get_attributes(result)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Rate (global)
Andy Eschbacher
"""
qvals = {"id_col": id_col,
"attr1": numerator,
"attr2": denominator,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global rate
lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
permutations=permutations)
return zip([lisa_rate.I], [lisa_rate.EI])
def moran_local_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Local Rate
Andy Eschbacher
"""
# geometries with values that are null are ignored
# resulting in a collection of not as near neighbors
query = pu.construct_neighbor_query(w_type,
{"id_col": id_col,
"numerator": numerator,
"denominator": denominator,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs})
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(5)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
permutations=permutations)
# find units of significance
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_local_bv(subquery, attr1, attr2,
permutations, geom_col, id_col, w_type, num_ngbrs):
"""
Moran's I (local) Bivariate (untested)
"""
plpy.notice('** Constructing query')
qvals = {"num_ngbrs": num_ngbrs,
"attr1": attr1,
"attr2": attr2,
"subquery": subquery,
"geom_col": geom_col,
"id_col": id_col}
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(4)
except plpy.SPIError:
plpy.error("Error: areas of interest query failed, " \
"check input parameters")
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(4)
## collect attributes
attr1_vals = pu.get_attributes(result, 1)
attr2_vals = pu.get_attributes(result, 2)
# create weights
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
permutations=permutations)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = quad_position(lisa.q)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
@param coord (int): quadrant of a specific measurement
Output:
classification (one of 'HH', 'LH', 'LL', or 'HL')
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def quad_position(quads):
"""
Produce Moran's I classification based of n
Input:
@param quads ndarray: an array of quads classified by
1-4 (PySAL default)
Output:
@param list: an array of quads classied by 'HH', 'LL', etc.
"""
return [map_quads(q) for q in quads]

View File

@@ -0,0 +1 @@
from pysal_utils import *

View File

@@ -0,0 +1,152 @@
"""
Utilities module for generic PySAL functionality, mainly centered on translating queries into numpy arrays or PySAL weights objects
"""
import numpy as np
import pysal as ps
def construct_neighbor_query(w_type, query_vals):
"""Return query (a string) used for finding neighbors
@param w_type text: type of neighbors to calculate ('knn' or 'queen')
@param query_vals dict: values used to construct the query
"""
if w_type.lower() == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
## Build weight object
def get_weight(query_res, w_type='knn', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
@param query_res: query results with attributes and neighbors
"""
if w_type.lower() == 'knn':
row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
weights = {x['id']: row_normed_weights for x in query_res}
else:
weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
if len(x['neighbors']) > 0
else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
return ps.W(neighbors, weights)
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes inolved in query.
@param params: dict of information used in query (column names,
table name, etc.)
"""
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')]
template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
attr_string = ""
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": val, "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Create portion of WHERE clauses for weeding out NULL-valued geometries
"""
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')])
attr_string = []
for attr in attrs:
attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
if len(attrs) == 2:
attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
@param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE " \
"i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"%(attr_where_j)s " \
"ORDER BY " \
"j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs})" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
@param params dict: information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_attributes(query_res, attr_num=1):
"""
@param query_res: query results with attributes and neighbors
@param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
def empty_zipped_array(num_nones):
"""
prepare return values for cases of empty weights objects (no neighbors)
Input:
@param num_nones int: number of columns (e.g., 4)
Output:
[(None, None, None, None)]
"""
return [tuple([None] * num_nones)]

View File

@@ -0,0 +1,10 @@
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)

View File

@@ -0,0 +1,48 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.4',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping comunity',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
requires=['pysal', 'numpy', 'sklearn'],
test_suite='test'
)

View File

@@ -0,0 +1 @@
[{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]

View File

@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "LL"],
[0.6152779669180425, "LL"],
[-0.14657336660125297, "LH"],
[0.6967858120189607, "LL"],
[0.07949310115714454, "HH"],
[0.4703198759258987, "HH"],
[0.4421125200498064, "HH"],
[0.5724288737143592, "LL"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "LL"],
[-0.01466729201304962, "HL"],
[0.3481559372544409, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "HH"],
[0.15971286468915544, "LL"],
[1.0543588860308968, "HH"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "HH"],
[0.08438455015300014, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "LL"],
[0.795275137550483, "HH"],
[0.18562939195219, "LL"],
[0.3010757406693439, "LL"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "LL"],
[-0.07116352791516614, "HL"],
[-0.09945240794119009, "LH"],
[0.18562939195219, "LL"],
[0.1832733440191868, "LL"],
[-0.39054253768447705, "HL"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "HH"],
[0.2584386102554792, "HH"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "LH"],
[0.051367269430983485, "LL"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "LL"],
[0.04191046803899837, "LL"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "LH"],
[0.5410013139159929, "HH"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]

View File

@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

View File

@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)

View File

@@ -0,0 +1,34 @@
import re
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
self.results = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []

View File

@@ -0,0 +1,38 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import numpy as np
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class KMeansTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
self.params = {"subquery": "select * from table",
"no_clusters": "10"
}
def test_kmeans(self):
data = self.cluster_data
plpy._define_result('select' ,data)
clusters = cc.kmeans('subquery', 2)
labels = [a[1] for a in clusters]
c1 = [a for a in clusters if a[1]==0]
c2 = [a for a in clusters if a[1]==1]
self.assertEqual(len(np.unique(labels)),2)
self.assertEqual(len(c1),20)
self.assertEqual(len(c2),20)

View File

@@ -0,0 +1,83 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads"""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_quad_position(self):
"""Test lisa_sig_vals"""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('subquery', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None? ', result == None
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
def test_moran(self):
"""Test Moran's I global"""
data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1235)
result = cc.moran('table', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None?', result == None
result_moran = result[0][0]
expected_moran = np.array([row[0] for row in self.moran_data]).mean()
self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)

View File

@@ -0,0 +1,107 @@
import unittest
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
class PysalUtilsTest(unittest.TestCase):
"""Testing class for utility functions related to PySAL integrations"""
def setUp(self):
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
def test_query_attr_select(self):
"""Test query_attr_select"""
ans = "i.\"{attr1}\"::numeric As attr1, " \
"i.\"{attr2}\"::numeric As attr2, "
self.assertEqual(pu.query_attr_select(self.params), ans)
def test_query_attr_where(self):
"""Test pu.query_attr_where"""
ans = "idx_replace.\"{attr1}\" IS NOT NULL AND " \
"idx_replace.\"{attr2}\" IS NOT NULL AND " \
"idx_replace.\"{attr2}\" <> 0"
self.assertEqual(pu.query_attr_where(self.params), ans)
def test_knn(self):
"""Test knn neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0 " \
"ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.knn(self.params), ans)
def test_queen(self):
"""Test queen neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"ST_Touches(i.\"the_geom\", " \
"j.\"the_geom\") AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0)" \
") As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.queen(self.params), ans)
def test_construct_neighbor_query(self):
"""Test construct_neighbor_query"""
# Compare to raw knn query
self.assertEqual(pu.construct_neighbor_query('knn', self.params),
pu.knn(self.params))
def test_get_attributes(self):
"""Test get_attributes"""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight"""
self.assertEqual(True, True)
def test_empty_zipped_array(self):
"""Test empty_zipped_array"""
ans2 = [(None, None)]
ans4 = [(None, None, None, None)]
self.assertEqual(pu.empty_zipped_array(2), ans2)
self.assertEqual(pu.empty_zipped_array(4), ans4)

View File

@@ -7,7 +7,6 @@ include ../../Makefile.global
# requires sudo. In additionof the current development version
# named 'dev', an alias 'current' is generating for ease of
# update (upgrade to 'current', then to 'dev').
# the python module is installed in a virtualenv in envs/dev/
# * test runs the tests for the currently generated Development
# extension.
@@ -18,11 +17,8 @@ DATA = $(EXTENSION)--dev.sql \
SOURCES_DATA_DIR = sql
SOURCES_DATA = $(wildcard $(SOURCES_DATA_DIR)/*.sql)
VIRTUALENV_PATH = $(realpath ../../envs)
ESC_VIRVIRTUALENV_PATH = $(subst /,\/,$(VIRTUALENV_PATH))
REPLACEMENTS = -e 's/@@VERSION@@/$(EXTVERSION)/g' \
-e 's/@@VIRTUALENV_PATH@@/$(ESC_VIRVIRTUALENV_PATH)/g'
REPLACEMENTS = -e 's/@@VERSION@@/$(EXTVERSION)/g'
$(DATA): $(SOURCES_DATA)
$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > $@
@@ -54,7 +50,6 @@ release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > ../../release/$(EXTENSION)--$(EXTVERSION).sql
# Install the current relese into the PostgreSQL extensions directory
# and the Python package in a virtual environment envs/X.Y.Z
deploy:
$(INSTALL_DATA) ../../release/$(EXTENSION).control '$(DESTDIR)$(datadir)/extension/'
$(INSTALL_DATA) ../../release/*.sql '$(DESTDIR)$(datadir)/extension/'

View File

@@ -1,5 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
default_version = '0.0.2'
requires = 'plpythonu, postgis, cartodb'
default_version = '0.0.4'
requires = 'plpythonu, postgis'
superuser = true
schema = cdb_crankshaft

View File

@@ -1,23 +0,0 @@
CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
RETURNS text
AS $$
BEGIN
-- RETURN '/opt/virtualenvs/crankshaft';
RETURN '@@VIRTUALENV_PATH@@';
END;
$$ language plpgsql IMMUTABLE STRICT;
-- Use the crankshaft python module
CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
RETURNS VOID
AS $$
import os
# plpy.notice('%',str(os.environ))
# activate virtualenv
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
default_venv_path = os.path.join(base_path, crankshaft_version)
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
activate_path = venv_path + '/bin/activate_this.py'
exec(open(activate_path).read(), dict(__file__=activate_path))
$$ LANGUAGE plpythonu;

View File

@@ -4,7 +4,6 @@
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;

View File

@@ -1,89 +1,229 @@
-- Moran's I (global)
-- Moran's I Global Measure (public-facing)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterest_Global (
CDB_AreasOfInterestGlobal(
subquery TEXT,
attr_name TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5)
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, significance NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran(subquery, attr, num_ngbrs, permutations, geom_col, id_col, w_type)
return moran(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local
-- Moran's I Local (internal function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterest_Local(
_CDB_AreasOfInterestLocal(
subquery TEXT,
attr TEXT,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5)
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, ids INT, y NUMERIC)
column_name TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(subquery, attr, permutations, geom_col, id_col, w_type, num_ngbrs)
return moran_local(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Rate (global)
-- Moran's I Local (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterest_Global_Rate(
CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspots(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspots(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliers(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Global Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5)
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran FLOAT, significance FLOAT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_rate(subquery, numerator, denominator, permutations, geom_col, id_col, w_type, num_ngbrs)
return moran_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate
-- Moran's I Local Rate (internal function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterest_Local_Rate(
_CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5)
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, ids INT, y NUMERIC)
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(subquery, numerator, denominator, permutations, geom_col, id_col, w_type, num_ngbrs)
return moran_local_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- -- Moran's I Local Bivariate
-- CREATE OR REPLACE FUNCTION
-- cdb_moran_local_bv(
-- subquery TEXT,
-- attr1 TEXT,
-- attr2 TEXT,
-- permutations INT DEFAULT 99,
-- geom_col TEXT DEFAULT 'the_geom',
-- id_col TEXT DEFAULT 'cartodb_id',
-- w_type TEXT DEFAULT 'knn',
-- num_ngbrs INT DEFAULT 5)
-- RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
-- AS $$
-- from crankshaft.clustering import moran_local_bv
-- # TODO: use named parameters or a dictionary
-- return moran_local_bv(t, attr1, attr2, permutations, geom_col, id_col, w_type, num_ngbrs)
-- $$ LANGUAGE plpythonu;
-- Moran's I Local Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliersRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;

49
src/pg/sql/11_kmeans.sql Normal file
View File

@@ -0,0 +1,49 @@
CREATE OR REPLACE FUNCTION CDB_KMeans(query text, no_clusters integer,no_init integer default 20)
RETURNS table (cartodb_id integer, cluster_no integer) as $$
from crankshaft.clustering import kmeans
return kmeans(query,no_clusters,no_init)
$$ language plpythonu;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanS(state Numeric[],the_geom GEOMETRY(Point, 4326), weight NUMERIC)
RETURNS Numeric[] AS
$$
DECLARE
newX NUMERIC;
newY NUMERIC;
newW NUMERIC;
BEGIN
IF weight IS NULL OR the_geom IS NULL THEN
newX = state[1];
newY = state[2];
newW = state[3];
ELSE
newX = state[1] + ST_X(the_geom)*weight;
newY = state[2] + ST_Y(the_geom)*weight;
newW = state[3] + weight;
END IF;
RETURN Array[newX,newY,newW];
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanF(state Numeric[])
RETURNS GEOMETRY AS
$$
BEGIN
IF state[3] = 0 THEN
RETURN ST_SetSRID(ST_MakePoint(state[1],state[2]), 4326);
ELSE
RETURN ST_SETSRID(ST_MakePoint(state[1]/state[3], state[2]/state[3]),4326);
END IF;
END
$$ LANGUAGE plpgsql;
CREATE AGGREGATE CDB_WeightedMean(geometry(Point, 4326), NUMERIC)(
SFUNC = CDB_WeightedMeanS,
FINALFUNC = CDB_WeightedMeanF,
STYPE = Numeric[],
INITCOND = "{0.0,0.0,0.0}"
);

View File

@@ -1,6 +1,5 @@
-- Install dependencies
CREATE EXTENSION plpythonu;
CREATE EXTENSION postgis;
CREATE EXTENSION cartodb;
-- Install the extension
CREATE EXTENSION crankshaft VERSION 'dev';

View File

@@ -1,248 +1,277 @@
\pset format unaligned
\set ECHO all
\i test/fixtures/ppoints.sql
-- test table (spanish province centroids with some invented values)
CREATE TABLE ppoints (cartodb_id integer, the_geom geometry, the_geom_webmercator geometry, code text, region_code text, value float);
INSERT INTO ppoints VALUES
( 1,'0101000020E6100000A8306DC0CBC305C051D14B6CE56A4540'::geometry,ST_Transform('0101000020E6100000A8306DC0CBC305C051D14B6CE56A4540'::geometry, 3857),'01','16',0.5),
( 4,'0101000020E6100000E220A4362DC202C0FD8AFA5119994240'::geometry,ST_Transform('0101000020E6100000E220A4362DC202C0FD8AFA5119994240'::geometry, 3857),'04','01',0.1),
( 5,'0101000020E610000004377E573AC813C0CB5871BB17494440'::geometry,ST_Transform('0101000020E610000004377E573AC813C0CB5871BB17494440'::geometry, 3857),'05','07',0.3),
( 2,'0101000020E610000000F49BE19BAFFFBF639958FDA6694340'::geometry,ST_Transform('0101000020E610000000F49BE19BAFFFBF639958FDA6694340'::geometry, 3857),'02','08',0.7),
( 3,'0101000020E61000005D0B7E63C832E2BFDB63EB00443D4340'::geometry,ST_Transform('0101000020E61000005D0B7E63C832E2BFDB63EB00443D4340'::geometry, 3857),'03','10',0.2),
( 6,'0101000020E61000006F3742B7FB9018C0DD967DC4D95A4340'::geometry,ST_Transform('0101000020E61000006F3742B7FB9018C0DD967DC4D95A4340'::geometry, 3857),'06','11',0.05),
( 7,'0101000020E6100000E4BB36995F4C0740EAC0E5CA9FC94340'::geometry,ST_Transform('0101000020E6100000E4BB36995F4C0740EAC0E5CA9FC94340'::geometry, 3857),'07','04',0.4),
( 8,'0101000020E61000003D43CC6CAFBEFF3F6B52E66F91DD4440'::geometry,ST_Transform('0101000020E61000003D43CC6CAFBEFF3F6B52E66F91DD4440'::geometry, 3857),'08','09',0.7),
( 9,'0101000020E61000003CC797BD99AF0CC0495A87FA312F4540'::geometry,ST_Transform('0101000020E61000003CC797BD99AF0CC0495A87FA312F4540'::geometry, 3857),'09','07',0.5),
(13,'0101000020E61000001CAA00A9F19F0EC05DF9267B7A764340'::geometry,ST_Transform('0101000020E61000001CAA00A9F19F0EC05DF9267B7A764340'::geometry, 3857),'13','08',0.4),
(16,'0101000020E6100000D8208F3CBC9001C065638DC1B1F24340'::geometry,ST_Transform('0101000020E6100000D8208F3CBC9001C065638DC1B1F24340'::geometry, 3857),'16','08',0.4),
(17,'0101000020E6100000E9E6A94A71630540AD7A0CB062104540'::geometry,ST_Transform('0101000020E6100000E9E6A94A71630540AD7A0CB062104540'::geometry, 3857),'17','09',0.6),
(18,'0101000020E6100000719792D59E240AC098AC548E00A84240'::geometry,ST_Transform('0101000020E6100000719792D59E240AC098AC548E00A84240'::geometry, 3857),'18','01',0.3),
(19,'0101000020E6100000972C878B50FD04C0123C881D1F684440'::geometry,ST_Transform('0101000020E6100000972C878B50FD04C0123C881D1F684440'::geometry, 3857),'19','08',0.7),
(21,'0101000020E6100000F7893E9934511BC0EAA4BF03E1C94240'::geometry,ST_Transform('0101000020E6100000F7893E9934511BC0EAA4BF03E1C94240'::geometry, 3857),'21','01',0.1),
(22,'0101000020E6100000572C2123B2A8B2BF7ED7FABAFD194540'::geometry,ST_Transform('0101000020E6100000572C2123B2A8B2BF7ED7FABAFD194540'::geometry, 3857),'22','02',0.4),
(25,'0101000020E6100000461B67D688C4F03FD990EEC3A0054540'::geometry,ST_Transform('0101000020E6100000461B67D688C4F03FD990EEC3A0054540'::geometry, 3857),'25','09',0.4),
(26,'0101000020E6100000A139FB06E82204C0539D84F62E234540'::geometry,ST_Transform('0101000020E6100000A139FB06E82204C0539D84F62E234540'::geometry, 3857),'26','17',0.6),
(27,'0101000020E6100000A92E54E618C91DC00D3A947B81814540'::geometry,ST_Transform('0101000020E6100000A92E54E618C91DC00D3A947B81814540'::geometry, 3857),'27','12',0.3),
(28,'0101000020E6100000971DC8B682BC0DC016D0E8055F3F4440'::geometry,ST_Transform('0101000020E6100000971DC8B682BC0DC016D0E8055F3F4440'::geometry, 3857),'28','13',0.8),
(30,'0101000020E6100000A2DC1964A8C5F7BF19299C994D004340'::geometry,ST_Transform('0101000020E6100000A2DC1964A8C5F7BF19299C994D004340'::geometry, 3857),'30','14',0.1),
(31,'0101000020E6100000DCA1FCC87B56FABF9B88E9D866554540'::geometry,ST_Transform('0101000020E6100000DCA1FCC87B56FABF9B88E9D866554540'::geometry, 3857),'31','15',0.9),
(32,'0101000020E6100000E1517AFCD15E1EC0A18D8D4825194540'::geometry,ST_Transform('0101000020E6100000E1517AFCD15E1EC0A18D8D4825194540'::geometry, 3857),'32','12',0.3),
(33,'0101000020E6100000A7FF33825AF917C0FABE7DFB6BA54540'::geometry,ST_Transform('0101000020E6100000A7FF33825AF917C0FABE7DFB6BA54540'::geometry, 3857),'33','03',0.4),
(34,'0101000020E6100000FB4E4EBEB72412C0898E7240982F4540'::geometry,ST_Transform('0101000020E6100000FB4E4EBEB72412C0898E7240982F4540'::geometry, 3857),'34','07',0.3),
(35,'0101000020E6100000224682B01B1A2DC011091656CC5C3C40'::geometry,ST_Transform('0101000020E6100000224682B01B1A2DC011091656CC5C3C40'::geometry, 3857),'35','05',0.3),
(36,'0101000020E6100000F7C9447110EC20C04C5D4823C7374540'::geometry,ST_Transform('0101000020E6100000F7C9447110EC20C04C5D4823C7374540'::geometry, 3857),'36','12',0.2),
(37,'0101000020E610000053D6A26DFB4218C09D58FAE209674440'::geometry,ST_Transform('0101000020E610000053D6A26DFB4218C09D58FAE209674440'::geometry, 3857),'37','07',0.5),
(38,'0101000020E6100000B1D1B5FC910431C03C0C89BA03503C40'::geometry,ST_Transform('0101000020E6100000B1D1B5FC910431C03C0C89BA03503C40'::geometry, 3857),'38','05',0.4),
(39,'0101000020E610000086E6FEE1BD1E10C00417096748994540'::geometry,ST_Transform('0101000020E610000086E6FEE1BD1E10C00417096748994540'::geometry, 3857),'39','06',0.6),
(40,'0101000020E6100000FB51C33F733710C038D01729E4954440'::geometry,ST_Transform('0101000020E6100000FB51C33F733710C038D01729E4954440'::geometry, 3857),'40','07',0.5),
(41,'0101000020E6100000912D6FDA28BB16C031321F08C4B74240'::geometry,ST_Transform('0101000020E6100000912D6FDA28BB16C031321F08C4B74240'::geometry, 3857),'41','01',0.4),
(42,'0101000020E6100000554432EABEB504C069ECD78775CF4440'::geometry,ST_Transform('0101000020E6100000554432EABEB504C069ECD78775CF4440'::geometry, 3857),'42','07',0.2),
(43,'0101000020E6100000157F117C1A2EEA3F027CD1F2368B4440'::geometry,ST_Transform('0101000020E6100000157F117C1A2EEA3F027CD1F2368B4440'::geometry, 3857),'43','09',0.3),
(44,'0101000020E610000051AA5B1BD718EABFEE67613BA4544440'::geometry,ST_Transform('0101000020E610000051AA5B1BD718EABFEE67613BA4544440'::geometry, 3857),'44','02',0.2),
(45,'0101000020E610000022C5C01BB69710C08563BC1499E54340'::geometry,ST_Transform('0101000020E610000022C5C01BB69710C08563BC1499E54340'::geometry, 3857),'45','08',0.3),
(46,'0101000020E6100000D5FCF78A11A0E9BFDEA46F8E64AF4340'::geometry,ST_Transform('0101000020E6100000D5FCF78A11A0E9BFDEA46F8E64AF4340'::geometry, 3857),'46','10',0.2),
(47,'0101000020E61000003AE63525866313C02100050B2BD14440'::geometry,ST_Transform('0101000020E61000003AE63525866313C02100050B2BD14440'::geometry, 3857),'47','07',0.3),
(48,'0101000020E610000030F187FD1FD206C0C767E1496C9E4540'::geometry,ST_Transform('0101000020E610000030F187FD1FD206C0C767E1496C9E4540'::geometry, 3857),'48','16',0.5),
(49,'0101000020E61000009C22867B12EC17C006C5F40C14DD4440'::geometry,ST_Transform('0101000020E61000009C22867B12EC17C006C5F40C14DD4440'::geometry, 3857),'49','07',0.2),
(50,'0101000020E6100000F7D5EFC62D08F1BF69D1231D68CF4440'::geometry,ST_Transform('0101000020E6100000F7D5EFC62D08F1BF69D1231D68CF4440'::geometry, 3857),'50','02',0.6),
(51,'0101000020E61000005B0E1F8DAA5F15C0530BFE285BF24140'::geometry,ST_Transform('0101000020E61000005B0E1F8DAA5F15C0530BFE285BF24140'::geometry, 3857),'51','18',0.01),
(10,'0101000020E61000000FD65D82AEA418C06192D1351FDB4340'::geometry,ST_Transform('0101000020E61000000FD65D82AEA418C06192D1351FDB4340'::geometry, 3857),'10','11',0.04),
(11,'0101000020E6100000B305531DAB0A17C0DEAFCD4EE5464240'::geometry,ST_Transform('0101000020E6100000B305531DAB0A17C0DEAFCD4EE5464240'::geometry, 3857),'11','01',0.08),
(12,'0101000020E610000059721A7297C9C2BF9EBE383BE51E4440'::geometry,ST_Transform('0101000020E610000059721A7297C9C2BF9EBE383BE51E4440'::geometry, 3857),'12','10',0.2),
(14,'0101000020E610000000C86313AF3C13C0E530879C10FF4240'::geometry,ST_Transform('0101000020E610000000C86313AF3C13C0E530879C10FF4240'::geometry, 3857),'14','01',0.2),
(15,'0101000020E61000002A475497B6ED20C06643D4131A904540'::geometry,ST_Transform('0101000020E61000002A475497B6ED20C06643D4131A904540'::geometry, 3857),'15','12',0.3),
(20,'0101000020E6100000F975566FAD8D01C0E840C33F67924540'::geometry,ST_Transform('0101000020E6100000F975566FAD8D01C0E840C33F67924540'::geometry, 3857),'20','16',0.8),
(23,'0101000020E610000025FA13E595880BC022BB07131D024340'::geometry,ST_Transform('0101000020E610000025FA13E595880BC022BB07131D024340'::geometry, 3857),'23','01',0.1),
(24,'0101000020E61000009C5F91C5095C17C0C78784B15A4F4540'::geometry,ST_Transform('0101000020E61000009C5F91C5095C17C0C78784B15A4F4540'::geometry, 3857),'24','07',0.3),
(29,'0101000020E6100000C34D4A5B48E712C092E680892C684240'::geometry,ST_Transform('0101000020E6100000C34D4A5B48E712C092E680892C684240'::geometry, 3857),'29','01',0.3),
(52,'0101000020E6100000406A545EB29A07C04E5F0BDA39A54140'::geometry,ST_Transform('0101000020E6100000406A545EB29A07C04E5F0BDA39A54140'::geometry, 3857),'52','19',0.01)
\i test/fixtures/ppoints2.sql
-- test table (spanish province centroids with some invented values)
CREATE TABLE ppoints2 (cartodb_id integer, the_geom geometry, code text, region_code text, numerator float, denominator float);
INSERT INTO ppoints2 VALUES
( 1,'0101000020E6100000A8306DC0CBC305C051D14B6CE56A4540'::geometry,'01','16',0.5, 1.0),
( 4,'0101000020E6100000E220A4362DC202C0FD8AFA5119994240'::geometry,'04','01',0.1, 1.0),
( 5,'0101000020E610000004377E573AC813C0CB5871BB17494440'::geometry,'05','07',0.3, 1.0),
( 2,'0101000020E610000000F49BE19BAFFFBF639958FDA6694340'::geometry,'02','08',0.7, 1.0),
( 3,'0101000020E61000005D0B7E63C832E2BFDB63EB00443D4340'::geometry,'03','10',0.2, 1.0),
( 6,'0101000020E61000006F3742B7FB9018C0DD967DC4D95A4340'::geometry,'06','11',0.05, 1.0),
( 7,'0101000020E6100000E4BB36995F4C0740EAC0E5CA9FC94340'::geometry,'07','04',0.4, 1.0),
( 8,'0101000020E61000003D43CC6CAFBEFF3F6B52E66F91DD4440'::geometry,'08','09',0.7, 1.0),
( 9,'0101000020E61000003CC797BD99AF0CC0495A87FA312F4540'::geometry,'09','07',0.5, 1.0),
(13,'0101000020E61000001CAA00A9F19F0EC05DF9267B7A764340'::geometry,'13','08',0.4, 1.0),
(16,'0101000020E6100000D8208F3CBC9001C065638DC1B1F24340'::geometry,'16','08',0.4, 1.0),
(17,'0101000020E6100000E9E6A94A71630540AD7A0CB062104540'::geometry,'17','09',0.6, 1.0),
(18,'0101000020E6100000719792D59E240AC098AC548E00A84240'::geometry,'18','01',0.3, 1.0),
(19,'0101000020E6100000972C878B50FD04C0123C881D1F684440'::geometry,'19','08',0.7, 1.0),
(21,'0101000020E6100000F7893E9934511BC0EAA4BF03E1C94240'::geometry,'21','01',0.1, 1.0),
(22,'0101000020E6100000572C2123B2A8B2BF7ED7FABAFD194540'::geometry,'22','02',0.4, 1.0),
(25,'0101000020E6100000461B67D688C4F03FD990EEC3A0054540'::geometry,'25','09',0.4, 1.0),
(26,'0101000020E6100000A139FB06E82204C0539D84F62E234540'::geometry,'26','17',0.6, 1.0),
(27,'0101000020E6100000A92E54E618C91DC00D3A947B81814540'::geometry,'27','12',0.3, 1.0),
(28,'0101000020E6100000971DC8B682BC0DC016D0E8055F3F4440'::geometry,'28','13',0.8, 1.0),
(30,'0101000020E6100000A2DC1964A8C5F7BF19299C994D004340'::geometry,'30','14',0.1, 1.0),
(31,'0101000020E6100000DCA1FCC87B56FABF9B88E9D866554540'::geometry,'31','15',0.9, 1.0),
(32,'0101000020E6100000E1517AFCD15E1EC0A18D8D4825194540'::geometry,'32','12',0.3, 1.0),
(33,'0101000020E6100000A7FF33825AF917C0FABE7DFB6BA54540'::geometry,'33','03',0.4, 1.0),
(34,'0101000020E6100000FB4E4EBEB72412C0898E7240982F4540'::geometry,'34','07',0.3, 1.0),
(35,'0101000020E6100000224682B01B1A2DC011091656CC5C3C40'::geometry,'35','05',0.3, 1.0),
(36,'0101000020E6100000F7C9447110EC20C04C5D4823C7374540'::geometry,'36','12',0.2, 1.0),
(37,'0101000020E610000053D6A26DFB4218C09D58FAE209674440'::geometry,'37','07',0.5, 1.0),
(38,'0101000020E6100000B1D1B5FC910431C03C0C89BA03503C40'::geometry,'38','05',0.4, 1.0),
(39,'0101000020E610000086E6FEE1BD1E10C00417096748994540'::geometry,'39','06',0.6, 1.0),
(40,'0101000020E6100000FB51C33F733710C038D01729E4954440'::geometry,'40','07',0.5, 1.0),
(41,'0101000020E6100000912D6FDA28BB16C031321F08C4B74240'::geometry,'41','01',0.4, 1.0),
(42,'0101000020E6100000554432EABEB504C069ECD78775CF4440'::geometry,'42','07',0.2, 1.0),
(43,'0101000020E6100000157F117C1A2EEA3F027CD1F2368B4440'::geometry,'43','09',0.3, 1.0),
(44,'0101000020E610000051AA5B1BD718EABFEE67613BA4544440'::geometry,'44','02',0.2, 1.0),
(45,'0101000020E610000022C5C01BB69710C08563BC1499E54340'::geometry,'45','08',0.3, 1.0),
(46,'0101000020E6100000D5FCF78A11A0E9BFDEA46F8E64AF4340'::geometry,'46','10',0.2, 1.0),
(47,'0101000020E61000003AE63525866313C02100050B2BD14440'::geometry,'47','07',0.3, 1.0),
(48,'0101000020E610000030F187FD1FD206C0C767E1496C9E4540'::geometry,'48','16',0.5, 1.0),
(49,'0101000020E61000009C22867B12EC17C006C5F40C14DD4440'::geometry,'49','07',0.2, 1.0),
(50,'0101000020E6100000F7D5EFC62D08F1BF69D1231D68CF4440'::geometry,'50','02',0.6, 1.0),
(51,'0101000020E61000005B0E1F8DAA5F15C0530BFE285BF24140'::geometry,'51','18',0.01, 1.0),
(10,'0101000020E61000000FD65D82AEA418C06192D1351FDB4340'::geometry,'10','11',0.04, 1.0),
(11,'0101000020E6100000B305531DAB0A17C0DEAFCD4EE5464240'::geometry,'11','01',0.08, 1.0),
(12,'0101000020E610000059721A7297C9C2BF9EBE383BE51E4440'::geometry,'12','10',0.2, 1.0),
(14,'0101000020E610000000C86313AF3C13C0E530879C10FF4240'::geometry,'14','01',0.2, 1.0),
(15,'0101000020E61000002A475497B6ED20C06643D4131A904540'::geometry,'15','12',0.3, 1.0),
(20,'0101000020E6100000F975566FAD8D01C0E840C33F67924540'::geometry,'20','16',0.8, 1.0),
(23,'0101000020E610000025FA13E595880BC022BB07131D024340'::geometry,'23','01',0.1, 1.0),
(24,'0101000020E61000009C5F91C5095C17C0C78784B15A4F4540'::geometry,'24','07',0.3, 1.0),
(29,'0101000020E6100000C34D4A5B48E712C092E680892C684240'::geometry,'29','01',0.3, 1.0),
(52,'0101000020E6100000406A545EB29A07C04E5F0BDA39A54140'::geometry,'52','19',0.0, 1.01)
-- Areas of Interest functions perform some nondeterministic computations
-- (to estimate the significance); we will set the seeds for the RNGs
-- that affect those results to have repeateble results
SELECT cdb_crankshaft._cdb_random_seeds(1234);
_cdb_random_seeds
-------------------
(1 row)
SET client_min_messages TO WARNING;
\set ECHO none
_cdb_random_seeds
SELECT ppoints.code, m.quads
FROM ppoints
JOIN cdb_crankshaft.CDB_AreasOfInterest_Local('SELECT * FROM ppoints', 'value') m
ON ppoints.cartodb_id = m.ids
ORDER BY ppoints.code;
code | quads
------+-------
01 | HH
02 | HL
03 | LL
04 | LL
05 | LH
06 | LL
07 | HH
08 | HH
09 | HH
10 | LL
11 | LL
12 | LL
13 | HL
14 | LL
15 | LL
16 | HH
17 | HH
18 | LL
19 | HH
20 | HH
21 | LL
22 | HH
23 | LL
24 | LL
25 | HH
26 | HH
27 | LL
28 | HH
29 | LL
30 | LL
31 | HH
32 | LL
33 | HL
34 | LH
35 | LL
36 | LL
37 | HL
38 | HL
39 | HH
40 | HH
41 | HL
42 | LH
43 | LH
44 | LL
45 | LH
46 | LL
47 | LL
48 | HH
49 | LH
50 | HH
51 | LL
52 | LL
(1 row)
code|quads
01|HH
02|HL
03|LL
04|LL
05|LH
06|LL
07|HH
08|HH
09|HH
10|LL
11|LL
12|LL
13|HL
14|LL
15|LL
16|HH
17|HH
18|LL
19|HH
20|HH
21|LL
22|HH
23|LL
24|LL
25|HH
26|HH
27|LL
28|HH
29|LL
30|LL
31|HH
32|LL
33|HL
34|LH
35|LL
36|LL
37|HL
38|HL
39|HH
40|HH
41|HL
42|LH
43|LH
44|LL
45|LH
46|LL
47|LL
48|HH
49|LH
50|HH
51|LL
52|LL
(52 rows)
_cdb_random_seeds
SELECT cdb_crankshaft._cdb_random_seeds(1234);
_cdb_random_seeds
-------------------
(1 row)
code|quads
01|HH
02|HL
07|HH
08|HH
09|HH
13|HL
16|HH
17|HH
19|HH
20|HH
22|HH
25|HH
26|HH
28|HH
31|HH
33|HL
37|HL
38|HL
39|HH
40|HH
41|HL
48|HH
50|HH
(23 rows)
_cdb_random_seeds
SELECT ppoints2.code, m.quads
FROM ppoints2
JOIN cdb_crankshaft.CDB_AreasOfInterest_Local_Rate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
ON ppoints2.cartodb_id = m.ids
ORDER BY ppoints2.code;
code | quads
------+-------
01 | LL
02 | LH
03 | HH
04 | HH
05 | LL
06 | HH
07 | LL
08 | LL
09 | LL
10 | HH
11 | HH
12 | HL
13 | LL
14 | HH
15 | LL
16 | LL
17 | LL
18 | LH
19 | LL
20 | LL
21 | HH
22 | LL
23 | HL
24 | LL
25 | LL
26 | LL
27 | LL
28 | LL
29 | LH
30 | HH
31 | LL
32 | LL
33 | LL
34 | LL
35 | LH
36 | HL
37 | LH
38 | LH
39 | LL
40 | LL
41 | LH
42 | HL
43 | LL
44 | HL
45 | LL
46 | HL
47 | LL
48 | LL
49 | HL
50 | LL
51 | HH
(1 row)
code|quads
03|LL
04|LL
05|LH
06|LL
10|LL
11|LL
12|LL
14|LL
15|LL
18|LL
21|LL
23|LL
24|LL
27|LL
29|LL
30|LL
32|LL
34|LH
35|LL
36|LL
42|LH
43|LH
44|LL
45|LH
46|LL
47|LL
49|LH
51|LL
52|LL
(29 rows)
_cdb_random_seeds
(1 row)
code|quads
02|HL
05|LH
13|HL
33|HL
34|LH
37|HL
38|HL
41|HL
42|LH
43|LH
45|LH
49|LH
(12 rows)
_cdb_random_seeds
(1 row)
code|quads
01|LL
02|LH
03|HH
04|HH
05|LL
06|HH
07|LL
08|LL
09|LL
10|HH
11|HH
12|HL
13|LL
14|HH
15|LL
16|LL
17|LL
18|LH
19|LL
20|LL
21|HH
22|LL
23|HL
24|LL
25|LL
26|LL
27|LL
28|LL
29|LH
30|HH
31|LL
32|LL
33|LL
34|LL
35|LH
36|HL
37|LH
38|LH
39|LL
40|LL
41|LH
42|HL
43|LL
44|HL
45|LL
46|HL
47|LL
48|LL
49|HL
50|LL
51|HH
(51 rows)
_cdb_random_seeds
(1 row)
code|quads
03|HH
04|HH
06|HH
10|HH
11|HH
12|HL
14|HH
21|HH
23|HL
30|HH
36|HL
42|HL
44|HL
46|HL
49|HL
51|HH
(16 rows)
_cdb_random_seeds
(1 row)
code|quads
01|LL
02|LH
05|LL
07|LL
08|LL
09|LL
13|LL
15|LL
16|LL
17|LL
18|LH
19|LL
20|LL
22|LL
24|LL
25|LL
26|LL
27|LL
28|LL
29|LH
31|LL
32|LL
33|LL
34|LL
35|LH
37|LH
38|LH
39|LL
40|LL
41|LH
43|LL
45|LL
47|LL
48|LL
50|LL
(35 rows)
_cdb_random_seeds
(1 row)
code|quads
02|LH
12|HL
18|LH
23|HL
29|LH
35|LH
36|HL
37|LH
38|LH
41|LH
42|HL
44|HL
46|HL
49|HL
(14 rows)

View File

@@ -1,21 +1,11 @@
\i test/fixtures/polyg_values.sql
CREATE TABLE values (cartodb_id integer, value float, the_geom geometry);
INSERT INTO values(cartodb_id, value, the_geom) VALUES
(1,10,'0106000020E61000000100000001030000000100000005000000E5AF3500C03608C08068629111374440C7BC0A00C00F02C0AC0551523B414440C7BC0A00C0A700C0CAF23B6E74FB4340A7267FFFFF5206C0FBB7E41B7EE74340E5AF3500C03608C08068629111374440'::geometry),
(2,20,'0106000020E610000001000000010300000001000000050000002439EC00804AF7BF07D6CCB5C3064440C7BC0A00C0A700C0CAF23B6E74FB4340C7BC0A00C00F02C0AC0551523B414440E20CD5FFFF30FABFBE4F76AFEA4B44402439EC00804AF7BF07D6CCB5C3064440'::geometry)
SELECT round(cdb_crankshaft.cdb_overlap_sum(
'0106000020E61000000100000001030000000100000004000000FFFFFFFFFF3604C09A0B9ECEC42E444000000000C060FBBF30C7FD70E01D44400000000040AD02C06481F1C8CD034440FFFFFFFFFF3604C09A0B9ECEC42E4440'::geometry,
'values', 'value'
), 2);
SET client_min_messages TO WARNING;
\set ECHO none
round
-------
4.42
(1 row)
SELECT round(cdb_crankshaft.cdb_overlap_sum(
'0106000020E61000000100000001030000000100000004000000FFFFFFFFFF3604C09A0B9ECEC42E444000000000C060FBBF30C7FD70E01D44400000000040AD02C06481F1C8CD034440FFFFFFFFFF3604C09A0B9ECEC42E4440'::geometry,
'values', 'value', schema_name := 'public'
), 2);
round
-------
4.42

View File

@@ -0,0 +1,10 @@
\pset format unaligned
\set ECHO all
SELECT count(DISTINCT cluster_no) as clusters from cdb_crankshaft.cdb_kmeans('select * from ppoints', 2);
clusters
2
(1 row)
SELECT count(*) clusters from (select cdb_crankshaft.CDB_WeightedMean(the_geom, value::NUMERIC), code from ppoints group by code) p;
clusters
52
(1 row)

View File

@@ -1,3 +1,5 @@
SET client_min_messages TO WARNING;
\set ECHO none
CREATE TABLE values (cartodb_id integer, value float, the_geom geometry);
INSERT INTO values(cartodb_id, value, the_geom) VALUES
(1,10,'0106000020E61000000100000001030000000100000005000000E5AF3500C03608C08068629111374440C7BC0A00C00F02C0AC0551523B414440C7BC0A00C0A700C0CAF23B6E74FB4340A7267FFFFF5206C0FBB7E41B7EE74340E5AF3500C03608C08068629111374440'::geometry),

View File

@@ -1,3 +1,5 @@
SET client_min_messages TO WARNING;
\set ECHO none
-- test table (spanish province centroids with some invented values)
CREATE TABLE ppoints (cartodb_id integer, the_geom geometry, the_geom_webmercator geometry, code text, region_code text, value float);
INSERT INTO ppoints VALUES

View File

@@ -1,3 +1,5 @@
SET client_min_messages TO WARNING;
\set ECHO none
-- test table (spanish province centroids with some invented values)
CREATE TABLE ppoints2 (cartodb_id integer, the_geom geometry, code text, region_code text, numerator float, denominator float);
INSERT INTO ppoints2 VALUES

View File

@@ -1,7 +1,6 @@
-- Install dependencies
CREATE EXTENSION plpythonu;
CREATE EXTENSION postgis;
CREATE EXTENSION cartodb;
-- Install the extension
CREATE EXTENSION crankshaft VERSION 'dev';

View File

@@ -1,3 +1,5 @@
\pset format unaligned
\set ECHO all
\i test/fixtures/ppoints.sql
\i test/fixtures/ppoints2.sql
@@ -8,14 +10,70 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
SELECT ppoints.code, m.quads
FROM ppoints
JOIN cdb_crankshaft.CDB_AreasOfInterest_Local('SELECT * FROM ppoints', 'value') m
ON ppoints.cartodb_id = m.ids
JOIN cdb_crankshaft.CDB_AreasOfInterestLocal('SELECT * FROM ppoints', 'value') m
ON ppoints.cartodb_id = m.rowid
ORDER BY ppoints.code;
SELECT cdb_crankshaft._cdb_random_seeds(1234);
-- Spatial Hotspots
SELECT ppoints.code, m.quads
FROM ppoints
JOIN cdb_crankshaft.CDB_GetSpatialHotspots('SELECT * FROM ppoints', 'value') m
ON ppoints.cartodb_id = m.rowid
ORDER BY ppoints.code;
SELECT cdb_crankshaft._cdb_random_seeds(1234);
-- Spatial Coldspots
SELECT ppoints.code, m.quads
FROM ppoints
JOIN cdb_crankshaft.CDB_GetSpatialColdspots('SELECT * FROM ppoints', 'value') m
ON ppoints.cartodb_id = m.rowid
ORDER BY ppoints.code;
SELECT cdb_crankshaft._cdb_random_seeds(1234);
-- Spatial Outliers
SELECT ppoints.code, m.quads
FROM ppoints
JOIN cdb_crankshaft.CDB_GetSpatialOutliers('SELECT * FROM ppoints', 'value') m
ON ppoints.cartodb_id = m.rowid
ORDER BY ppoints.code;
SELECT cdb_crankshaft._cdb_random_seeds(1234);
-- Areas of Interest (rate)
SELECT ppoints2.code, m.quads
FROM ppoints2
JOIN cdb_crankshaft.CDB_AreasOfInterest_Local_Rate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
ON ppoints2.cartodb_id = m.ids
JOIN cdb_crankshaft.CDB_AreasOfInterestLocalRate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
ON ppoints2.cartodb_id = m.rowid
ORDER BY ppoints2.code;
SELECT cdb_crankshaft._cdb_random_seeds(1234);
-- Spatial Hotspots (rate)
SELECT ppoints2.code, m.quads
FROM ppoints2
JOIN cdb_crankshaft.CDB_GetSpatialHotspotsRate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
ON ppoints2.cartodb_id = m.rowid
ORDER BY ppoints2.code;
SELECT cdb_crankshaft._cdb_random_seeds(1234);
-- Spatial Coldspots (rate)
SELECT ppoints2.code, m.quads
FROM ppoints2
JOIN cdb_crankshaft.CDB_GetSpatialColdspotsRate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
ON ppoints2.cartodb_id = m.rowid
ORDER BY ppoints2.code;
SELECT cdb_crankshaft._cdb_random_seeds(1234);
-- Spatial Outliers (rate)
SELECT ppoints2.code, m.quads
FROM ppoints2
JOIN cdb_crankshaft.CDB_GetSpatialOutliersRate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
ON ppoints2.cartodb_id = m.rowid
ORDER BY ppoints2.code;

View File

@@ -0,0 +1,6 @@
\pset format unaligned
\set ECHO all
SELECT count(DISTINCT cluster_no) as clusters from cdb_crankshaft.cdb_kmeans('select * from ppoints', 2);
SELECT count(*) clusters from (select cdb_crankshaft.CDB_WeightedMean(the_geom, value::NUMERIC), code from ppoints group by code) p;

View File

@@ -4,7 +4,7 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
SET ROLE test_regular_user;
-- Add to the search path the schema
SET search_path TO public,cartodb,cdb_crankshaft;
SET search_path TO public,cdb_crankshaft;
-- Exercise public functions
SELECT ppoints.code, m.quads

View File

@@ -2,21 +2,16 @@ include ../../Makefile.global
# Install the package locally for development
install:
virtualenv --system-site-packages ../../envs/dev
# source ../../envs/dev/bin/activate
../../envs/dev/bin/pip install -I ./crankshaft
../../envs/dev/bin/pip install -I nose
pip install ./crankshaft
# Test develpment install
test:
../../envs/dev/bin/nosetests crankshaft/test/
nosetests crankshaft/test/
release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
mkdir -p ../../release/python/$(EXTVERSION)
cp -r ./$(PACKAGE) ../../release/python/$(EXTVERSION)/
$(SED) -i -r 's/version='"'"'[0-9]+\.[0-9]+\.[0-9]+'"'"'/version='"'"'$(EXTVERSION)'"'"'/g' ../../release/python/$(EXTVERSION)/$(PACKAGE)/setup.py
deploy:
virtualenv --system-site-packages $(VIRTUALENV_PATH)/$(RELEASE_VERSION)
$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I -U ../../release/python/$(RELEASE_VERSION)/$(PACKAGE)
$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I nose
deploy:
pip install $(RUN_OPTIONS) --upgrade ../../release/python/$(RELEASE_VERSION)/$(PACKAGE)

View File

@@ -10,7 +10,6 @@ nosetests test/
## Notes about Python dependencies
* This extension is targeted at production databases. Therefore certain restrictions must be assumed about the production environment vs other experimental environments.
* We're using `pip` and `virtualenv` to generate a suitable isolated environment for python code that has all the dependencies
* Every dependency should be:
- Added to the `setup.py` file
- Installed through it
@@ -30,21 +29,7 @@ PySAL 1.10 or later, so we'll stick to 1.9.1.
apt-get install -y python-scipy
```
We'll use virtual environments to install our packages,
but configued to use also system modules so that the
mentioned scipy and numpy are used.
# Create a virtual environment for python
$ virtualenv --system-site-packages dev
# Activate the virtualenv
$ source dev/bin/activate
# Install all the requirements
# expect this to take a while, as it will trigger a few compilations
(dev) $ pip install -I ./crankshaft
#### Test the libraries with that virtual env
#### Test the libraries
##### Test numpy library dependency:

View File

@@ -1 +1,2 @@
from moran import *
from kmeans import *

View File

@@ -0,0 +1,18 @@
from sklearn.cluster import KMeans
import plpy
def kmeans(query, no_clusters, no_init=20):
data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
array_agg(ST_X(the_geom) order by cartodb_id) xs,
array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
where the_geom is not null
'''.format(query=query))
xs = data[0]['xs']
ys = data[0]['ys']
ids = data[0]['ids']
km = KMeans(n_clusters= no_clusters, n_init=no_init)
labels = km.fit_predict(zip(xs,ys))
return zip(ids,labels)

View File

@@ -14,7 +14,7 @@ import crankshaft.pysal_utils as pu
# High level interface ---------------------------------------
def moran(subquery, attr_name,
permutations, geom_col, id_col, w_type, num_ngbrs):
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I (global)
Implementation building neighbors with a PostGIS database and Moran's I
@@ -56,7 +56,7 @@ def moran(subquery, attr_name,
return zip([moran_global.I], [moran_global.EI])
def moran_local(subquery, attr,
permutations, geom_col, id_col, w_type, num_ngbrs):
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
@@ -96,7 +96,7 @@ def moran_local(subquery, attr,
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_rate(subquery, numerator, denominator,
permutations, geom_col, id_col, w_type, num_ngbrs):
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Rate (global)
Andy Eschbacher
@@ -137,7 +137,7 @@ def moran_rate(subquery, numerator, denominator,
return zip([lisa_rate.I], [lisa_rate.EI])
def moran_local_rate(subquery, numerator, denominator,
permutations, geom_col, id_col, w_type, num_ngbrs):
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Local Rate
Andy Eschbacher

View File

@@ -11,7 +11,7 @@ def construct_neighbor_query(w_type, query_vals):
@param query_vals dict: values used to construct the query
"""
if w_type == 'knn':
if w_type.lower() == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
@@ -22,7 +22,7 @@ def get_weight(query_res, w_type='knn', num_ngbrs=5):
Construct PySAL weight from return value of query
@param query_res: query results with attributes and neighbors
"""
if w_type == 'knn':
if w_type.lower() == 'knn':
row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
weights = {x['id']: row_normed_weights for x in query_res}
else:

View File

@@ -40,9 +40,9 @@ setup(
# The choice of component versions is dictated by what's
# provisioned in the production servers.
install_requires=['pysal==1.9.1'],
install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
requires=['pysal', 'numpy' ],
requires=['pysal', 'numpy', 'sklearn'],
test_suite='test'
)

View File

@@ -0,0 +1 @@
[{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]

View File

@@ -0,0 +1,38 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import numpy as np
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class KMeansTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
self.params = {"subquery": "select * from table",
"no_clusters": "10"
}
def test_kmeans(self):
data = self.cluster_data
plpy._define_result('select' ,data)
clusters = cc.kmeans('subquery', 2)
labels = [a[1] for a in clusters]
c1 = [a for a in clusters if a[1]==0]
c2 = [a for a in clusters if a[1]==1]
self.assertEqual(len(np.unique(labels)),2)
self.assertEqual(len(c1),20)
self.assertEqual(len(c2),20)

View File

@@ -52,7 +52,7 @@ class MoranTest(unittest.TestCase):
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('subquery', 'value', 99, 'the_geom', 'cartodb_id', 'knn', 5)
result = cc.moran_local('subquery', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
@@ -64,7 +64,7 @@ class MoranTest(unittest.TestCase):
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 99, 'the_geom', 'cartodb_id', 'knn', 5)
result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None? ', result == None
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
@@ -76,7 +76,7 @@ class MoranTest(unittest.TestCase):
data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1235)
result = cc.moran('table', 'value', 99, 'the_geom', 'cartodb_id', 'knn', 5)
result = cc.moran('table', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None?', result == None
result_moran = result[0][0]
expected_moran = np.array([row[0] for row in self.moran_data]).mean()