Compare commits

...

59 Commits

Author SHA1 Message Date
Carla Iriberri
f127d4a18e Update 0.3.1 version with voronoi fix 2016-08-18 17:38:24 +02:00
Carla Iriberri
9016e5f35e Merge branch 'develop' 2016-08-18 17:35:36 +02:00
Carla
66a0c80aee Merge pull request #114 from CartoDB/add-voronoi
[FIX] Voronoi spurious segments [09]
2016-08-18 17:30:01 +02:00
abelvm
1061c85405 doc example fixed 2016-08-18 11:16:12 -04:00
abelvm
6fc460f6db removed spikes 2016-08-18 10:16:26 -04:00
abelvm
ce86c4c5b4 removed spikes 2016-08-18 10:12:15 -04:00
abelvm
4352d3ded7 removed spurious segments 2016-08-18 09:36:15 -04:00
abelvm
cd0a802aff Merge branch 'develop' of github.com:CartoDB/crankshaft into add-voronoi 2016-08-18 09:31:14 -04:00
abelvm
6fe467366a removed spurious segments 2016-08-18 09:31:08 -04:00
Carla
2963afd711 Rename crankshaft--0.3.0-0.3.1.sql to crankshaft--0.3.0--0.3.1.sql 2016-08-18 10:43:43 +02:00
Carla
dd6e116da9 Rebranding 2016-08-18 10:35:54 +02:00
Carla Iriberri
aa745f67db Release 0.3.1 2016-08-18 10:34:52 +02:00
Carla
c668a82fcf Merge pull request #112 from CartoDB/add-voronoi
[Hot Fix] Voronoi projection issue [09]
2016-08-18 10:13:53 +02:00
Carla
cfacda5799 Merge pull request #106 from CartoDB/add-interpolation
[Add interpolation] Tests for the three methods [08]
2016-08-18 10:13:36 +02:00
abelvm
cbc505a62b fix test out 2016-08-17 12:51:34 -04:00
abelvm
e9ce87fb99 fix test out 2016-08-17 12:43:44 -04:00
abelvm
d67f3951a7 fix test 2016-08-17 12:32:17 -04:00
abelvm
3ddcd4ed2d Merge branch 'develop' of github.com:CartoDB/crankshaft into add-voronoi 2016-08-17 12:22:30 -04:00
abelvm
94497cd791 fix projection issues 2016-08-17 12:22:22 -04:00
Rafa de la Torre
cf38cb43c0 Update RELEASE.md
Add a couple of additional steps
2016-08-17 10:58:12 +02:00
Carla
c16a364254 Fixes markdown style 2016-08-17 10:41:13 +02:00
Carla
990e714685 Fixes markdown style 2016-08-17 10:39:53 +02:00
Rafa de la Torre
1889c52578 Add generated files for v0.3.0 2016-08-17 10:36:50 +02:00
Rafa de la Torre
fba043e832 Update NEWS and default version for v0.3.0 2016-08-17 10:35:01 +02:00
Rafa de la Torre
5dcbbe0429 Merge pull request #81 from CartoDB/add-voronoi
Add voronoi [09]
2016-08-17 10:28:02 +02:00
Rafa de la Torre
c181bfa72e Small fix for type checking 2016-08-17 10:21:07 +02:00
abelvm
e2f8c67034 test->avg area 2016-08-16 12:54:15 -04:00
abelvm
756db707b5 fix results 2016-08-16 11:08:17 -04:00
abelvm
5b8f7c4cd5 add schema 2016-08-16 11:01:25 -04:00
abelvm
51b8da5bf8 add full tests 2016-08-16 10:44:13 -04:00
abelvm
630965605c add full tests 2016-08-16 10:41:54 -04:00
abelvm
14e5ca4b22 add full tests 2016-08-16 10:40:32 -04:00
Rafa de la Torre
ebbf68af57 Merge pull request #101 from CartoDB/fix-interpolation
HOTFIX barycenter method
2016-08-16 10:09:17 +02:00
abelvm
7914fcdb87 fix barycenter method 2016-08-15 16:32:25 -04:00
abelvm
317f7f2fa6 change results 2016-08-12 14:10:57 -04:00
abelvm
e95374646f change force2d signature 2016-08-12 13:54:54 -04:00
abelvm
5c207d9ec0 test 2016-08-12 13:48:55 -04:00
abelvm
8430b98931 Merge branch 'develop' of github.com:CartoDB/crankshaft into add-voronoi 2016-08-12 13:47:56 -04:00
abelvm
644542e5df fixed internals 2016-08-12 13:34:29 -04:00
Rafa de la Torre
7f6766b6c1 Merge pull request #99 from CartoDB/update-process-docs
Update process docs
2016-08-12 19:14:59 +02:00
Rafa de la Torre
8953bf92ee Update release process 2016-08-12 18:56:03 +02:00
Rafa de la Torre
065dc476b4 Revamp the dev process 2016-08-12 18:34:13 +02:00
Rafa de la Torre
170260b7f5 Revamp the dev process 2016-08-12 18:17:13 +02:00
Rafa de la Torre
a6b5f74d67 Merge pull request #95 from CartoDB/master
merging release 0.2.0 back to development
2016-08-12 17:00:07 +02:00
Rafa de la Torre
47f1f918d4 Do not check code is updated when not needed
Particularly, when it is master or a detached HEAD (usually when
checking out tags).
2016-08-11 19:20:04 +02:00
Rafa de la Torre
9ada8d12ca Add upgrade file from 0.1.0 to 0.2.0
As compatibility was checked with CI tests, this is simply a copy:
```
cp release/crankshaft--0.2.0.sql release/crankshaft--0.1.0--0.2.0.sql
```

To be automated in `make release` command.
2016-08-11 18:52:30 +02:00
Rafa de la Torre
1bad568fbf Add files generated with make release v0.2.0 2016-08-11 18:49:55 +02:00
Rafa de la Torre
75b34d29ce Update default_version and NEWS in prep of v0.2.0 2016-08-11 18:47:41 +02:00
abelvm
749953931c tests spaces 2016-08-10 13:54:11 -04:00
abelvm
8ea1f2edea tests IN 2016-08-10 13:33:45 -04:00
abelvm
3284ed19a5 tests 2016-08-10 13:32:55 -04:00
abelvm
0b2caa48e9 fixed tests 2016-08-10 13:22:10 -04:00
abelvm
75a0e54849 fixed tests 2016-08-10 12:23:32 -04:00
abelvm
613cdb4c6f fixed tests 2016-08-10 12:15:50 -04:00
abelvm
b5142002e9 Merge branch 'add-voronoi' of github.com:CartoDB/crankshaft into add-voronoi 2016-08-09 13:01:22 -04:00
abelvm
8c273a53bc check tests 2016-08-09 13:01:05 -04:00
Rafa de la Torre
c7788b7b2c Merge remote-tracking branch 'origin/develop' into add-voronoi 2016-08-09 16:21:04 +02:00
abelvm
e84dc30b77 format 2016-06-30 16:11:28 +02:00
abelvm
55f8626901 add files 2016-06-30 16:09:13 +02:00
93 changed files with 11549 additions and 190 deletions

View File

@@ -1,10 +1,7 @@
# Development process
Please read the Working Process/Quickstart Guide in [README.md](https://github.com/CartoDB/crankshaft/blob/master/README.md) first.
For any modification of crankshaft, such as adding new features,
refactoring or bug-fixing, topic branch must be created out of the `develop`
branch and be used for the development process.
refactoring or bugfixing, a topic branch must be created out of the `develop`.
Modifications are done inside `src/pg/sql` and `src/py/crankshaft`.
@@ -14,80 +11,47 @@ Take into account:
(inside `src/pg/test`, `src/py/crankshaft/test`) as well as to
detect any bugs that are being fixed.
* Add or modify the corresponding documentation files in the `doc` folder.
Since we expect to have highly technical functions here, an extense
background explanation would be of great help to users of this extension.
* Convention: snake case(i.e. `snake_case` and not `CamelCase`)
shall be used for all function names.
Prefix function names intended for public use with `cdb_`
and private functions (to be used only internally inside
the extension) with `_cdb_`.
* Naming conventions for function names:
- use `CamelCase`
- prefix "public" functions with `CDB_`. E.g: `CDB_SpatialMarkovTrend`
- prefix "private" functions with an underscore. E.g: `_CDB_MyObscureInternalImplementationDetail`
Once the code is ready to be tested, update the local development installation
with `sudo make install`.
This will update the 'dev' version of the extension in `src/pg/` and
make it available to PostgreSQL.
It will also install the python package (crankshaft) in a virtual
environment `env/dev`.
The version number of the Python package, defined in
`src/pg/crankshaft/setup.py` will be overridden when
the package is released and always match the extension version number,
but for development it shall be kept as '0.0.0'.
Run the tests with `make test`.
To use the python extension for custom tests, activate the virtual
environment with:
```
source envs/dev/bin/activate
```
Update extension in a working database with:
* `ALTER EXTENSION crankshaft UPDATE TO 'current';`
`ALTER EXTENSION crankshaft UPDATE TO 'dev';`
Note: we keep the current development version install as 'dev' always;
we update through the 'current' alias to allow changing the extension
contents but not the version identifier. This will fail if the
changes involve incompatible function changes such as a different
return type; in that case the offending function (or the whole extension)
should be dropped manually before the update.
```sql
ALTER EXTENSION crankshaft UPDATE TO 'current';
ALTER EXTENSION crankshaft UPDATE TO 'dev';
```
If the extension has not previously been installed in a database,
it can be installed directly with:
* `CREATE EXTENSION IF NOT EXISTS plpythonu;`
`CREATE EXTENSION IF NOT EXISTS postgis;`
`CREATE EXTENSION crankshaft WITH VERSION 'dev';`
Note: the development extension uses the development python virtual
environment automatically.
Before proceeding to the release process peer code reviewing of the code is
a must.
```sql
CREATE EXTENSION IF NOT EXISTS plpythonu;
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION crankshaft WITH VERSION 'dev';
```
Once the feature or bugfix is completed and all the tests are passing
a Pull-Request shall be created on the topic branch, reviewed by a peer
and then merged back into the `develop` branch when all CI tests pass.
a pull request shall be created, reviewed by a peer
and then merged back into the `develop` branch once all the CI tests pass.
When the changes in the `develop` branch are to be released in a new
version of the extension, a PR must be created on the `develop` branch.
The release manage will take hold of the PR at this moment to proceed
to the release process for a new revision of the extension.
## Relevant development targets in the Makefile
## Relevant development tasks available in the Makefile
```shell
# Show a short description of the available targets
make help
```
* `make help` show a short description of the available targets
* `sudo make install` will generate the extension scripts for the development
version ('dev'/'current') and install the python package into the
development virtual environment `envs/dev`.
Intended for use by developers.
* `make test` will run the tests for the installed development extension.
Intended for use by developers.
# Generate the extension scripts and install the python package.
sudo make install
# Run the tests against the installed extension.
make test
```

View File

@@ -1,3 +1,12 @@
0.3.0 (2016-08-17)
------------------
* Adds Voronoi function
* Fixes barycenter method in interpolation
0.2.0 (2016-08-11)
------------------
* Adds Gravity Model
0.1.0 (2016-06-29)
------------------
* Adds Spatial Markov function

View File

@@ -1,70 +1,67 @@
# crankshaft [![Build Status](https://travis-ci.org/CartoDB/crankshaft.svg?branch=develop)](https://travis-ci.org/CartoDB/crankshaft)
# Crankshaft [![Build Status](https://travis-ci.org/CartoDB/crankshaft.svg?branch=develop)](https://travis-ci.org/CartoDB/crankshaft)
CartoDB Spatial Analysis extension for PostgreSQL.
CARTO Spatial Analysis extension for PostgreSQL.
## Code organization
* *doc* documentation
* *src* source code
* - *src/pg* contains the PostgreSQL extension source code
* - *src/py* Python module source code
* *release* reseleased versions
* `doc/` documentation
* `src/` source code
- `pg/` contains the PostgreSQL extension source code
- `py/` Python module source code
* `release` reseleased versions
## Requirements
* pip, PostgreSQL
* python-scipy system package (see [src/py/README.md](https://github.com/CartoDB/crankshaft/blob/master/src/py/README.md))
* PostgreSQL
* plpythonu and postgis extensions
* python-scipy system package (see [src/py/README.md](https://github.com/CartoDB/crankshaft/blob/develop/src/py/README.md))
# Working Process -- Quickstart Guide
# Development Process
We distinguish two roles regarding the development cycle of crankshaft:
We distinguish two roles:
* *developers* will implement new functionality and bugfixes into
the codebase and will request for new releases of the extension.
* A *release manager* will attend these requests and will handle
the release process. The release process is sequential:
no concurrent releases will ever be in the works.
the codebase.
* A *release manager* will handle the release process.
We use the default `develop` branch as the basis for development.
The `master` branch is used to merge and tag releases to be
deployed in production.
We use the branch `develop` as the main integration branch for development. The `master` is reserved to handle releases.
Developers shall create a new topic branch from `develop` for any new feature
or bugfix and commit their changes to it and eventually merge back into
the `develop` branch. When a new release is required a Pull Request
will be open against the `develop` branch.
The process is as follows:
1. Create a new **topic branch** from `develop` for any new feature
or bugfix and commit their changes to it:
```shell
git fetch && git checkout -b my-cool-feature origin/develop
```
1. Code, commit, push, repeat.
1. Write some **tests** for your feature or bugfix.
1. Update the [NEWS.md](https://github.com/CartoDB/crankshaft/blob/develop/NEWS.md) doc.
1. Create a pull request and mention relevant people for a **peer review**.
1. Address the comments and improvements you get from the peer review.
1. Mention `@CartoDB/dataservices` in the PR to get it merged into `develop`.
In order for a pull request to be accepted, the following criteria should be met:
* The peer review should pass and no major issue should be left unaddressed.
* CI tests must pass (travis will take care of that).
The `develop` pull requests will be handled by the release manage,
who will merge into master where new releases are prepared and tagged.
The `master` branch is the sole responsibility of the release masters
and developers must not commit or merge into it.
## Development Guidelines
For a detailed description of the development process please see
the [CONTRIBUTING.md](https://github.com/CartoDB/crankshaft/blob/master/CONTRIBUTING.md) guide.
the [CONTRIBUTING.md](https://github.com/CartoDB/crankshaft/blob/develop/CONTRIBUTING.md) guide.
Any modification to the source code (`src/pg/sql` for the SQL extension,
`src/py/crankshaft` for the Python package) shall always be done
in a topic branch created from the `develop` branch.
Tests, documentation and peer code reviewing are required for all
modifications.
## Testing
The tests (both for SQL and Python) are executed by running,
from the top directory:
The tests (both for SQL and Python) are executed by running, from the top directory:
```
```shell
sudo make install
make test
```
To request a new release, which will be handled by them
release manager, a Pull Request must be created in the `develop`
branch.
## Release
The release and deployment process is described in the
[RELEASE.md](https://github.com/CartoDB/crankshaft/blob/master/RELEASE.md) guide and it is the responsibility of the designated
release manager.
The release process is described in the
[RELEASE.md](https://github.com/CartoDB/crankshaft/blob/develop/RELEASE.md) guide and is the responsibility of the designated *release manager*.

View File

@@ -1,93 +1,53 @@
# Release & Deployment Process
Please read the Working Process/Quickstart Guide in README.md
and the Development guidelines in CONTRIBUTING.md.
The release process of a new version of the extension
shall be performed by the designated *Release Manager*.
Note that we expect to gradually automate more of this process.
## Release steps
1. Make sure `develop` branch passes all the tests.
1. Merge `develop` into `master`
1. Update the version number in `src/pg/crankshaft.control`.
1. Generate the next release files with this command:
Having checked PR to be released it shall be
merged back into the `master` branch to prepare the new release.
```shell
make release
```
1. Generate an upgrade path from the previous to the next release by copying the generated release file. E.g:
The version number in `pg/cranckshaft.control` must first be updated.
To do so [Semantic Versioning 2.0](http://semver.org/) is in order.
```shell
cp release/cranckshaft--X.Y.Z.sql release/cranckshaft--A.B.C--X.Y.Z.sql
```
NOTE: you can rely on this thanks to the compatibility checks.
TODO: automate this step [#94](https://github.com/CartoDB/crankshaft/issues/94)
1. Commit and push the generated files.
1. Tag the release:
Thew `NEWS.md` will be updated.
```
git tag -a X.Y.Z -m "Release X.Y.Z"
git push origin X.Y.Z
```
1. Deploy and test in staging
1. Deploy and test in production
1. Merge back into develop
We now will explain the process for the case of backwards-compatible
releases (updating the minor or patch version numbers).
TODO: document the complex case of major releases.
## Some remarks
* Version numbers shall follow [Semantic Versioning 2.0](http://semver.org/).
* CI tests will take care of **forward compatibility** of the extension at postgres level.
* **Major version changes** (breaking forward compatibility) are a major event and are out of the scope of this doc. They **shall be avoided as much as we can**.
* We will go forward, never backwards. **Generating upgrade paths automatically is easy** and we'll rely on the CI checks for that.
The next command must be executed to produce the main installation
script for the new release, `release/cranckshaft--X.Y.Z.sql` and
also to copy the python package to `release/python/X.Y.Z/crankshaft`.
```
make release
```
Then, the release manager shall produce upgrade and downgrade scripts
to migrate to/from the previous release. In the case of minor/patch
releases this simply consist in extracting the functions that have changed
and placing them in the proper `release/cranckshaft--X.Y.Z--A.B.C.sql`
file.
## Deploy commands
The new release can be deployed for staging/smoke tests with this command:
```
sudo make deploy
```
```shell
sudo make deploy
```
This will copy the current 'X.Y.Z' released version of the extension to
PostgreSQL. The corresponding Python extension will be installed in a
virtual environment in `envs/X.Y.Z`.
To install a specific version 'X.Y.Z' different from the default one:
It can be activated with:
```
source envs/X.Y.Z/bin/activate
```
But note that this is needed only for using the package directly;
the 'X.Y.Z' version of the extension will automatically use the
python package from this virtual environment.
The `sudo make deploy` operation can be also used for installing
the new version after it has been released.
To install a specific version 'X.Y.Z' different from the current one
(which must be present in `releases/`) you can:
```
sudo make deploy RELEASE_VERSION=X.Y.Z
```
TODO: testing procedure for the new release.
TODO: procedure for staging deployment.
TODO: procedure for merging to master, tagging and deploying
in production.
## Relevant release & deployment tasks available in the Makefile
```
* `make help` show a short description of the available targets
* `make release` will generate a new release (version number defined in
`src/pg/crankshaft.control`) into `release/`.
Intended for use by the release manager.
* `sudo make deploy` will install the current release X.Y.Z from the
`release/` files into PostgreSQL and a Python virtual environment
`envs/X.Y.Z`.
Intended for use by the release manager and deployment jobs.
* `sudo make deploy RELEASE_VERSION=X.Y.Z` will install specified version
previously generated in `release/`
into PostgreSQL and a Python virtual environment `envs/X.Y.Z`.
Intended for use by the release manager and deployment jobs.
```
```shell
sudo make deploy RELEASE_VERSION=X.Y.Z
```

View File

@@ -1,5 +1,13 @@
#!/bin/bash
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
if [[ "$CURRENT_BRANCH" == "master" || "$CURRENT_BRANCH" == "HEAD" ]]
then
echo "master branch or detached HEAD"
exit 0
fi
# Add remote-master
git remote add -t master remote-master https://github.com/CartoDB/crankshaft.git

38
doc/09_voronoi.md Normal file
View File

@@ -0,0 +1,38 @@
## Voronoi
Function to construct the [Voronoi Diagram](https://en.wikipedia.org/wiki/Voronoi_diagram) from a dataset of scatter points, clipped to the significant area
PostGIS wil include this in future versions ([doc for dev branch](http://postgis.net/docs/manual-dev/ST_Voronoi.html)) and will perform faster for sure, but in the meantime...
### CDB_Voronoi (geom geometry[], buffer numeric DEFAULT 0.5, tolerance numeric DEFAULT 1e-9)
#### Arguments
| Name | Type | Description |
|------|------|-------------|
| geom | geometry[] | Array of points's geometries |
| buffer | numeric | enlargment ratio for the envelope area used for the restraints|
| tolerance | numeric | Delaunay tolerance, optional |
### Returns
| Column Name | Type | Description |
|-------------|------|-------------|
| geom | geometry collection | Collection of polygons of the Voronoi cells|
#### Example Usage
```sql
WITH a AS (
SELECT
ARRAY[ST_GeomFromText('POINT(2.1744 41.403)', 4326),ST_GeomFromText('POINT(2.1228 41.380)', 4326),ST_GeomFromText('POINT(2.1511 41.374)', 4326),ST_GeomFromText('POINT(2.1528 41.413)', 4326),ST_GeomFromText('POINT(2.165 41.391)', 4326),ST_GeomFromText('POINT(2.1498 41.371)', 4326),ST_GeomFromText('POINT(2.1533 41.368)', 4326),ST_GeomFromText('POINT(2.131386 41.41399)', 4326)] AS geomin
)
SELECT
st_transform(
(st_dump(CDB_voronoi(geomin, 0.2, 1e-9)
)).geom
, 3857) as the_geom_webmercator
FROM a;
```

View File

@@ -0,0 +1,827 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.2.0'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instence
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
CDB_PyAggS(current_state Numeric[], current_row Numeric[])
returns NUMERIC[] as $$
BEGIN
if array_upper(current_state,1) is null then
RAISE NOTICE 'setting state %',array_upper(current_row,1);
current_state[1] = array_upper(current_row,1);
end if;
return array_cat(current_state,current_row) ;
END
$$ LANGUAGE plpgsql;
-- Create aggregate if it did not exist
DO $$
BEGIN
IF NOT EXISTS (
SELECT *
FROM pg_catalog.pg_proc p
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'cdb_crankshaft'
AND p.proname = 'cdb_pyagg'
AND p.proisagg)
THEN
CREATE AGGREGATE CDB_PyAgg(NUMERIC[]) (
SFUNC = CDB_PyAggS,
STYPE = Numeric[],
INITCOND = "{}"
);
END IF;
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION
CDB_CreateAndPredictSegment(
target NUMERIC[],
features NUMERIC[],
target_features NUMERIC[],
target_ids NUMERIC[],
n_estimators INTEGER DEFAULT 1200,
max_depth INTEGER DEFAULT 3,
subsample DOUBLE PRECISION DEFAULT 0.5,
learning_rate DOUBLE PRECISION DEFAULT 0.01,
min_samples_leaf INTEGER DEFAULT 1)
RETURNS TABLE(cartodb_id NUMERIC, prediction NUMERIC, accuracy NUMERIC)
AS $$
import numpy as np
import plpy
from crankshaft.segmentation import create_and_predict_segment_agg
model_params = {'n_estimators': n_estimators,
'max_depth': max_depth,
'subsample': subsample,
'learning_rate': learning_rate,
'min_samples_leaf': min_samples_leaf}
def unpack2D(data):
dimension = data.pop(0)
a = np.array(data, dtype=float)
return a.reshape(len(a)/dimension, dimension)
return create_and_predict_segment_agg(np.array(target, dtype=float),
unpack2D(features),
unpack2D(target_features),
target_ids,
model_params)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
CDB_CreateAndPredictSegment (
query TEXT,
variable_name TEXT,
target_table TEXT,
n_estimators INTEGER DEFAULT 1200,
max_depth INTEGER DEFAULT 3,
subsample DOUBLE PRECISION DEFAULT 0.5,
learning_rate DOUBLE PRECISION DEFAULT 0.01,
min_samples_leaf INTEGER DEFAULT 1)
RETURNS TABLE (cartodb_id TEXT, prediction NUMERIC, accuracy NUMERIC)
AS $$
from crankshaft.segmentation import create_and_predict_segment
model_params = {'n_estimators': n_estimators, 'max_depth':max_depth, 'subsample' : subsample, 'learning_rate': learning_rate, 'min_samples_leaf' : min_samples_leaf}
return create_and_predict_segment(query,variable_name,target_table, model_params)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION CDB_Gravity(
IN target_query text,
IN weight_column text,
IN source_query text,
IN pop_column text,
IN target bigint,
IN radius integer,
IN minval numeric DEFAULT -10e307
)
RETURNS TABLE(
the_geom geometry,
source_id bigint,
target_id bigint,
dist numeric,
h numeric,
hpop numeric) AS $$
DECLARE
t_id bigint[];
t_geom geometry[];
t_weight numeric[];
s_id bigint[];
s_geom geometry[];
s_pop numeric[];
BEGIN
EXECUTE 'WITH foo as('+target_query+') SELECT array_agg(cartodb_id), array_agg(the_geom), array_agg(' || weight_column || ') FROM foo' INTO t_id, t_geom, t_weight;
EXECUTE 'WITH foo as('+source_query+') SELECT array_agg(cartodb_id), array_agg(the_geom), array_agg(' || pop_column || ') FROM foo' INTO s_id, s_geom, s_pop;
RETURN QUERY
SELECT g.* FROM t, s, CDB_Gravity(t_id, t_geom, t_weight, s_id, s_geom, s_pop, target, radius, minval) g;
END;
$$ language plpgsql;
CREATE OR REPLACE FUNCTION CDB_Gravity(
IN t_id bigint[],
IN t_geom geometry[],
IN t_weight numeric[],
IN s_id bigint[],
IN s_geom geometry[],
IN s_pop numeric[],
IN target bigint,
IN radius integer,
IN minval numeric DEFAULT -10e307
)
RETURNS TABLE(
the_geom geometry,
source_id bigint,
target_id bigint,
dist numeric,
h numeric,
hpop numeric) AS $$
DECLARE
t_type text;
s_type text;
t_center geometry[];
s_center geometry[];
BEGIN
t_type := GeometryType(t_geom[1]);
s_type := GeometryType(s_geom[1]);
IF t_type = 'POINT' THEN
t_center := t_geom;
ELSE
WITH tmp as (SELECT unnest(t_geom) as g) SELECT array_agg(ST_Centroid(g)) INTO t_center FROM tmp;
END IF;
IF s_type = 'POINT' THEN
s_center := s_geom;
ELSE
WITH tmp as (SELECT unnest(s_geom) as g) SELECT array_agg(ST_Centroid(g)) INTO s_center FROM tmp;
END IF;
RETURN QUERY
with target0 as(
SELECT unnest(t_center) as tc, unnest(t_weight) as tw, unnest(t_id) as td
),
source0 as(
SELECT unnest(s_center) as sc, unnest(s_id) as sd, unnest (s_geom) as sg, unnest(s_pop) as sp
),
prev0 as(
SELECT
source0.sg,
source0.sd as sourc_id,
coalesce(source0.sp,0) as sp,
target.td as targ_id,
coalesce(target.tw,0) as tw,
GREATEST(1.0,ST_Distance(geography(target.tc), geography(source0.sc)))::numeric as distance
FROM source0
CROSS JOIN LATERAL
(
SELECT
*
FROM target0
WHERE tw > minval
AND ST_DWithin(geography(source0.sc), geography(tc), radius)
) AS target
),
deno as(
SELECT
sourc_id,
sum(tw/distance) as h_deno
FROM
prev0
GROUP BY sourc_id
)
SELECT
p.sg as the_geom,
p.sourc_id as source_id,
p.targ_id as target_id,
case when p.distance > 1 then p.distance else 0.0 end as dist,
100*(p.tw/p.distance)/d.h_deno as h,
p.sp*(p.tw/p.distance)/d.h_deno as hpop
FROM
prev0 p,
deno d
WHERE
p.targ_id = target AND
p.sourc_id = d.sourc_id;
END;
$$ language plpgsql;
-- 0: nearest neighbor
-- 1: barymetric
-- 2: IDW
CREATE OR REPLACE FUNCTION CDB_SpatialInterpolation(
IN query text,
IN point geometry,
IN method integer DEFAULT 1,
IN p1 numeric DEFAULT 0,
IN p2 numeric DEFAULT 0
)
RETURNS numeric AS
$$
DECLARE
gs geometry[];
vs numeric[];
output numeric;
BEGIN
EXECUTE 'WITH a AS('||query||') SELECT array_agg(the_geom), array_agg(attrib) FROM a' INTO gs, vs;
SELECT CDB_SpatialInterpolation(gs, vs, point, method, p1,p2) INTO output FROM a;
RETURN output;
END;
$$
language plpgsql IMMUTABLE;
CREATE OR REPLACE FUNCTION CDB_SpatialInterpolation(
IN geomin geometry[],
IN colin numeric[],
IN point geometry,
IN method integer DEFAULT 1,
IN p1 numeric DEFAULT 0,
IN p2 numeric DEFAULT 0
)
RETURNS numeric AS
$$
DECLARE
gs geometry[];
vs numeric[];
gs2 geometry[];
vs2 numeric[];
g geometry;
vertex geometry[];
sg numeric;
sa numeric;
sb numeric;
sc numeric;
va numeric;
vb numeric;
vc numeric;
output numeric;
BEGIN
output := -999.999;
-- nearest
IF method = 0 THEN
WITH a as (SELECT unnest(geomin) as g, unnest(colin) as v)
SELECT a.v INTO output FROM a ORDER BY point<->a.g LIMIT 1;
RETURN output;
-- barymetric
ELSIF method = 1 THEN
WITH a as (SELECT unnest(geomin) AS e),
b as (SELECT ST_DelaunayTriangles(ST_Collect(a.e),0.001, 0) AS t FROM a),
c as (SELECT (ST_Dump(t)).geom as v FROM b),
d as (SELECT v FROM c WHERE ST_Within(point, v))
SELECT v INTO g FROM d;
IF g is null THEN
-- out of the realm of the input data
RETURN -888.888;
END IF;
-- vertex of the selected cell
WITH a AS (SELECT (ST_DumpPoints(g)).geom AS v)
SELECT array_agg(v) INTO vertex FROM a;
-- retrieve the value of each vertex
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
SELECT c INTO va FROM a WHERE ST_Equals(geo, vertex[1]);
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
SELECT c INTO vb FROM a WHERE ST_Equals(geo, vertex[2]);
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
SELECT c INTO vc FROM a WHERE ST_Equals(geo, vertex[3]);
SELECT ST_area(g), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point, vertex[2], vertex[3], point]))), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point, vertex[1], vertex[3], point]))), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point,vertex[1],vertex[2], point]))) INTO sg, sa, sb, sc;
output := (coalesce(sa,0) * coalesce(va,0) + coalesce(sb,0) * coalesce(vb,0) + coalesce(sc,0) * coalesce(vc,0)) / coalesce(sg);
RETURN output;
-- IDW
-- p1: limit the number of neighbors, 0->no limit
-- p2: order of distance decay, 0-> order 1
ELSIF method = 2 THEN
IF p2 = 0 THEN
p2 := 1;
END IF;
WITH a as (SELECT unnest(geomin) as g, unnest(colin) as v),
b as (SELECT a.g, a.v FROM a ORDER BY point<->a.g)
SELECT array_agg(b.g), array_agg(b.v) INTO gs, vs FROM b;
IF p1::integer>0 THEN
gs2:=gs;
vs2:=vs;
FOR i IN 1..p1
LOOP
gs2 := gs2 || gs[i];
vs2 := vs2 || vs[i];
END LOOP;
ELSE
gs2:=gs;
vs2:=vs;
END IF;
WITH a as (SELECT unnest(gs2) as g, unnest(vs2) as v),
b as (
SELECT
(1/ST_distance(point, a.g)^p2::integer) as k,
(a.v/ST_distance(point, a.g)^p2::integer) as f
FROM a
)
SELECT sum(b.f)/sum(b.k) INTO output FROM b;
RETURN output;
END IF;
RETURN -777.777;
END;
$$
language plpgsql IMMUTABLE;
-- Moran's I Global Measure (public-facing)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, significance NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspots(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspots(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliers(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Global Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran FLOAT, significance FLOAT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliersRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION CDB_KMeans(query text, no_clusters integer,no_init integer default 20)
RETURNS table (cartodb_id integer, cluster_no integer) as $$
from crankshaft.clustering import kmeans
return kmeans(query,no_clusters,no_init)
$$ language plpythonu;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanS(state Numeric[],the_geom GEOMETRY(Point, 4326), weight NUMERIC)
RETURNS Numeric[] AS
$$
DECLARE
newX NUMERIC;
newY NUMERIC;
newW NUMERIC;
BEGIN
IF weight IS NULL OR the_geom IS NULL THEN
newX = state[1];
newY = state[2];
newW = state[3];
ELSE
newX = state[1] + ST_X(the_geom)*weight;
newY = state[2] + ST_Y(the_geom)*weight;
newW = state[3] + weight;
END IF;
RETURN Array[newX,newY,newW];
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanF(state Numeric[])
RETURNS GEOMETRY AS
$$
BEGIN
IF state[3] = 0 THEN
RETURN ST_SetSRID(ST_MakePoint(state[1],state[2]), 4326);
ELSE
RETURN ST_SETSRID(ST_MakePoint(state[1]/state[3], state[2]/state[3]),4326);
END IF;
END
$$ LANGUAGE plpgsql;
-- Create aggregate if it did not exist
DO $$
BEGIN
IF NOT EXISTS (
SELECT *
FROM pg_catalog.pg_proc p
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'cdb_crankshaft'
AND p.proname = 'cdb_weightedmean'
AND p.proisagg)
THEN
CREATE AGGREGATE CDB_WeightedMean(geometry(Point, 4326), NUMERIC) (
SFUNC = CDB_WeightedMeanS,
FINALFUNC = CDB_WeightedMeanF,
STYPE = Numeric[],
INITCOND = "{0.0,0.0,0.0}"
);
END IF;
END
$$ LANGUAGE plpgsql;
-- Spatial Markov
-- input table format:
-- id | geom | date_1 | date_2 | date_3
-- 1 | Pt1 | 12.3 | 13.1 | 14.2
-- 2 | Pt2 | 11.0 | 13.2 | 12.5
-- ...
-- Sample Function call:
-- SELECT CDB_SpatialMarkov('SELECT * FROM real_estate',
-- Array['date_1', 'date_2', 'date_3'])
CREATE OR REPLACE FUNCTION
CDB_SpatialMarkovTrend (
subquery TEXT,
time_cols TEXT[],
num_classes INT DEFAULT 7,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (trend NUMERIC, trend_up NUMERIC, trend_down NUMERIC, volatility NUMERIC, rowid INT)
AS $$
from crankshaft.space_time_dynamics import spatial_markov_trend
## TODO: use named parameters or a dictionary
return spatial_markov_trend(subquery, time_cols, num_classes, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- input table format: identical to above but in a predictable format
-- Sample function call:
-- SELECT cdb_spatial_markov('SELECT * FROM real_estate',
-- 'date_1')
-- CREATE OR REPLACE FUNCTION
-- cdb_spatial_markov (
-- subquery TEXT,
-- time_col_min text,
-- time_col_max text,
-- date_format text, -- '_YYYY_MM_DD'
-- num_time_per_bin INT DEFAULT 1,
-- permutations INT DEFAULT 99,
-- geom_column TEXT DEFAULT 'the_geom',
-- id_col TEXT DEFAULT 'cartodb_id',
-- w_type TEXT DEFAULT 'knn',
-- num_ngbrs int DEFAULT 5)
-- RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
-- AS $$
-- plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
-- from crankshaft.clustering import moran_local
-- # TODO: use named parameters or a dictionary
-- return spatial_markov(subquery, time_cols, permutations, geom_column, id_col, w_type, num_ngbrs)
-- $$ LANGUAGE plpythonu;
--
-- -- input table format:
-- -- id | geom | date | measurement
-- -- 1 | Pt1 | 12/3 | 13.2
-- -- 2 | Pt2 | 11/5 | 11.3
-- -- 3 | Pt1 | 11/13 | 12.9
-- -- 4 | Pt3 | 12/19 | 10.1
-- -- ...
--
-- CREATE OR REPLACE FUNCTION
-- cdb_spatial_markov (
-- subquery TEXT,
-- time_col text,
-- num_time_per_bin INT DEFAULT 1,
-- permutations INT DEFAULT 99,
-- geom_column TEXT DEFAULT 'the_geom',
-- id_col TEXT DEFAULT 'cartodb_id',
-- w_type TEXT DEFAULT 'knn',
-- num_ngbrs int DEFAULT 5)
-- RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
-- AS $$
-- plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
-- from crankshaft.clustering import moran_local
-- # TODO: use named parameters or a dictionary
-- return spatial_markov(subquery, time_cols, permutations, geom_column, id_col, w_type, num_ngbrs)
-- $$ LANGUAGE plpythonu;
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Aereal weighting is a very simple form of aereal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to defina the schema the target table
-- belongs to, which is necessary if its not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Aereal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed arround the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @params max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the funciton accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,827 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.2.0'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instence
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
CDB_PyAggS(current_state Numeric[], current_row Numeric[])
returns NUMERIC[] as $$
BEGIN
if array_upper(current_state,1) is null then
RAISE NOTICE 'setting state %',array_upper(current_row,1);
current_state[1] = array_upper(current_row,1);
end if;
return array_cat(current_state,current_row) ;
END
$$ LANGUAGE plpgsql;
-- Create aggregate if it did not exist
DO $$
BEGIN
IF NOT EXISTS (
SELECT *
FROM pg_catalog.pg_proc p
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'cdb_crankshaft'
AND p.proname = 'cdb_pyagg'
AND p.proisagg)
THEN
CREATE AGGREGATE CDB_PyAgg(NUMERIC[]) (
SFUNC = CDB_PyAggS,
STYPE = Numeric[],
INITCOND = "{}"
);
END IF;
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION
CDB_CreateAndPredictSegment(
target NUMERIC[],
features NUMERIC[],
target_features NUMERIC[],
target_ids NUMERIC[],
n_estimators INTEGER DEFAULT 1200,
max_depth INTEGER DEFAULT 3,
subsample DOUBLE PRECISION DEFAULT 0.5,
learning_rate DOUBLE PRECISION DEFAULT 0.01,
min_samples_leaf INTEGER DEFAULT 1)
RETURNS TABLE(cartodb_id NUMERIC, prediction NUMERIC, accuracy NUMERIC)
AS $$
import numpy as np
import plpy
from crankshaft.segmentation import create_and_predict_segment_agg
model_params = {'n_estimators': n_estimators,
'max_depth': max_depth,
'subsample': subsample,
'learning_rate': learning_rate,
'min_samples_leaf': min_samples_leaf}
def unpack2D(data):
dimension = data.pop(0)
a = np.array(data, dtype=float)
return a.reshape(len(a)/dimension, dimension)
return create_and_predict_segment_agg(np.array(target, dtype=float),
unpack2D(features),
unpack2D(target_features),
target_ids,
model_params)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
CDB_CreateAndPredictSegment (
query TEXT,
variable_name TEXT,
target_table TEXT,
n_estimators INTEGER DEFAULT 1200,
max_depth INTEGER DEFAULT 3,
subsample DOUBLE PRECISION DEFAULT 0.5,
learning_rate DOUBLE PRECISION DEFAULT 0.01,
min_samples_leaf INTEGER DEFAULT 1)
RETURNS TABLE (cartodb_id TEXT, prediction NUMERIC, accuracy NUMERIC)
AS $$
from crankshaft.segmentation import create_and_predict_segment
model_params = {'n_estimators': n_estimators, 'max_depth':max_depth, 'subsample' : subsample, 'learning_rate': learning_rate, 'min_samples_leaf' : min_samples_leaf}
return create_and_predict_segment(query,variable_name,target_table, model_params)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION CDB_Gravity(
IN target_query text,
IN weight_column text,
IN source_query text,
IN pop_column text,
IN target bigint,
IN radius integer,
IN minval numeric DEFAULT -10e307
)
RETURNS TABLE(
the_geom geometry,
source_id bigint,
target_id bigint,
dist numeric,
h numeric,
hpop numeric) AS $$
DECLARE
t_id bigint[];
t_geom geometry[];
t_weight numeric[];
s_id bigint[];
s_geom geometry[];
s_pop numeric[];
BEGIN
EXECUTE 'WITH foo as('+target_query+') SELECT array_agg(cartodb_id), array_agg(the_geom), array_agg(' || weight_column || ') FROM foo' INTO t_id, t_geom, t_weight;
EXECUTE 'WITH foo as('+source_query+') SELECT array_agg(cartodb_id), array_agg(the_geom), array_agg(' || pop_column || ') FROM foo' INTO s_id, s_geom, s_pop;
RETURN QUERY
SELECT g.* FROM t, s, CDB_Gravity(t_id, t_geom, t_weight, s_id, s_geom, s_pop, target, radius, minval) g;
END;
$$ language plpgsql;
CREATE OR REPLACE FUNCTION CDB_Gravity(
IN t_id bigint[],
IN t_geom geometry[],
IN t_weight numeric[],
IN s_id bigint[],
IN s_geom geometry[],
IN s_pop numeric[],
IN target bigint,
IN radius integer,
IN minval numeric DEFAULT -10e307
)
RETURNS TABLE(
the_geom geometry,
source_id bigint,
target_id bigint,
dist numeric,
h numeric,
hpop numeric) AS $$
DECLARE
t_type text;
s_type text;
t_center geometry[];
s_center geometry[];
BEGIN
t_type := GeometryType(t_geom[1]);
s_type := GeometryType(s_geom[1]);
IF t_type = 'POINT' THEN
t_center := t_geom;
ELSE
WITH tmp as (SELECT unnest(t_geom) as g) SELECT array_agg(ST_Centroid(g)) INTO t_center FROM tmp;
END IF;
IF s_type = 'POINT' THEN
s_center := s_geom;
ELSE
WITH tmp as (SELECT unnest(s_geom) as g) SELECT array_agg(ST_Centroid(g)) INTO s_center FROM tmp;
END IF;
RETURN QUERY
with target0 as(
SELECT unnest(t_center) as tc, unnest(t_weight) as tw, unnest(t_id) as td
),
source0 as(
SELECT unnest(s_center) as sc, unnest(s_id) as sd, unnest (s_geom) as sg, unnest(s_pop) as sp
),
prev0 as(
SELECT
source0.sg,
source0.sd as sourc_id,
coalesce(source0.sp,0) as sp,
target.td as targ_id,
coalesce(target.tw,0) as tw,
GREATEST(1.0,ST_Distance(geography(target.tc), geography(source0.sc)))::numeric as distance
FROM source0
CROSS JOIN LATERAL
(
SELECT
*
FROM target0
WHERE tw > minval
AND ST_DWithin(geography(source0.sc), geography(tc), radius)
) AS target
),
deno as(
SELECT
sourc_id,
sum(tw/distance) as h_deno
FROM
prev0
GROUP BY sourc_id
)
SELECT
p.sg as the_geom,
p.sourc_id as source_id,
p.targ_id as target_id,
case when p.distance > 1 then p.distance else 0.0 end as dist,
100*(p.tw/p.distance)/d.h_deno as h,
p.sp*(p.tw/p.distance)/d.h_deno as hpop
FROM
prev0 p,
deno d
WHERE
p.targ_id = target AND
p.sourc_id = d.sourc_id;
END;
$$ language plpgsql;
-- 0: nearest neighbor
-- 1: barymetric
-- 2: IDW
CREATE OR REPLACE FUNCTION CDB_SpatialInterpolation(
IN query text,
IN point geometry,
IN method integer DEFAULT 1,
IN p1 numeric DEFAULT 0,
IN p2 numeric DEFAULT 0
)
RETURNS numeric AS
$$
DECLARE
gs geometry[];
vs numeric[];
output numeric;
BEGIN
EXECUTE 'WITH a AS('||query||') SELECT array_agg(the_geom), array_agg(attrib) FROM a' INTO gs, vs;
SELECT CDB_SpatialInterpolation(gs, vs, point, method, p1,p2) INTO output FROM a;
RETURN output;
END;
$$
language plpgsql IMMUTABLE;
CREATE OR REPLACE FUNCTION CDB_SpatialInterpolation(
IN geomin geometry[],
IN colin numeric[],
IN point geometry,
IN method integer DEFAULT 1,
IN p1 numeric DEFAULT 0,
IN p2 numeric DEFAULT 0
)
RETURNS numeric AS
$$
DECLARE
gs geometry[];
vs numeric[];
gs2 geometry[];
vs2 numeric[];
g geometry;
vertex geometry[];
sg numeric;
sa numeric;
sb numeric;
sc numeric;
va numeric;
vb numeric;
vc numeric;
output numeric;
BEGIN
output := -999.999;
-- nearest
IF method = 0 THEN
WITH a as (SELECT unnest(geomin) as g, unnest(colin) as v)
SELECT a.v INTO output FROM a ORDER BY point<->a.g LIMIT 1;
RETURN output;
-- barymetric
ELSIF method = 1 THEN
WITH a as (SELECT unnest(geomin) AS e),
b as (SELECT ST_DelaunayTriangles(ST_Collect(a.e),0.001, 0) AS t FROM a),
c as (SELECT (ST_Dump(t)).geom as v FROM b),
d as (SELECT v FROM c WHERE ST_Within(point, v))
SELECT v INTO g FROM d;
IF g is null THEN
-- out of the realm of the input data
RETURN -888.888;
END IF;
-- vertex of the selected cell
WITH a AS (SELECT (ST_DumpPoints(g)).geom AS v)
SELECT array_agg(v) INTO vertex FROM a;
-- retrieve the value of each vertex
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
SELECT c INTO va FROM a WHERE ST_Equals(geo, vertex[1]);
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
SELECT c INTO vb FROM a WHERE ST_Equals(geo, vertex[2]);
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
SELECT c INTO vc FROM a WHERE ST_Equals(geo, vertex[3]);
SELECT ST_area(g), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point, vertex[2], vertex[3], point]))), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point, vertex[1], vertex[3], point]))), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point,vertex[1],vertex[2], point]))) INTO sg, sa, sb, sc;
output := (coalesce(sa,0) * coalesce(va,0) + coalesce(sb,0) * coalesce(vb,0) + coalesce(sc,0) * coalesce(vc,0)) / coalesce(sg);
RETURN output;
-- IDW
-- p1: limit the number of neighbors, 0->no limit
-- p2: order of distance decay, 0-> order 1
ELSIF method = 2 THEN
IF p2 = 0 THEN
p2 := 1;
END IF;
WITH a as (SELECT unnest(geomin) as g, unnest(colin) as v),
b as (SELECT a.g, a.v FROM a ORDER BY point<->a.g)
SELECT array_agg(b.g), array_agg(b.v) INTO gs, vs FROM b;
IF p1::integer>0 THEN
gs2:=gs;
vs2:=vs;
FOR i IN 1..p1
LOOP
gs2 := gs2 || gs[i];
vs2 := vs2 || vs[i];
END LOOP;
ELSE
gs2:=gs;
vs2:=vs;
END IF;
WITH a as (SELECT unnest(gs2) as g, unnest(vs2) as v),
b as (
SELECT
(1/ST_distance(point, a.g)^p2::integer) as k,
(a.v/ST_distance(point, a.g)^p2::integer) as f
FROM a
)
SELECT sum(b.f)/sum(b.k) INTO output FROM b;
RETURN output;
END IF;
RETURN -777.777;
END;
$$
language plpgsql IMMUTABLE;
-- Moran's I Global Measure (public-facing)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, significance NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocal(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspots(
subquery TEXT,
column_name TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, column_name, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspots(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliers(
subquery TEXT,
attr TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocal(subquery, attr, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Global Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestGlobalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (moran FLOAT, significance FLOAT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (internal function)
CREATE OR REPLACE FUNCTION
_CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT,
num_ngbrs INT,
permutations INT,
geom_col TEXT,
id_col TEXT)
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_AreasOfInterestLocalRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col);
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for HH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialHotspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HH', 'HL');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LL and LH (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialColdspotsRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('LL', 'LH');
$$ LANGUAGE SQL;
-- Moran's I Local Rate only for LH and HL (public-facing function)
CREATE OR REPLACE FUNCTION
CDB_GetSpatialOutliersRate(
subquery TEXT,
numerator TEXT,
denominator TEXT,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, rowid INT, vals NUMERIC)
AS $$
SELECT moran, quads, significance, rowid, vals
FROM cdb_crankshaft._CDB_AreasOfInterestLocalRate(subquery, numerator, denominator, w_type, num_ngbrs, permutations, geom_col, id_col)
WHERE quads IN ('HL', 'LH');
$$ LANGUAGE SQL;
CREATE OR REPLACE FUNCTION CDB_KMeans(query text, no_clusters integer,no_init integer default 20)
RETURNS table (cartodb_id integer, cluster_no integer) as $$
from crankshaft.clustering import kmeans
return kmeans(query,no_clusters,no_init)
$$ language plpythonu;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanS(state Numeric[],the_geom GEOMETRY(Point, 4326), weight NUMERIC)
RETURNS Numeric[] AS
$$
DECLARE
newX NUMERIC;
newY NUMERIC;
newW NUMERIC;
BEGIN
IF weight IS NULL OR the_geom IS NULL THEN
newX = state[1];
newY = state[2];
newW = state[3];
ELSE
newX = state[1] + ST_X(the_geom)*weight;
newY = state[2] + ST_Y(the_geom)*weight;
newW = state[3] + weight;
END IF;
RETURN Array[newX,newY,newW];
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION CDB_WeightedMeanF(state Numeric[])
RETURNS GEOMETRY AS
$$
BEGIN
IF state[3] = 0 THEN
RETURN ST_SetSRID(ST_MakePoint(state[1],state[2]), 4326);
ELSE
RETURN ST_SETSRID(ST_MakePoint(state[1]/state[3], state[2]/state[3]),4326);
END IF;
END
$$ LANGUAGE plpgsql;
-- Create aggregate if it did not exist
DO $$
BEGIN
IF NOT EXISTS (
SELECT *
FROM pg_catalog.pg_proc p
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'cdb_crankshaft'
AND p.proname = 'cdb_weightedmean'
AND p.proisagg)
THEN
CREATE AGGREGATE CDB_WeightedMean(geometry(Point, 4326), NUMERIC) (
SFUNC = CDB_WeightedMeanS,
FINALFUNC = CDB_WeightedMeanF,
STYPE = Numeric[],
INITCOND = "{0.0,0.0,0.0}"
);
END IF;
END
$$ LANGUAGE plpgsql;
-- Spatial Markov
-- input table format:
-- id | geom | date_1 | date_2 | date_3
-- 1 | Pt1 | 12.3 | 13.1 | 14.2
-- 2 | Pt2 | 11.0 | 13.2 | 12.5
-- ...
-- Sample Function call:
-- SELECT CDB_SpatialMarkov('SELECT * FROM real_estate',
-- Array['date_1', 'date_2', 'date_3'])
CREATE OR REPLACE FUNCTION
CDB_SpatialMarkovTrend (
subquery TEXT,
time_cols TEXT[],
num_classes INT DEFAULT 7,
w_type TEXT DEFAULT 'knn',
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_col TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id')
RETURNS TABLE (trend NUMERIC, trend_up NUMERIC, trend_down NUMERIC, volatility NUMERIC, rowid INT)
AS $$
from crankshaft.space_time_dynamics import spatial_markov_trend
## TODO: use named parameters or a dictionary
return spatial_markov_trend(subquery, time_cols, num_classes, w_type, num_ngbrs, permutations, geom_col, id_col)
$$ LANGUAGE plpythonu;
-- input table format: identical to above but in a predictable format
-- Sample function call:
-- SELECT cdb_spatial_markov('SELECT * FROM real_estate',
-- 'date_1')
-- CREATE OR REPLACE FUNCTION
-- cdb_spatial_markov (
-- subquery TEXT,
-- time_col_min text,
-- time_col_max text,
-- date_format text, -- '_YYYY_MM_DD'
-- num_time_per_bin INT DEFAULT 1,
-- permutations INT DEFAULT 99,
-- geom_column TEXT DEFAULT 'the_geom',
-- id_col TEXT DEFAULT 'cartodb_id',
-- w_type TEXT DEFAULT 'knn',
-- num_ngbrs int DEFAULT 5)
-- RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
-- AS $$
-- plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
-- from crankshaft.clustering import moran_local
-- # TODO: use named parameters or a dictionary
-- return spatial_markov(subquery, time_cols, permutations, geom_column, id_col, w_type, num_ngbrs)
-- $$ LANGUAGE plpythonu;
--
-- -- input table format:
-- -- id | geom | date | measurement
-- -- 1 | Pt1 | 12/3 | 13.2
-- -- 2 | Pt2 | 11/5 | 11.3
-- -- 3 | Pt1 | 11/13 | 12.9
-- -- 4 | Pt3 | 12/19 | 10.1
-- -- ...
--
-- CREATE OR REPLACE FUNCTION
-- cdb_spatial_markov (
-- subquery TEXT,
-- time_col text,
-- num_time_per_bin INT DEFAULT 1,
-- permutations INT DEFAULT 99,
-- geom_column TEXT DEFAULT 'the_geom',
-- id_col TEXT DEFAULT 'cartodb_id',
-- w_type TEXT DEFAULT 'knn',
-- num_ngbrs int DEFAULT 5)
-- RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
-- AS $$
-- plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
-- from crankshaft.clustering import moran_local
-- # TODO: use named parameters or a dictionary
-- return spatial_markov(subquery, time_cols, permutations, geom_column, id_col, w_type, num_ngbrs)
-- $$ LANGUAGE plpythonu;
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Aereal weighting is a very simple form of aereal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to defina the schema the target table
-- belongs to, which is necessary if its not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Aereal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed arround the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @params max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the funciton accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,5 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
default_version = '0.1.0'
default_version = '0.3.1'
requires = 'plpythonu, postgis'
superuser = true
schema = cdb_crankshaft

View File

@@ -0,0 +1,5 @@
"""Import all modules"""
import crankshaft.random_seeds
import crankshaft.clustering
import crankshaft.space_time_dynamics
import crankshaft.segmentation

View File

@@ -0,0 +1,3 @@
"""Import all functions from for clustering"""
from moran import *
from kmeans import *

View File

@@ -0,0 +1,18 @@
from sklearn.cluster import KMeans
import plpy
def kmeans(query, no_clusters, no_init=20):
data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
array_agg(ST_X(the_geom) order by cartodb_id) xs,
array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
where the_geom is not null
'''.format(query=query))
xs = data[0]['xs']
ys = data[0]['ys']
ids = data[0]['ids']
km = KMeans(n_clusters= no_clusters, n_init=no_init)
labels = km.fit_predict(zip(xs,ys))
return zip(ids,labels)

View File

@@ -0,0 +1,262 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of the their neighborhood
import pysal as ps
import plpy
from collections import OrderedDict
# crankshaft module
import crankshaft.pysal_utils as pu
# High level interface ---------------------------------------
def moran(subquery, attr_name,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I (global)
Implementation building neighbors with a PostGIS database and Moran's I
core clusters with PySAL.
Andy Eschbacher
"""
qvals = OrderedDict([("id_col", id_col),
("attr1", attr_name),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
attr_vals = pu.get_attributes(result)
## calculate weights
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global
moran_global = ps.esda.moran.Moran(attr_vals, weight,
permutations=permutations)
return zip([moran_global.I], [moran_global.EI])
def moran_local(subquery, attr,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = OrderedDict([("id_col", id_col),
("attr1", attr),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(5)
attr_vals = pu.get_attributes(result)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Rate (global)
Andy Eschbacher
"""
qvals = OrderedDict([("id_col", id_col),
("attr1", numerator),
("attr2", denominator)
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global rate
lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
permutations=permutations)
return zip([lisa_rate.I], [lisa_rate.EI])
def moran_local_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Local Rate
Andy Eschbacher
"""
# geometries with values that are null are ignored
# resulting in a collection of not as near neighbors
qvals = OrderedDict([("id_col", id_col),
("numerator", numerator),
("denominator", denominator),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(5)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_local_bv(subquery, attr1, attr2,
permutations, geom_col, id_col, w_type, num_ngbrs):
"""
Moran's I (local) Bivariate (untested)
"""
plpy.notice('** Constructing query')
qvals = OrderedDict([("id_col", id_col),
("attr1", attr1),
("attr2", attr2),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(4)
except plpy.SPIError:
plpy.error("Error: areas of interest query failed, " \
"check input parameters")
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(4)
## collect attributes
attr1_vals = pu.get_attributes(result, 1)
attr2_vals = pu.get_attributes(result, 2)
# create weights
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
permutations=permutations)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = quad_position(lisa.q)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
@param coord (int): quadrant of a specific measurement
Output:
classification (one of 'HH', 'LH', 'LL', or 'HL')
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def quad_position(quads):
"""
Produce Moran's I classification based of n
Input:
@param quads ndarray: an array of quads classified by
1-4 (PySAL default)
Output:
@param list: an array of quads classied by 'HH', 'LL', etc.
"""
return [map_quads(q) for q in quads]

View File

@@ -0,0 +1,2 @@
"""Import all functions for pysal_utils"""
from crankshaft.pysal_utils.pysal_utils import *

View File

@@ -0,0 +1,188 @@
"""
Utilities module for generic PySAL functionality, mainly centered on
translating queries into numpy arrays or PySAL weights objects
"""
import numpy as np
import pysal as ps
def construct_neighbor_query(w_type, query_vals):
"""Return query (a string) used for finding neighbors
@param w_type text: type of neighbors to calculate ('knn' or 'queen')
@param query_vals dict: values used to construct the query
"""
if w_type.lower() == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
## Build weight object
def get_weight(query_res, w_type='knn', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
@param query_res dict-like: query results with attributes and neighbors
"""
# if w_type.lower() == 'knn':
# row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
# weights = {x['id']: row_normed_weights for x in query_res}
# else:
# weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
# if len(x['neighbors']) > 0
# else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
print 'len of neighbors: %d' % len(neighbors)
built_weight = ps.W(neighbors)
built_weight.transform = 'r'
return built_weight
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes inolved in query.
@param params: dict of information used in query (column names,
table name, etc.)
"""
attr_string = ""
template = "i.\"%(col)s\"::numeric As attr%(alias_num)s, "
if 'time_cols' in params:
## if markov analysis
attrs = params['time_cols']
for idx, val in enumerate(attrs):
attr_string += template % {"col": val, "alias_num": idx + 1}
else:
## if moran's analysis
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs', 'subquery')]
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": params[val], "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Construct where conditions when building neighbors query
Create portion of WHERE clauses for weeding out NULL-valued geometries
Input: dict of params:
{'subquery': ...,
'numerator': 'data1',
'denominator': 'data2',
'': ...}
Output: 'idx_replace."data1" IS NOT NULL AND idx_replace."data2" IS NOT NULL'
Input:
{'subquery': ...,
'time_cols': ['time1', 'time2', 'time3'],
'etc': ...}
Output: 'idx_replace."time1" IS NOT NULL AND idx_replace."time2" IS NOT
NULL AND idx_replace."time3" IS NOT NULL'
"""
attr_string = []
template = "idx_replace.\"%s\" IS NOT NULL"
if 'time_cols' in params:
## markov where clauses
attrs = params['time_cols']
# add values to template
for attr in attrs:
attr_string.append(template % attr)
else:
## moran where clauses
# get keys
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs', 'subquery')])
# add values to template
for attr in attrs:
attr_string.append(template % params[attr])
if len(attrs) == 2:
attr_string.append("idx_replace.\"%s\" <> 0" % params[attrs[1]])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
@param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE " \
"i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"%(attr_where_j)s " \
"ORDER BY " \
"j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs})" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
@param params dict: information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_attributes(query_res, attr_num=1):
"""
@param query_res: query results with attributes and neighbors
@param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
def empty_zipped_array(num_nones):
"""
prepare return values for cases of empty weights objects (no neighbors)
Input:
@param num_nones int: number of columns (e.g., 4)
Output:
[(None, None, None, None)]
"""
return [tuple([None] * num_nones)]

View File

@@ -0,0 +1,11 @@
"""Random seed generator used for non-deterministic functions in crankshaft"""
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)

View File

@@ -0,0 +1 @@
from segmentation import *

View File

@@ -0,0 +1,176 @@
"""
Segmentation creation and prediction
"""
import sklearn
import numpy as np
import plpy
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import metrics
from sklearn.cross_validation import train_test_split
# Lower level functions
#----------------------
def replace_nan_with_mean(array):
"""
Input:
@param array: an array of floats which may have null-valued entries
Output:
array with nans filled in with the mean of the dataset
"""
# returns an array of rows and column indices
indices = np.where(np.isnan(array))
# iterate through entries which have nan values
for row, col in zip(*indices):
array[row, col] = np.mean(array[~np.isnan(array[:, col]), col])
return array
def get_data(variable, feature_columns, query):
"""
Fetch data from the database, clean, and package into
numpy arrays
Input:
@param variable: name of the target variable
@param feature_columns: list of column names
@param query: subquery that data is pulled from for the packaging
Output:
prepared data, packaged into NumPy arrays
"""
columns = ','.join(['array_agg("{col}") As "{col}"'.format(col=col) for col in feature_columns])
try:
data = plpy.execute('''SELECT array_agg("{variable}") As target, {columns} FROM ({query}) As a'''.format(
variable=variable,
columns=columns,
query=query))
except Exception, e:
plpy.error('Failed to access data to build segmentation model: %s' % e)
# extract target data from plpy object
target = np.array(data[0]['target'])
# put n feature data arrays into an n x m array of arrays
features = np.column_stack([np.array(data[0][col], dtype=float) for col in feature_columns])
return replace_nan_with_mean(target), replace_nan_with_mean(features)
# High level interface
# --------------------
def create_and_predict_segment_agg(target, features, target_features, target_ids, model_parameters):
"""
Version of create_and_predict_segment that works on arrays that come stright form the SQL calling
the function.
Input:
@param target: The 1D array of lenth NSamples containing the target variable we want the model to predict
@param features: Thw 2D array of size NSamples * NFeatures that form the imput to the model
@param target_ids: A 1D array of target_ids that will be used to associate the results of the prediction with the rows which they come from
@param model_parameters: A dictionary containing parameters for the model.
"""
clean_target = replace_nan_with_mean(target)
clean_features = replace_nan_with_mean(features)
target_features = replace_nan_with_mean(target_features)
model, accuracy = train_model(clean_target, clean_features, model_parameters, 0.2)
prediction = model.predict(target_features)
accuracy_array = [accuracy]*prediction.shape[0]
return zip(target_ids, prediction, np.full(prediction.shape, accuracy_array))
def create_and_predict_segment(query, variable, target_query, model_params):
"""
generate a segment with machine learning
Stuart Lynn
"""
## fetch column names
try:
columns = plpy.execute('SELECT * FROM ({query}) As a LIMIT 1 '.format(query=query))[0].keys()
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
## extract column names to be used in building the segmentation model
feature_columns = set(columns) - set([variable, 'cartodb_id', 'the_geom', 'the_geom_webmercator'])
## get data from database
target, features = get_data(variable, feature_columns, query)
model, accuracy = train_model(target, features, model_params, 0.2)
cartodb_ids, result = predict_segment(model, feature_columns, target_query)
accuracy_array = [accuracy]*result.shape[0]
return zip(cartodb_ids, result, accuracy_array)
def train_model(target, features, model_params, test_split):
"""
Train the Gradient Boosting model on the provided data and calculate the accuracy of the model
Input:
@param target: 1D Array of the variable that the model is to be trianed to predict
@param features: 2D Array NSamples * NFeatures to use in trining the model
@param model_params: A dictionary of model parameters, the full specification can be found on the
scikit learn page for [GradientBoostingRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
@parma test_split: The fraction of the data to be withheld for testing the model / calculating the accuray
"""
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=test_split)
model = GradientBoostingRegressor(**model_params)
model.fit(features_train, target_train)
accuracy = calculate_model_accuracy(model, features, target)
return model, accuracy
def calculate_model_accuracy(model, features, target):
"""
Calculate the mean squared error of the model prediction
Input:
@param model: model trained from input features
@param features: features to make a prediction from
@param target: target to compare prediction to
Output:
mean squared error of the model prection compared to the target
"""
prediction = model.predict(features)
return metrics.mean_squared_error(prediction, target)
def predict_segment(model, features, target_query):
"""
Use the provided model to predict the values for the new feature set
Input:
@param model: The pretrained model
@features: A list of features to use in the model prediction (list of column names)
@target_query: The query to run to obtain the data to predict on and the cartdb_ids associated with it.
"""
batch_size = 1000
joined_features = ','.join(['"{0}"::numeric'.format(a) for a in features])
try:
cursor = plpy.cursor('SELECT Array[{joined_features}] As features FROM ({target_query}) As a'.format(
joined_features=joined_features,
target_query=target_query))
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
results = []
while True:
rows = cursor.fetch(batch_size)
if not rows:
break
batch = np.row_stack([np.array(row['features'], dtype=float) for row in rows])
#Need to fix this. Should be global mean. This will cause weird effects
batch = replace_nan_with_mean(batch)
prediction = model.predict(batch)
results.append(prediction)
try:
cartodb_ids = plpy.execute('''SELECT array_agg(cartodb_id ORDER BY cartodb_id) As cartodb_ids FROM ({0}) As a'''.format(target_query))[0]['cartodb_ids']
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
return cartodb_ids, np.concatenate(results)

View File

@@ -0,0 +1,2 @@
"""Import all functions from clustering libraries."""
from markov import *

View File

@@ -0,0 +1,189 @@
"""
Spatial dynamics measurements using Spatial Markov
"""
import numpy as np
import pysal as ps
import plpy
import crankshaft.pysal_utils as pu
def spatial_markov_trend(subquery, time_cols, num_classes=7,
w_type='knn', num_ngbrs=5, permutations=0,
geom_col='the_geom', id_col='cartodb_id'):
"""
Predict the trends of a unit based on:
1. history of its transitions to different classes (e.g., 1st quantile -> 2nd quantile)
2. average class of its neighbors
Inputs:
@param subquery string: e.g., SELECT the_geom, cartodb_id,
interesting_time_column FROM table_name
@param time_cols list of strings: list of strings of column names
@param num_classes (optional): number of classes to break distribution
of values into. Currently uses quantile bins.
@param w_type string (optional): weight type ('knn' or 'queen')
@param num_ngbrs int (optional): number of neighbors (if knn type)
@param permutations int (optional): number of permutations for test
stats
@param geom_col string (optional): name of column which contains the
geometries
@param id_col string (optional): name of column which has the ids of
the table
Outputs:
@param trend_up float: probablity that a geom will move to a higher
class
@param trend_down float: probablity that a geom will move to a lower
class
@param trend float: (trend_up - trend_down) / trend_static
@param volatility float: a measure of the volatility based on
probability stddev(prob array)
"""
if len(time_cols) < 2:
plpy.error('More than one time column needs to be passed')
qvals = {"id_col": id_col,
"time_cols": time_cols,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
try:
query_result = plpy.execute(
pu.construct_neighbor_query(w_type, qvals)
)
if len(query_result) == 0:
return zip([None], [None], [None], [None], [None])
except plpy.SPIError, err:
plpy.debug('Query failed with exception %s: %s' % (err, pu.construct_neighbor_query(w_type, qvals)))
plpy.error('Query failed, check the input parameters')
return zip([None], [None], [None], [None], [None])
## build weight
weights = pu.get_weight(query_result, w_type)
weights.transform = 'r'
## prep time data
t_data = get_time_data(query_result, time_cols)
plpy.debug('shape of t_data %d, %d' % t_data.shape)
plpy.debug('number of weight objects: %d, %d' % (weights.sparse).shape)
plpy.debug('first num elements: %f' % t_data[0, 0])
sp_markov_result = ps.Spatial_Markov(t_data,
weights,
k=num_classes,
fixed=False,
permutations=permutations)
## get lag classes
lag_classes = ps.Quantiles(
ps.lag_spatial(weights, t_data[:, -1]),
k=num_classes).yb
## look up probablity distribution for each unit according to class and lag class
prob_dist = get_prob_dist(sp_markov_result.P,
lag_classes,
sp_markov_result.classes[:, -1])
## find the ups and down and overall distribution of each cell
trend_up, trend_down, trend, volatility = get_prob_stats(prob_dist,
sp_markov_result.classes[:, -1])
## output the results
return zip(trend, trend_up, trend_down, volatility, weights.id_order)
def get_time_data(markov_data, time_cols):
"""
Extract the time columns and bin appropriately
"""
num_attrs = len(time_cols)
return np.array([[x['attr' + str(i)] for x in markov_data]
for i in range(1, num_attrs+1)], dtype=float).transpose()
## not currently used
def rebin_data(time_data, num_time_per_bin):
"""
Convert an n x l matrix into an (n/m) x l matrix where the values are
reduced (averaged) for the intervening states:
1 2 3 4 1.5 3.5
5 6 7 8 -> 5.5 7.5
9 8 7 6 8.5 6.5
5 4 3 2 4.5 2.5
if m = 2, the 4 x 4 matrix is transformed to a 2 x 4 matrix.
This process effectively resamples the data at a longer time span n
units longer than the input data.
For cases when there is a remainder (remainder(5/3) = 2), the remaining
two columns are binned together as the last time period, while the
first three are binned together for the first period.
Input:
@param time_data n x l ndarray: measurements of an attribute at
different time intervals
@param num_time_per_bin int: number of columns to average into a new
column
Output:
ceil(n / m) x l ndarray of resampled time series
"""
if time_data.shape[1] % num_time_per_bin == 0:
## if fit is perfect, then use it
n_max = time_data.shape[1] / num_time_per_bin
else:
## fit remainders into an additional column
n_max = time_data.shape[1] / num_time_per_bin + 1
return np.array([time_data[:, num_time_per_bin * i:num_time_per_bin * (i+1)].mean(axis=1)
for i in range(n_max)]).T
def get_prob_dist(transition_matrix, lag_indices, unit_indices):
"""
Given an array of transition matrices, look up the probability
associated with the arrangements passed
Input:
@param transition_matrix ndarray[k,k,k]:
@param lag_indices ndarray:
@param unit_indices ndarray:
Output:
Array of probability distributions
"""
return np.array([transition_matrix[(lag_indices[i], unit_indices[i])]
for i in range(len(lag_indices))])
def get_prob_stats(prob_dist, unit_indices):
"""
get the statistics of the probability distributions
Outputs:
@param trend_up ndarray(float): sum of probabilities for upward
movement (relative to the unit index of that prob)
@param trend_down ndarray(float): sum of probabilities for downward
movement (relative to the unit index of that prob)
@param trend ndarray(float): difference of upward and downward
movements
"""
num_elements = len(unit_indices)
trend_up = np.empty(num_elements, dtype=float)
trend_down = np.empty(num_elements, dtype=float)
trend = np.empty(num_elements, dtype=float)
for i in range(num_elements):
trend_up[i] = prob_dist[i, (unit_indices[i]+1):].sum()
trend_down[i] = prob_dist[i, :unit_indices[i]].sum()
if prob_dist[i, unit_indices[i]] > 0.0:
trend[i] = (trend_up[i] - trend_down[i]) / prob_dist[i, unit_indices[i]]
else:
trend[i] = None
## calculate volatility of distribution
volatility = prob_dist.std(axis=1)
return trend_up, trend_down, trend, volatility

View File

@@ -0,0 +1,49 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.2.0',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping comunity',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
# IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
requires=['pysal', 'numpy', 'sklearn'],
test_suite='test'
)

View File

@@ -0,0 +1 @@
[{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]

View File

@@ -0,0 +1 @@
[[0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 0], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 1], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 2], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 3], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 4], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 5], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 6], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 7], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 8], [0.19047619047619049, 0.16, 0.0, 0.32594478059941379, 9], [-0.23529411764705882, 0.0, 0.19047619047619047, 0.31356338348865387, 10], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 11], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 12], [0.027777777777777783, 0.11111111111111112, 0.088888888888888892, 0.30339641183779581, 13], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 14], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 15], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 16], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 17], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 18], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 19], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 20], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 21], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 22], [-0.16666666666666663, 0.18181818181818182, 0.27272727272727271, 0.20246415864836445, 23], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 24], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 25], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 26], [-0.043478260869565216, 0.0, 0.041666666666666664, 0.37950991789118999, 27], [0.22222222222222221, 0.18181818181818182, 0.0, 0.31701083225750354, 28], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 29], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 30], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 31], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 32], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 33], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 34], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 35], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 36], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 37], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 38], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 39], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 40], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 41], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 42], [0.0, 0.0, 0.0, 0.40000000000000002, 43], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 44], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 45], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 46], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 47]]

View File

@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "LL"],
[0.6152779669180425, "LL"],
[-0.14657336660125297, "LH"],
[0.6967858120189607, "LL"],
[0.07949310115714454, "HH"],
[0.4703198759258987, "HH"],
[0.4421125200498064, "HH"],
[0.5724288737143592, "LL"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "LL"],
[-0.01466729201304962, "HL"],
[0.3481559372544409, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "HH"],
[0.15971286468915544, "LL"],
[1.0543588860308968, "HH"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "HH"],
[0.08438455015300014, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "LL"],
[0.795275137550483, "HH"],
[0.18562939195219, "LL"],
[0.3010757406693439, "LL"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "LL"],
[-0.07116352791516614, "HL"],
[-0.09945240794119009, "LH"],
[0.18562939195219, "LL"],
[0.1832733440191868, "LL"],
[-0.39054253768447705, "HL"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "HH"],
[0.2584386102554792, "HH"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "LH"],
[0.051367269430983485, "LL"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "LL"],
[0.04191046803899837, "LL"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "LH"],
[0.5410013139159929, "HH"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]

View File

@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)

View File

@@ -0,0 +1,52 @@
import re
class MockCursor:
def __init__(self, data):
self.cursor_pos = 0
self.data = data
def fetch(self, batch_size):
batch = self.data[self.cursor_pos : self.cursor_pos + batch_size]
self.cursor_pos += batch_size
return batch
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
self.results = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def debug(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def cursor(self, query):
data = self.execute(query)
return MockCursor(data)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []

View File

@@ -0,0 +1,38 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import numpy as np
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class KMeansTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
self.params = {"subquery": "select * from table",
"no_clusters": "10"
}
def test_kmeans(self):
data = self.cluster_data
plpy._define_result('select' ,data)
clusters = cc.kmeans('subquery', 2)
labels = [a[1] for a in clusters]
c1 = [a for a in clusters if a[1]==0]
c2 = [a for a in clusters if a[1]==1]
self.assertEqual(len(np.unique(labels)),2)
self.assertEqual(len(c1),20)
self.assertEqual(len(c2),20)

View File

@@ -0,0 +1,88 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.params_markov = {"id_col": "cartodb_id",
"time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads"""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_quad_position(self):
"""Test lisa_sig_vals"""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('subquery', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None? ', result == None
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
def test_moran(self):
"""Test Moran's I global"""
data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1235)
result = cc.moran('table', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None?', result == None
result_moran = result[0][0]
expected_moran = np.array([row[0] for row in self.moran_data]).mean()
self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)

View File

@@ -0,0 +1,142 @@
import unittest
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
class PysalUtilsTest(unittest.TestCase):
"""Testing class for utility functions related to PySAL integrations"""
def setUp(self):
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.params_array = {"id_col": "cartodb_id",
"time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
def test_query_attr_select(self):
"""Test query_attr_select"""
ans = "i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, "
ans_array = "i.\"_2013_dec\"::numeric As attr1, " \
"i.\"_2014_jan\"::numeric As attr2, " \
"i.\"_2014_feb\"::numeric As attr3, "
self.assertEqual(pu.query_attr_select(self.params), ans)
self.assertEqual(pu.query_attr_select(self.params_array), ans_array)
def test_query_attr_where(self):
"""Test pu.query_attr_where"""
ans = "idx_replace.\"andy\" IS NOT NULL AND " \
"idx_replace.\"jay_z\" IS NOT NULL AND " \
"idx_replace.\"jay_z\" <> 0"
ans_array = "idx_replace.\"_2013_dec\" IS NOT NULL AND " \
"idx_replace.\"_2014_jan\" IS NOT NULL AND " \
"idx_replace.\"_2014_feb\" IS NOT NULL"
self.assertEqual(pu.query_attr_where(self.params), ans)
self.assertEqual(pu.query_attr_where(self.params_array), ans_array)
def test_knn(self):
"""Test knn neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0 " \
"ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
ans_array = "SELECT i.\"cartodb_id\" As id, " \
"i.\"_2013_dec\"::numeric As attr1, " \
"i.\"_2014_jan\"::numeric As attr2, " \
"i.\"_2014_feb\"::numeric As attr3, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"_2013_dec\" IS NOT NULL AND " \
"j.\"_2014_jan\" IS NOT NULL AND " \
"j.\"_2014_feb\" IS NOT NULL " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"_2013_dec\" IS NOT NULL AND " \
"i.\"_2014_jan\" IS NOT NULL AND " \
"i.\"_2014_feb\" IS NOT NULL "\
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.knn(self.params), ans)
self.assertEqual(pu.knn(self.params_array), ans_array)
def test_queen(self):
"""Test queen neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"ST_Touches(i.\"the_geom\", " \
"j.\"the_geom\") AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0)" \
") As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.queen(self.params), ans)
def test_construct_neighbor_query(self):
"""Test construct_neighbor_query"""
# Compare to raw knn query
self.assertEqual(pu.construct_neighbor_query('knn', self.params),
pu.knn(self.params))
def test_get_attributes(self):
"""Test get_attributes"""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight"""
self.assertEqual(True, True)
def test_empty_zipped_array(self):
"""Test empty_zipped_array"""
ans2 = [(None, None)]
ans4 = [(None, None, None, None)]
self.assertEqual(pu.empty_zipped_array(2), ans2)
self.assertEqual(pu.empty_zipped_array(4), ans4)

View File

@@ -0,0 +1,64 @@
import unittest
import numpy as np
from helper import plpy, fixture_file
import crankshaft.segmentation as segmentation
import json
class SegmentationTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
def generate_random_data(self,n_samples,random_state, row_type=False):
x1 = random_state.uniform(size=n_samples)
x2 = random_state.uniform(size=n_samples)
x3 = random_state.randint(0, 4, size=n_samples)
y = x1+x2*x2+x3
cartodb_id = range(len(x1))
if row_type:
return [ {'features': vals} for vals in zip(x1,x2,x3)], y
else:
return [dict( zip(['x1','x2','x3','target', 'cartodb_id'],[x1,x2,x3,y,cartodb_id]))]
def test_replace_nan_with_mean(self):
test_array = np.array([1.2, np.nan, 3.2, np.nan, np.nan])
def test_create_and_predict_segment(self):
n_samples = 1000
random_state_train = np.random.RandomState(13)
random_state_test = np.random.RandomState(134)
training_data = self.generate_random_data(n_samples, random_state_train)
test_data, test_y = self.generate_random_data(n_samples, random_state_test, row_type=True)
ids = [{'cartodb_ids': range(len(test_data))}]
rows = [{'x1': 0,'x2':0,'x3':0,'y':0,'cartodb_id':0}]
plpy._define_result('select \* from \(select \* from training\) a limit 1',rows)
plpy._define_result('.*from \(select \* from training\) as a' ,training_data)
plpy._define_result('select array_agg\(cartodb\_id order by cartodb\_id\) as cartodb_ids from \(.*\) a',ids)
plpy._define_result('.*select \* from test.*' ,test_data)
model_parameters = {'n_estimators': 1200,
'max_depth': 3,
'subsample' : 0.5,
'learning_rate': 0.01,
'min_samples_leaf': 1}
result = segmentation.create_and_predict_segment(
'select * from training',
'target',
'select * from test',
model_parameters)
prediction = [r[1] for r in result]
accuracy =np.sqrt(np.mean( np.square( np.array(prediction) - np.array(test_y))))
self.assertEqual(len(result),len(test_data))
self.assertTrue( result[0][2] < 0.01)
self.assertTrue( accuracy < 0.5*np.mean(test_y) )

View File

@@ -0,0 +1,324 @@
import unittest
import numpy as np
import unittest
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.space_time_dynamics as std
from crankshaft import random_seeds
import json
class SpaceTimeTests(unittest.TestCase):
"""Testing class for Markov Functions."""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"time_cols": ['dec_2013', 'jan_2014', 'feb_2014'],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors_markov.json')).read())
self.markov_data = json.loads(open(fixture_file('markov.json')).read())
self.time_data = np.array([i * np.ones(10, dtype=float) for i in range(10)]).T
self.transition_matrix = np.array([
[[ 0.96341463, 0.0304878 , 0.00609756, 0. , 0. ],
[ 0.06040268, 0.83221477, 0.10738255, 0. , 0. ],
[ 0. , 0.14 , 0.74 , 0.12 , 0. ],
[ 0. , 0.03571429, 0.32142857, 0.57142857, 0.07142857],
[ 0. , 0. , 0. , 0.16666667, 0.83333333]],
[[ 0.79831933, 0.16806723, 0.03361345, 0. , 0. ],
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0.00537634, 0.06989247, 0.8655914 , 0.05913978, 0. ],
[ 0. , 0. , 0.06372549, 0.90196078, 0.03431373],
[ 0. , 0. , 0. , 0.19444444, 0.80555556]],
[[ 0.84693878, 0.15306122, 0. , 0. , 0. ],
[ 0.08133971, 0.78947368, 0.1291866 , 0. , 0. ],
[ 0.00518135, 0.0984456 , 0.79274611, 0.0984456 , 0.00518135],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0. , 0. , 0. , 0.10204082, 0.89795918]],
[[ 0.8852459 , 0.09836066, 0. , 0.01639344, 0. ],
[ 0.03875969, 0.81395349, 0.13953488, 0. , 0.00775194],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0.02339181, 0.12865497, 0.75438596, 0.09356725],
[ 0. , 0. , 0. , 0.09661836, 0.90338164]],
[[ 0.33333333, 0.66666667, 0. , 0. , 0. ],
[ 0.0483871 , 0.77419355, 0.16129032, 0.01612903, 0. ],
[ 0.01149425, 0.16091954, 0.74712644, 0.08045977, 0. ],
[ 0. , 0.01036269, 0.06217617, 0.89637306, 0.03108808],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]]]
)
def test_spatial_markov(self):
"""Test Spatial Markov."""
data = [ { 'id': d['id'],
'attr1': d['y1995'],
'attr2': d['y1996'],
'attr3': d['y1997'],
'attr4': d['y1998'],
'attr5': d['y1999'],
'attr6': d['y2000'],
'attr7': d['y2001'],
'attr8': d['y2002'],
'attr9': d['y2003'],
'attr10': d['y2004'],
'attr11': d['y2005'],
'attr12': d['y2006'],
'attr13': d['y2007'],
'attr14': d['y2008'],
'attr15': d['y2009'],
'neighbors': d['neighbors'] } for d in self.neighbors_data]
print(str(data[0]))
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = std.spatial_markov_trend('subquery', ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'], 5, 'knn', 5, 0, 'the_geom', 'cartodb_id')
self.assertTrue(result != None)
result = [(row[0], row[1], row[2], row[3], row[4]) for row in result]
print result[0]
expected = self.markov_data
for ([res_trend, res_up, res_down, res_vol, res_id],
[exp_trend, exp_up, exp_down, exp_vol, exp_id]
) in zip(result, expected):
self.assertAlmostEqual(res_trend, exp_trend)
def test_get_time_data(self):
"""Test get_time_data"""
data = [ { 'attr1': d['y1995'],
'attr2': d['y1996'],
'attr3': d['y1997'],
'attr4': d['y1998'],
'attr5': d['y1999'],
'attr6': d['y2000'],
'attr7': d['y2001'],
'attr8': d['y2002'],
'attr9': d['y2003'],
'attr10': d['y2004'],
'attr11': d['y2005'],
'attr12': d['y2006'],
'attr13': d['y2007'],
'attr14': d['y2008'],
'attr15': d['y2009'] } for d in self.neighbors_data]
result = std.get_time_data(data, ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'])
## expected was prepared from PySAL example:
### f = ps.open(ps.examples.get_path("usjoin.csv"))
### pci = np.array([f.by_col[str(y)] for y in range(1995, 2010)]).transpose()
### rpci = pci / (pci.mean(axis = 0))
expected = np.array([[ 0.87654416, 0.863147, 0.85637567, 0.84811668, 0.8446154, 0.83271652
, 0.83786314, 0.85012593, 0.85509656, 0.86416612, 0.87119375, 0.86302631
, 0.86148267, 0.86252252, 0.86746356],
[ 0.9188951, 0.91757931, 0.92333258, 0.92517289, 0.92552388, 0.90746978
, 0.89830489, 0.89431991, 0.88924794, 0.89815176, 0.91832091, 0.91706054
, 0.90139505, 0.87897455, 0.86216858],
[ 0.82591007, 0.82548596, 0.81989793, 0.81503235, 0.81731522, 0.78964559
, 0.80584442, 0.8084998, 0.82258551, 0.82668196, 0.82373724, 0.81814804
, 0.83675961, 0.83574199, 0.84647177],
[ 1.09088176, 1.08537689, 1.08456418, 1.08415404, 1.09898841, 1.14506948
, 1.12151133, 1.11160697, 1.10888621, 1.11399806, 1.12168029, 1.13164797
, 1.12958508, 1.11371818, 1.09936775],
[ 1.10731446, 1.11373944, 1.13283638, 1.14472559, 1.15910025, 1.16898201
, 1.17212488, 1.14752303, 1.11843284, 1.11024964, 1.11943471, 1.11736468
, 1.10863242, 1.09642516, 1.07762337],
[ 1.42269757, 1.42118434, 1.44273502, 1.43577571, 1.44400684, 1.44184737
, 1.44782832, 1.41978227, 1.39092208, 1.4059372, 1.40788646, 1.44052766
, 1.45241216, 1.43306098, 1.4174431 ],
[ 1.13073885, 1.13110513, 1.11074708, 1.13364636, 1.13088149, 1.10888138
, 1.11856629, 1.13062931, 1.11944984, 1.12446239, 1.11671008, 1.10880034
, 1.08401709, 1.06959206, 1.07875225],
[ 1.04706124, 1.04516831, 1.04253372, 1.03239987, 1.02072545, 0.99854316
, 0.9880258, 0.99669587, 0.99327676, 1.01400905, 1.03176742, 1.040511
, 1.01749645, 0.9936394, 0.98279746],
[ 0.98996986, 1.00143564, 0.99491, 1.00188408, 1.00455845, 0.99127006
, 0.97925917, 0.9683482, 0.95335147, 0.93694787, 0.94308213, 0.92232874
, 0.91284091, 0.89689833, 0.88928858],
[ 0.87418391, 0.86416601, 0.84425695, 0.8404494, 0.83903044, 0.8578708
, 0.86036185, 0.86107306, 0.8500772, 0.86981998, 0.86837929, 0.87204141
, 0.86633032, 0.84946077, 0.83287146],
[ 1.14196118, 1.14660262, 1.14892712, 1.14909594, 1.14436624, 1.14450183
, 1.12349752, 1.12596664, 1.12213996, 1.1119989, 1.10257792, 1.10491258
, 1.11059842, 1.10509795, 1.10020097],
[ 0.97282463, 0.96700147, 0.96252588, 0.9653878, 0.96057687, 0.95831051
, 0.94480909, 0.94804195, 0.95430286, 0.94103989, 0.92122519, 0.91010201
, 0.89280392, 0.89298243, 0.89165385],
[ 0.94325468, 0.96436902, 0.96455242, 0.95243009, 0.94117647, 0.9480927
, 0.93539182, 0.95388718, 0.94597005, 0.96918424, 0.94781281, 0.93466815
, 0.94281559, 0.96520315, 0.96715441],
[ 0.97478408, 0.98169225, 0.98712809, 0.98474769, 0.98559897, 0.98687073
, 0.99237486, 0.98209969, 0.9877653, 0.97399471, 0.96910087, 0.98416665
, 0.98423613, 0.99823861, 0.99545704],
[ 0.85570269, 0.85575915, 0.85986132, 0.85693406, 0.8538012, 0.86191535
, 0.84981451, 0.85472102, 0.84564835, 0.83998883, 0.83478547, 0.82803648
, 0.8198736, 0.82265395, 0.8399404 ],
[ 0.87022047, 0.85996258, 0.85961813, 0.85689572, 0.83947136, 0.82785597
, 0.86008789, 0.86776298, 0.86720209, 0.8676334, 0.89179317, 0.94202108
, 0.9422231, 0.93902708, 0.94479184],
[ 0.90134907, 0.90407738, 0.90403991, 0.90201769, 0.90399238, 0.90906632
, 0.92693339, 0.93695966, 0.94242697, 0.94338265, 0.91981796, 0.91108804
, 0.90543476, 0.91737138, 0.94793657],
[ 1.1977611, 1.18222564, 1.18439158, 1.18267865, 1.19286723, 1.20172869
, 1.21328691, 1.22624778, 1.22397075, 1.23857042, 1.24419893, 1.23929384
, 1.23418676, 1.23626739, 1.26754398],
[ 1.24919678, 1.25754773, 1.26991161, 1.28020651, 1.30625667, 1.34790023
, 1.34399863, 1.32575181, 1.30795492, 1.30544841, 1.30303302, 1.32107766
, 1.32936244, 1.33001241, 1.33288462],
[ 1.06768004, 1.03799276, 1.03637303, 1.02768449, 1.03296093, 1.05059016
, 1.03405057, 1.02747623, 1.03162734, 0.9961416, 0.97356208, 0.94241549
, 0.92754547, 0.92549227, 0.92138102],
[ 1.09475614, 1.11526796, 1.11654299, 1.13103948, 1.13143264, 1.13889622
, 1.12442212, 1.13367018, 1.13982256, 1.14029944, 1.11979401, 1.10905389
, 1.10577769, 1.11166825, 1.09985155],
[ 0.76530058, 0.76612841, 0.76542451, 0.76722683, 0.76014284, 0.74480073
, 0.76098396, 0.76156903, 0.76651952, 0.76533288, 0.78205934, 0.76842416
, 0.77487118, 0.77768683, 0.78801192],
[ 0.98391336, 0.98075816, 0.98295341, 0.97386015, 0.96913803, 0.97370819
, 0.96419154, 0.97209861, 0.97441313, 0.96356162, 0.94745352, 0.93965462
, 0.93069645, 0.94020973, 0.94358232],
[ 0.83561828, 0.82298088, 0.81738502, 0.81748588, 0.80904801, 0.80071489
, 0.83358256, 0.83451613, 0.85175032, 0.85954307, 0.86790024, 0.87170334
, 0.87863799, 0.87497981, 0.87888675],
[ 0.98845573, 1.02092428, 0.99665283, 0.99141823, 0.99386619, 0.98733195
, 0.99644997, 0.99669587, 1.02559097, 1.01116651, 0.99988024, 0.97906749
, 0.99323123, 1.00204939, 0.99602148],
[ 1.14930913, 1.15241949, 1.14300962, 1.14265542, 1.13984683, 1.08312397
, 1.05192626, 1.04230892, 1.05577278, 1.08569751, 1.12443486, 1.08891079
, 1.08603695, 1.05997314, 1.02160943],
[ 1.11368269, 1.1057147, 1.11893431, 1.13778669, 1.1432272, 1.18257029
, 1.16226243, 1.16009196, 1.14467789, 1.14820235, 1.12386598, 1.12680236
, 1.12357937, 1.1159258, 1.12570828],
[ 1.30379431, 1.30752186, 1.31206366, 1.31532267, 1.30625667, 1.31210239
, 1.29989156, 1.29203193, 1.27183516, 1.26830786, 1.2617743, 1.28656675
, 1.29734097, 1.29390205, 1.29345446],
[ 0.83953719, 0.82701448, 0.82006005, 0.81188876, 0.80294864, 0.78772975
, 0.82848011, 0.8259679, 0.82435705, 0.83108634, 0.84373784, 0.83891093
, 0.84349247, 0.85637272, 0.86539395],
[ 1.23450087, 1.2426022, 1.23537935, 1.23581293, 1.24522626, 1.2256767
, 1.21126648, 1.19377804, 1.18355337, 1.19674434, 1.21536573, 1.23653297
, 1.27962009, 1.27968392, 1.25907738],
[ 0.9769662, 0.97400719, 0.98035944, 0.97581531, 0.95543282, 0.96480308
, 0.94686376, 0.93679073, 0.92540049, 0.92988835, 0.93442917, 0.92100464
, 0.91475304, 0.90249622, 0.9021363 ],
[ 0.84986886, 0.8986851, 0.84295997, 0.87280534, 0.85659368, 0.88937573
, 0.894401, 0.90448993, 0.95495898, 0.92698333, 0.94745352, 0.92562488
, 0.96635366, 1.02520312, 1.0394296 ],
[ 1.01922808, 1.00258203, 1.00974428, 1.00303417, 0.99765073, 1.00759019
, 0.99192968, 0.99747298, 0.99550759, 0.97583768, 0.9610168, 0.94779638
, 0.93759089, 0.93353431, 0.94121705],
[ 0.86367411, 0.85558932, 0.85544346, 0.85103025, 0.84336613, 0.83434854
, 0.85813595, 0.84667961, 0.84374558, 0.85951183, 0.87194227, 0.89455097
, 0.88283929, 0.90349491, 0.90600675],
[ 1.00947534, 1.00411055, 1.00698819, 0.99513687, 0.99291086, 1.00581626
, 0.98850522, 0.99291168, 0.98983209, 0.97511924, 0.96134615, 0.96382634
, 0.95011401, 0.9434686, 0.94637765],
[ 1.05712571, 1.05459419, 1.05753012, 1.04880786, 1.05103857, 1.04800023
, 1.03024941, 1.04200483, 1.0402554, 1.03296979, 1.02191682, 1.02476275
, 1.02347523, 1.02517684, 1.04359571],
[ 1.07084189, 1.06669497, 1.07937623, 1.07387988, 1.0794043, 1.0531801
, 1.07452771, 1.09383478, 1.1052447, 1.10322136, 1.09167939, 1.08772756
, 1.08859544, 1.09177338, 1.1096083 ],
[ 0.86719222, 0.86628896, 0.86675156, 0.86425632, 0.86511809, 0.86287327
, 0.85169796, 0.85411285, 0.84886336, 0.84517414, 0.84843858, 0.84488343
, 0.83374329, 0.82812044, 0.82878599],
[ 0.88389211, 0.92288667, 0.90282398, 0.91229186, 0.92023286, 0.92652175
, 0.94278865, 0.93682452, 0.98655146, 0.992237, 0.9798497, 0.93869677
, 0.96947771, 1.00362626, 0.98102351],
[ 0.97082064, 0.95320233, 0.94534081, 0.94215593, 0.93967, 0.93092109
, 0.92662519, 0.93412152, 0.93501274, 0.92879506, 0.92110542, 0.91035556
, 0.90430364, 0.89994694, 0.90073864],
[ 0.95861858, 0.95774543, 0.98254811, 0.98919472, 0.98684824, 0.98882205
, 0.97662234, 0.95601578, 0.94905385, 0.94934888, 0.97152609, 0.97163004
, 0.9700702, 0.97158948, 0.95884908],
[ 0.83980439, 0.84726737, 0.85747, 0.85467221, 0.8556751, 0.84818516
, 0.85265681, 0.84502402, 0.82645665, 0.81743586, 0.83550406, 0.83338919
, 0.83511679, 0.82136617, 0.80921874],
[ 0.95118156, 0.9466212, 0.94688098, 0.9508583, 0.9512441, 0.95440787
, 0.96364363, 0.96804412, 0.97136214, 0.97583768, 0.95571724, 0.96895368
, 0.97001634, 0.97082733, 0.98782366],
[ 1.08910044, 1.08248968, 1.08492895, 1.08656923, 1.09454249, 1.10558188
, 1.1214086, 1.12292577, 1.13021031, 1.13342735, 1.14686068, 1.14502975
, 1.14474747, 1.14084037, 1.16142926],
[ 1.06336033, 1.07365823, 1.08691496, 1.09764846, 1.11669863, 1.11856702
, 1.09764283, 1.08815849, 1.08044313, 1.09278827, 1.07003204, 1.08398066
, 1.09831768, 1.09298232, 1.09176125],
[ 0.79772065, 0.78829196, 0.78581151, 0.77615922, 0.77035744, 0.77751194
, 0.79902974, 0.81437881, 0.80788828, 0.79603865, 0.78966436, 0.79949807
, 0.80172182, 0.82168155, 0.85587911],
[ 1.0052447, 1.00007696, 1.00475899, 1.00613942, 1.00639561, 1.00162979
, 0.99860739, 1.00814981, 1.00574316, 0.99030032, 0.97682565, 0.97292596
, 0.96519561, 0.96173403, 0.95890284],
[ 0.95808419, 0.9382568, 0.9654441, 0.95561201, 0.96987289, 0.96608031
, 0.99727185, 1.00781194, 1.03484236, 1.05333619, 1.0983263, 1.1704974
, 1.17025154, 1.18730553, 1.14242645]])
self.assertTrue(np.allclose(result, expected))
self.assertTrue(type(result) == type(expected))
self.assertTrue(result.shape == expected.shape)
def test_rebin_data(self):
"""Test rebin_data"""
## sample in double the time (even case since 10 % 2 = 0):
## (0+1)/2, (2+3)/2, (4+5)/2, (6+7)/2, (8+9)/2
## = 0.5, 2.5, 4.5, 6.5, 8.5
ans_even = np.array([(i + 0.5) * np.ones(10, dtype=float)
for i in range(0, 10, 2)]).T
self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 2), ans_even))
## sample in triple the time (uneven since 10 % 3 = 1):
## (0+1+2)/3, (3+4+5)/3, (6+7+8)/3, (9)/1
## = 1, 4, 7, 9
ans_odd = np.array([i * np.ones(10, dtype=float)
for i in (1, 4, 7, 9)]).T
self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 3), ans_odd))
def test_get_prob_dist(self):
"""Test get_prob_dist"""
lag_indices = np.array([1, 2, 3, 4])
unit_indices = np.array([1, 3, 2, 4])
answer = np.array([
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]
])
result = std.get_prob_dist(self.transition_matrix, lag_indices, unit_indices)
self.assertTrue(np.array_equal(result, answer))
def test_get_prob_stats(self):
"""Test get_prob_stats"""
probs = np.array([
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]
])
unit_indices = np.array([1, 3, 2, 4])
answer_up = np.array([0.04245283, 0.03529412, 0.12376238, 0.])
answer_down = np.array([0.0754717, 0.09411765, 0.0990099, 0.02352941])
answer_trend = np.array([-0.03301887 / 0.88207547, -0.05882353 / 0.87058824, 0.02475248 / 0.77722772, -0.02352941 / 0.97647059])
answer_volatility = np.array([ 0.34221495, 0.33705421, 0.29226542, 0.38834223])
result = std.get_prob_stats(probs, unit_indices)
result_up = result[0]
result_down = result[1]
result_trend = result[2]
result_volatility = result[3]
self.assertTrue(np.allclose(result_up, answer_up))
self.assertTrue(np.allclose(result_down, answer_down))
self.assertTrue(np.allclose(result_trend, answer_trend))
self.assertTrue(np.allclose(result_volatility, answer_volatility))

View File

@@ -0,0 +1,5 @@
"""Import all modules"""
import crankshaft.random_seeds
import crankshaft.clustering
import crankshaft.space_time_dynamics
import crankshaft.segmentation

View File

@@ -0,0 +1,3 @@
"""Import all functions from for clustering"""
from moran import *
from kmeans import *

View File

@@ -0,0 +1,18 @@
from sklearn.cluster import KMeans
import plpy
def kmeans(query, no_clusters, no_init=20):
data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
array_agg(ST_X(the_geom) order by cartodb_id) xs,
array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
where the_geom is not null
'''.format(query=query))
xs = data[0]['xs']
ys = data[0]['ys']
ids = data[0]['ids']
km = KMeans(n_clusters= no_clusters, n_init=no_init)
labels = km.fit_predict(zip(xs,ys))
return zip(ids,labels)

View File

@@ -0,0 +1,262 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of the their neighborhood
import pysal as ps
import plpy
from collections import OrderedDict
# crankshaft module
import crankshaft.pysal_utils as pu
# High level interface ---------------------------------------
def moran(subquery, attr_name,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I (global)
Implementation building neighbors with a PostGIS database and Moran's I
core clusters with PySAL.
Andy Eschbacher
"""
qvals = OrderedDict([("id_col", id_col),
("attr1", attr_name),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
attr_vals = pu.get_attributes(result)
## calculate weights
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global
moran_global = ps.esda.moran.Moran(attr_vals, weight,
permutations=permutations)
return zip([moran_global.I], [moran_global.EI])
def moran_local(subquery, attr,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = OrderedDict([("id_col", id_col),
("attr1", attr),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(5)
attr_vals = pu.get_attributes(result)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Rate (global)
Andy Eschbacher
"""
qvals = OrderedDict([("id_col", id_col),
("attr1", numerator),
("attr2", denominator)
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global rate
lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
permutations=permutations)
return zip([lisa_rate.I], [lisa_rate.EI])
def moran_local_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Local Rate
Andy Eschbacher
"""
# geometries with values that are null are ignored
# resulting in a collection of not as near neighbors
qvals = OrderedDict([("id_col", id_col),
("numerator", numerator),
("denominator", denominator),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(5)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_local_bv(subquery, attr1, attr2,
permutations, geom_col, id_col, w_type, num_ngbrs):
"""
Moran's I (local) Bivariate (untested)
"""
plpy.notice('** Constructing query')
qvals = OrderedDict([("id_col", id_col),
("attr1", attr1),
("attr2", attr2),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(4)
except plpy.SPIError:
plpy.error("Error: areas of interest query failed, " \
"check input parameters")
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(4)
## collect attributes
attr1_vals = pu.get_attributes(result, 1)
attr2_vals = pu.get_attributes(result, 2)
# create weights
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
permutations=permutations)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = quad_position(lisa.q)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
@param coord (int): quadrant of a specific measurement
Output:
classification (one of 'HH', 'LH', 'LL', or 'HL')
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def quad_position(quads):
"""
Produce Moran's I classification based of n
Input:
@param quads ndarray: an array of quads classified by
1-4 (PySAL default)
Output:
@param list: an array of quads classied by 'HH', 'LL', etc.
"""
return [map_quads(q) for q in quads]

View File

@@ -0,0 +1,2 @@
"""Import all functions for pysal_utils"""
from crankshaft.pysal_utils.pysal_utils import *

View File

@@ -0,0 +1,188 @@
"""
Utilities module for generic PySAL functionality, mainly centered on
translating queries into numpy arrays or PySAL weights objects
"""
import numpy as np
import pysal as ps
def construct_neighbor_query(w_type, query_vals):
"""Return query (a string) used for finding neighbors
@param w_type text: type of neighbors to calculate ('knn' or 'queen')
@param query_vals dict: values used to construct the query
"""
if w_type.lower() == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
## Build weight object
def get_weight(query_res, w_type='knn', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
@param query_res dict-like: query results with attributes and neighbors
"""
# if w_type.lower() == 'knn':
# row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
# weights = {x['id']: row_normed_weights for x in query_res}
# else:
# weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
# if len(x['neighbors']) > 0
# else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
print 'len of neighbors: %d' % len(neighbors)
built_weight = ps.W(neighbors)
built_weight.transform = 'r'
return built_weight
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes inolved in query.
@param params: dict of information used in query (column names,
table name, etc.)
"""
attr_string = ""
template = "i.\"%(col)s\"::numeric As attr%(alias_num)s, "
if 'time_cols' in params:
## if markov analysis
attrs = params['time_cols']
for idx, val in enumerate(attrs):
attr_string += template % {"col": val, "alias_num": idx + 1}
else:
## if moran's analysis
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs', 'subquery')]
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": params[val], "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Construct where conditions when building neighbors query
Create portion of WHERE clauses for weeding out NULL-valued geometries
Input: dict of params:
{'subquery': ...,
'numerator': 'data1',
'denominator': 'data2',
'': ...}
Output: 'idx_replace."data1" IS NOT NULL AND idx_replace."data2" IS NOT NULL'
Input:
{'subquery': ...,
'time_cols': ['time1', 'time2', 'time3'],
'etc': ...}
Output: 'idx_replace."time1" IS NOT NULL AND idx_replace."time2" IS NOT
NULL AND idx_replace."time3" IS NOT NULL'
"""
attr_string = []
template = "idx_replace.\"%s\" IS NOT NULL"
if 'time_cols' in params:
## markov where clauses
attrs = params['time_cols']
# add values to template
for attr in attrs:
attr_string.append(template % attr)
else:
## moran where clauses
# get keys
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs', 'subquery')])
# add values to template
for attr in attrs:
attr_string.append(template % params[attr])
if len(attrs) == 2:
attr_string.append("idx_replace.\"%s\" <> 0" % params[attrs[1]])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
@param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE " \
"i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"%(attr_where_j)s " \
"ORDER BY " \
"j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs})" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
@param params dict: information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_attributes(query_res, attr_num=1):
"""
@param query_res: query results with attributes and neighbors
@param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
def empty_zipped_array(num_nones):
"""
prepare return values for cases of empty weights objects (no neighbors)
Input:
@param num_nones int: number of columns (e.g., 4)
Output:
[(None, None, None, None)]
"""
return [tuple([None] * num_nones)]

View File

@@ -0,0 +1,11 @@
"""Random seed generator used for non-deterministic functions in crankshaft"""
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)

View File

@@ -0,0 +1 @@
from segmentation import *

View File

@@ -0,0 +1,176 @@
"""
Segmentation creation and prediction
"""
import sklearn
import numpy as np
import plpy
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import metrics
from sklearn.cross_validation import train_test_split
# Lower level functions
#----------------------
def replace_nan_with_mean(array):
"""
Input:
@param array: an array of floats which may have null-valued entries
Output:
array with nans filled in with the mean of the dataset
"""
# returns an array of rows and column indices
indices = np.where(np.isnan(array))
# iterate through entries which have nan values
for row, col in zip(*indices):
array[row, col] = np.mean(array[~np.isnan(array[:, col]), col])
return array
def get_data(variable, feature_columns, query):
"""
Fetch data from the database, clean, and package into
numpy arrays
Input:
@param variable: name of the target variable
@param feature_columns: list of column names
@param query: subquery that data is pulled from for the packaging
Output:
prepared data, packaged into NumPy arrays
"""
columns = ','.join(['array_agg("{col}") As "{col}"'.format(col=col) for col in feature_columns])
try:
data = plpy.execute('''SELECT array_agg("{variable}") As target, {columns} FROM ({query}) As a'''.format(
variable=variable,
columns=columns,
query=query))
except Exception, e:
plpy.error('Failed to access data to build segmentation model: %s' % e)
# extract target data from plpy object
target = np.array(data[0]['target'])
# put n feature data arrays into an n x m array of arrays
features = np.column_stack([np.array(data[0][col], dtype=float) for col in feature_columns])
return replace_nan_with_mean(target), replace_nan_with_mean(features)
# High level interface
# --------------------
def create_and_predict_segment_agg(target, features, target_features, target_ids, model_parameters):
"""
Version of create_and_predict_segment that works on arrays that come stright form the SQL calling
the function.
Input:
@param target: The 1D array of lenth NSamples containing the target variable we want the model to predict
@param features: Thw 2D array of size NSamples * NFeatures that form the imput to the model
@param target_ids: A 1D array of target_ids that will be used to associate the results of the prediction with the rows which they come from
@param model_parameters: A dictionary containing parameters for the model.
"""
clean_target = replace_nan_with_mean(target)
clean_features = replace_nan_with_mean(features)
target_features = replace_nan_with_mean(target_features)
model, accuracy = train_model(clean_target, clean_features, model_parameters, 0.2)
prediction = model.predict(target_features)
accuracy_array = [accuracy]*prediction.shape[0]
return zip(target_ids, prediction, np.full(prediction.shape, accuracy_array))
def create_and_predict_segment(query, variable, target_query, model_params):
"""
generate a segment with machine learning
Stuart Lynn
"""
## fetch column names
try:
columns = plpy.execute('SELECT * FROM ({query}) As a LIMIT 1 '.format(query=query))[0].keys()
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
## extract column names to be used in building the segmentation model
feature_columns = set(columns) - set([variable, 'cartodb_id', 'the_geom', 'the_geom_webmercator'])
## get data from database
target, features = get_data(variable, feature_columns, query)
model, accuracy = train_model(target, features, model_params, 0.2)
cartodb_ids, result = predict_segment(model, feature_columns, target_query)
accuracy_array = [accuracy]*result.shape[0]
return zip(cartodb_ids, result, accuracy_array)
def train_model(target, features, model_params, test_split):
"""
Train the Gradient Boosting model on the provided data and calculate the accuracy of the model
Input:
@param target: 1D Array of the variable that the model is to be trianed to predict
@param features: 2D Array NSamples * NFeatures to use in trining the model
@param model_params: A dictionary of model parameters, the full specification can be found on the
scikit learn page for [GradientBoostingRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
@parma test_split: The fraction of the data to be withheld for testing the model / calculating the accuray
"""
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=test_split)
model = GradientBoostingRegressor(**model_params)
model.fit(features_train, target_train)
accuracy = calculate_model_accuracy(model, features, target)
return model, accuracy
def calculate_model_accuracy(model, features, target):
"""
Calculate the mean squared error of the model prediction
Input:
@param model: model trained from input features
@param features: features to make a prediction from
@param target: target to compare prediction to
Output:
mean squared error of the model prection compared to the target
"""
prediction = model.predict(features)
return metrics.mean_squared_error(prediction, target)
def predict_segment(model, features, target_query):
"""
Use the provided model to predict the values for the new feature set
Input:
@param model: The pretrained model
@features: A list of features to use in the model prediction (list of column names)
@target_query: The query to run to obtain the data to predict on and the cartdb_ids associated with it.
"""
batch_size = 1000
joined_features = ','.join(['"{0}"::numeric'.format(a) for a in features])
try:
cursor = plpy.cursor('SELECT Array[{joined_features}] As features FROM ({target_query}) As a'.format(
joined_features=joined_features,
target_query=target_query))
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
results = []
while True:
rows = cursor.fetch(batch_size)
if not rows:
break
batch = np.row_stack([np.array(row['features'], dtype=float) for row in rows])
#Need to fix this. Should be global mean. This will cause weird effects
batch = replace_nan_with_mean(batch)
prediction = model.predict(batch)
results.append(prediction)
try:
cartodb_ids = plpy.execute('''SELECT array_agg(cartodb_id ORDER BY cartodb_id) As cartodb_ids FROM ({0}) As a'''.format(target_query))[0]['cartodb_ids']
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
return cartodb_ids, np.concatenate(results)

View File

@@ -0,0 +1,2 @@
"""Import all functions from clustering libraries."""
from markov import *

View File

@@ -0,0 +1,189 @@
"""
Spatial dynamics measurements using Spatial Markov
"""
import numpy as np
import pysal as ps
import plpy
import crankshaft.pysal_utils as pu
def spatial_markov_trend(subquery, time_cols, num_classes=7,
w_type='knn', num_ngbrs=5, permutations=0,
geom_col='the_geom', id_col='cartodb_id'):
"""
Predict the trends of a unit based on:
1. history of its transitions to different classes (e.g., 1st quantile -> 2nd quantile)
2. average class of its neighbors
Inputs:
@param subquery string: e.g., SELECT the_geom, cartodb_id,
interesting_time_column FROM table_name
@param time_cols list of strings: list of strings of column names
@param num_classes (optional): number of classes to break distribution
of values into. Currently uses quantile bins.
@param w_type string (optional): weight type ('knn' or 'queen')
@param num_ngbrs int (optional): number of neighbors (if knn type)
@param permutations int (optional): number of permutations for test
stats
@param geom_col string (optional): name of column which contains the
geometries
@param id_col string (optional): name of column which has the ids of
the table
Outputs:
@param trend_up float: probablity that a geom will move to a higher
class
@param trend_down float: probablity that a geom will move to a lower
class
@param trend float: (trend_up - trend_down) / trend_static
@param volatility float: a measure of the volatility based on
probability stddev(prob array)
"""
if len(time_cols) < 2:
plpy.error('More than one time column needs to be passed')
qvals = {"id_col": id_col,
"time_cols": time_cols,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
try:
query_result = plpy.execute(
pu.construct_neighbor_query(w_type, qvals)
)
if len(query_result) == 0:
return zip([None], [None], [None], [None], [None])
except plpy.SPIError, err:
plpy.debug('Query failed with exception %s: %s' % (err, pu.construct_neighbor_query(w_type, qvals)))
plpy.error('Query failed, check the input parameters')
return zip([None], [None], [None], [None], [None])
## build weight
weights = pu.get_weight(query_result, w_type)
weights.transform = 'r'
## prep time data
t_data = get_time_data(query_result, time_cols)
plpy.debug('shape of t_data %d, %d' % t_data.shape)
plpy.debug('number of weight objects: %d, %d' % (weights.sparse).shape)
plpy.debug('first num elements: %f' % t_data[0, 0])
sp_markov_result = ps.Spatial_Markov(t_data,
weights,
k=num_classes,
fixed=False,
permutations=permutations)
## get lag classes
lag_classes = ps.Quantiles(
ps.lag_spatial(weights, t_data[:, -1]),
k=num_classes).yb
## look up probablity distribution for each unit according to class and lag class
prob_dist = get_prob_dist(sp_markov_result.P,
lag_classes,
sp_markov_result.classes[:, -1])
## find the ups and down and overall distribution of each cell
trend_up, trend_down, trend, volatility = get_prob_stats(prob_dist,
sp_markov_result.classes[:, -1])
## output the results
return zip(trend, trend_up, trend_down, volatility, weights.id_order)
def get_time_data(markov_data, time_cols):
"""
Extract the time columns and bin appropriately
"""
num_attrs = len(time_cols)
return np.array([[x['attr' + str(i)] for x in markov_data]
for i in range(1, num_attrs+1)], dtype=float).transpose()
## not currently used
def rebin_data(time_data, num_time_per_bin):
"""
Convert an n x l matrix into an (n/m) x l matrix where the values are
reduced (averaged) for the intervening states:
1 2 3 4 1.5 3.5
5 6 7 8 -> 5.5 7.5
9 8 7 6 8.5 6.5
5 4 3 2 4.5 2.5
if m = 2, the 4 x 4 matrix is transformed to a 2 x 4 matrix.
This process effectively resamples the data at a longer time span n
units longer than the input data.
For cases when there is a remainder (remainder(5/3) = 2), the remaining
two columns are binned together as the last time period, while the
first three are binned together for the first period.
Input:
@param time_data n x l ndarray: measurements of an attribute at
different time intervals
@param num_time_per_bin int: number of columns to average into a new
column
Output:
ceil(n / m) x l ndarray of resampled time series
"""
if time_data.shape[1] % num_time_per_bin == 0:
## if fit is perfect, then use it
n_max = time_data.shape[1] / num_time_per_bin
else:
## fit remainders into an additional column
n_max = time_data.shape[1] / num_time_per_bin + 1
return np.array([time_data[:, num_time_per_bin * i:num_time_per_bin * (i+1)].mean(axis=1)
for i in range(n_max)]).T
def get_prob_dist(transition_matrix, lag_indices, unit_indices):
"""
Given an array of transition matrices, look up the probability
associated with the arrangements passed
Input:
@param transition_matrix ndarray[k,k,k]:
@param lag_indices ndarray:
@param unit_indices ndarray:
Output:
Array of probability distributions
"""
return np.array([transition_matrix[(lag_indices[i], unit_indices[i])]
for i in range(len(lag_indices))])
def get_prob_stats(prob_dist, unit_indices):
"""
get the statistics of the probability distributions
Outputs:
@param trend_up ndarray(float): sum of probabilities for upward
movement (relative to the unit index of that prob)
@param trend_down ndarray(float): sum of probabilities for downward
movement (relative to the unit index of that prob)
@param trend ndarray(float): difference of upward and downward
movements
"""
num_elements = len(unit_indices)
trend_up = np.empty(num_elements, dtype=float)
trend_down = np.empty(num_elements, dtype=float)
trend = np.empty(num_elements, dtype=float)
for i in range(num_elements):
trend_up[i] = prob_dist[i, (unit_indices[i]+1):].sum()
trend_down[i] = prob_dist[i, :unit_indices[i]].sum()
if prob_dist[i, unit_indices[i]] > 0.0:
trend[i] = (trend_up[i] - trend_down[i]) / prob_dist[i, unit_indices[i]]
else:
trend[i] = None
## calculate volatility of distribution
volatility = prob_dist.std(axis=1)
return trend_up, trend_down, trend, volatility

View File

@@ -0,0 +1,49 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.3.0',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping comunity',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
# IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
requires=['pysal', 'numpy', 'sklearn'],
test_suite='test'
)

View File

@@ -0,0 +1 @@
[{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]

View File

@@ -0,0 +1 @@
[[0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 0], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 1], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 2], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 3], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 4], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 5], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 6], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 7], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 8], [0.19047619047619049, 0.16, 0.0, 0.32594478059941379, 9], [-0.23529411764705882, 0.0, 0.19047619047619047, 0.31356338348865387, 10], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 11], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 12], [0.027777777777777783, 0.11111111111111112, 0.088888888888888892, 0.30339641183779581, 13], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 14], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 15], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 16], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 17], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 18], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 19], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 20], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 21], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 22], [-0.16666666666666663, 0.18181818181818182, 0.27272727272727271, 0.20246415864836445, 23], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 24], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 25], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 26], [-0.043478260869565216, 0.0, 0.041666666666666664, 0.37950991789118999, 27], [0.22222222222222221, 0.18181818181818182, 0.0, 0.31701083225750354, 28], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 29], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 30], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 31], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 32], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 33], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 34], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 35], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 36], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 37], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 38], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 39], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 40], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 41], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 42], [0.0, 0.0, 0.0, 0.40000000000000002, 43], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 44], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 45], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 46], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 47]]

View File

@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "LL"],
[0.6152779669180425, "LL"],
[-0.14657336660125297, "LH"],
[0.6967858120189607, "LL"],
[0.07949310115714454, "HH"],
[0.4703198759258987, "HH"],
[0.4421125200498064, "HH"],
[0.5724288737143592, "LL"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "LL"],
[-0.01466729201304962, "HL"],
[0.3481559372544409, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "HH"],
[0.15971286468915544, "LL"],
[1.0543588860308968, "HH"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "HH"],
[0.08438455015300014, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "LL"],
[0.795275137550483, "HH"],
[0.18562939195219, "LL"],
[0.3010757406693439, "LL"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "LL"],
[-0.07116352791516614, "HL"],
[-0.09945240794119009, "LH"],
[0.18562939195219, "LL"],
[0.1832733440191868, "LL"],
[-0.39054253768447705, "HL"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "HH"],
[0.2584386102554792, "HH"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "LH"],
[0.051367269430983485, "LL"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "LL"],
[0.04191046803899837, "LL"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "LH"],
[0.5410013139159929, "HH"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]

View File

@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)

View File

@@ -0,0 +1,52 @@
import re
class MockCursor:
def __init__(self, data):
self.cursor_pos = 0
self.data = data
def fetch(self, batch_size):
batch = self.data[self.cursor_pos : self.cursor_pos + batch_size]
self.cursor_pos += batch_size
return batch
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
self.results = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def debug(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def cursor(self, query):
data = self.execute(query)
return MockCursor(data)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []

View File

@@ -0,0 +1,38 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import numpy as np
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class KMeansTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
self.params = {"subquery": "select * from table",
"no_clusters": "10"
}
def test_kmeans(self):
data = self.cluster_data
plpy._define_result('select' ,data)
clusters = cc.kmeans('subquery', 2)
labels = [a[1] for a in clusters]
c1 = [a for a in clusters if a[1]==0]
c2 = [a for a in clusters if a[1]==1]
self.assertEqual(len(np.unique(labels)),2)
self.assertEqual(len(c1),20)
self.assertEqual(len(c2),20)

View File

@@ -0,0 +1,88 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.params_markov = {"id_col": "cartodb_id",
"time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads"""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_quad_position(self):
"""Test lisa_sig_vals"""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('subquery', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None? ', result == None
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
def test_moran(self):
"""Test Moran's I global"""
data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1235)
result = cc.moran('table', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None?', result == None
result_moran = result[0][0]
expected_moran = np.array([row[0] for row in self.moran_data]).mean()
self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)

View File

@@ -0,0 +1,142 @@
import unittest
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
class PysalUtilsTest(unittest.TestCase):
"""Testing class for utility functions related to PySAL integrations"""
def setUp(self):
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.params_array = {"id_col": "cartodb_id",
"time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
def test_query_attr_select(self):
"""Test query_attr_select"""
ans = "i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, "
ans_array = "i.\"_2013_dec\"::numeric As attr1, " \
"i.\"_2014_jan\"::numeric As attr2, " \
"i.\"_2014_feb\"::numeric As attr3, "
self.assertEqual(pu.query_attr_select(self.params), ans)
self.assertEqual(pu.query_attr_select(self.params_array), ans_array)
def test_query_attr_where(self):
"""Test pu.query_attr_where"""
ans = "idx_replace.\"andy\" IS NOT NULL AND " \
"idx_replace.\"jay_z\" IS NOT NULL AND " \
"idx_replace.\"jay_z\" <> 0"
ans_array = "idx_replace.\"_2013_dec\" IS NOT NULL AND " \
"idx_replace.\"_2014_jan\" IS NOT NULL AND " \
"idx_replace.\"_2014_feb\" IS NOT NULL"
self.assertEqual(pu.query_attr_where(self.params), ans)
self.assertEqual(pu.query_attr_where(self.params_array), ans_array)
def test_knn(self):
"""Test knn neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0 " \
"ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
ans_array = "SELECT i.\"cartodb_id\" As id, " \
"i.\"_2013_dec\"::numeric As attr1, " \
"i.\"_2014_jan\"::numeric As attr2, " \
"i.\"_2014_feb\"::numeric As attr3, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"_2013_dec\" IS NOT NULL AND " \
"j.\"_2014_jan\" IS NOT NULL AND " \
"j.\"_2014_feb\" IS NOT NULL " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"_2013_dec\" IS NOT NULL AND " \
"i.\"_2014_jan\" IS NOT NULL AND " \
"i.\"_2014_feb\" IS NOT NULL "\
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.knn(self.params), ans)
self.assertEqual(pu.knn(self.params_array), ans_array)
def test_queen(self):
"""Test queen neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"ST_Touches(i.\"the_geom\", " \
"j.\"the_geom\") AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0)" \
") As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.queen(self.params), ans)
def test_construct_neighbor_query(self):
"""Test construct_neighbor_query"""
# Compare to raw knn query
self.assertEqual(pu.construct_neighbor_query('knn', self.params),
pu.knn(self.params))
def test_get_attributes(self):
"""Test get_attributes"""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight"""
self.assertEqual(True, True)
def test_empty_zipped_array(self):
"""Test empty_zipped_array"""
ans2 = [(None, None)]
ans4 = [(None, None, None, None)]
self.assertEqual(pu.empty_zipped_array(2), ans2)
self.assertEqual(pu.empty_zipped_array(4), ans4)

View File

@@ -0,0 +1,64 @@
import unittest
import numpy as np
from helper import plpy, fixture_file
import crankshaft.segmentation as segmentation
import json
class SegmentationTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
def generate_random_data(self,n_samples,random_state, row_type=False):
x1 = random_state.uniform(size=n_samples)
x2 = random_state.uniform(size=n_samples)
x3 = random_state.randint(0, 4, size=n_samples)
y = x1+x2*x2+x3
cartodb_id = range(len(x1))
if row_type:
return [ {'features': vals} for vals in zip(x1,x2,x3)], y
else:
return [dict( zip(['x1','x2','x3','target', 'cartodb_id'],[x1,x2,x3,y,cartodb_id]))]
def test_replace_nan_with_mean(self):
test_array = np.array([1.2, np.nan, 3.2, np.nan, np.nan])
def test_create_and_predict_segment(self):
n_samples = 1000
random_state_train = np.random.RandomState(13)
random_state_test = np.random.RandomState(134)
training_data = self.generate_random_data(n_samples, random_state_train)
test_data, test_y = self.generate_random_data(n_samples, random_state_test, row_type=True)
ids = [{'cartodb_ids': range(len(test_data))}]
rows = [{'x1': 0,'x2':0,'x3':0,'y':0,'cartodb_id':0}]
plpy._define_result('select \* from \(select \* from training\) a limit 1',rows)
plpy._define_result('.*from \(select \* from training\) as a' ,training_data)
plpy._define_result('select array_agg\(cartodb\_id order by cartodb\_id\) as cartodb_ids from \(.*\) a',ids)
plpy._define_result('.*select \* from test.*' ,test_data)
model_parameters = {'n_estimators': 1200,
'max_depth': 3,
'subsample' : 0.5,
'learning_rate': 0.01,
'min_samples_leaf': 1}
result = segmentation.create_and_predict_segment(
'select * from training',
'target',
'select * from test',
model_parameters)
prediction = [r[1] for r in result]
accuracy =np.sqrt(np.mean( np.square( np.array(prediction) - np.array(test_y))))
self.assertEqual(len(result),len(test_data))
self.assertTrue( result[0][2] < 0.01)
self.assertTrue( accuracy < 0.5*np.mean(test_y) )

View File

@@ -0,0 +1,324 @@
import unittest
import numpy as np
import unittest
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.space_time_dynamics as std
from crankshaft import random_seeds
import json
class SpaceTimeTests(unittest.TestCase):
"""Testing class for Markov Functions."""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"time_cols": ['dec_2013', 'jan_2014', 'feb_2014'],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors_markov.json')).read())
self.markov_data = json.loads(open(fixture_file('markov.json')).read())
self.time_data = np.array([i * np.ones(10, dtype=float) for i in range(10)]).T
self.transition_matrix = np.array([
[[ 0.96341463, 0.0304878 , 0.00609756, 0. , 0. ],
[ 0.06040268, 0.83221477, 0.10738255, 0. , 0. ],
[ 0. , 0.14 , 0.74 , 0.12 , 0. ],
[ 0. , 0.03571429, 0.32142857, 0.57142857, 0.07142857],
[ 0. , 0. , 0. , 0.16666667, 0.83333333]],
[[ 0.79831933, 0.16806723, 0.03361345, 0. , 0. ],
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0.00537634, 0.06989247, 0.8655914 , 0.05913978, 0. ],
[ 0. , 0. , 0.06372549, 0.90196078, 0.03431373],
[ 0. , 0. , 0. , 0.19444444, 0.80555556]],
[[ 0.84693878, 0.15306122, 0. , 0. , 0. ],
[ 0.08133971, 0.78947368, 0.1291866 , 0. , 0. ],
[ 0.00518135, 0.0984456 , 0.79274611, 0.0984456 , 0.00518135],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0. , 0. , 0. , 0.10204082, 0.89795918]],
[[ 0.8852459 , 0.09836066, 0. , 0.01639344, 0. ],
[ 0.03875969, 0.81395349, 0.13953488, 0. , 0.00775194],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0.02339181, 0.12865497, 0.75438596, 0.09356725],
[ 0. , 0. , 0. , 0.09661836, 0.90338164]],
[[ 0.33333333, 0.66666667, 0. , 0. , 0. ],
[ 0.0483871 , 0.77419355, 0.16129032, 0.01612903, 0. ],
[ 0.01149425, 0.16091954, 0.74712644, 0.08045977, 0. ],
[ 0. , 0.01036269, 0.06217617, 0.89637306, 0.03108808],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]]]
)
def test_spatial_markov(self):
"""Test Spatial Markov."""
data = [ { 'id': d['id'],
'attr1': d['y1995'],
'attr2': d['y1996'],
'attr3': d['y1997'],
'attr4': d['y1998'],
'attr5': d['y1999'],
'attr6': d['y2000'],
'attr7': d['y2001'],
'attr8': d['y2002'],
'attr9': d['y2003'],
'attr10': d['y2004'],
'attr11': d['y2005'],
'attr12': d['y2006'],
'attr13': d['y2007'],
'attr14': d['y2008'],
'attr15': d['y2009'],
'neighbors': d['neighbors'] } for d in self.neighbors_data]
print(str(data[0]))
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = std.spatial_markov_trend('subquery', ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'], 5, 'knn', 5, 0, 'the_geom', 'cartodb_id')
self.assertTrue(result != None)
result = [(row[0], row[1], row[2], row[3], row[4]) for row in result]
print result[0]
expected = self.markov_data
for ([res_trend, res_up, res_down, res_vol, res_id],
[exp_trend, exp_up, exp_down, exp_vol, exp_id]
) in zip(result, expected):
self.assertAlmostEqual(res_trend, exp_trend)
def test_get_time_data(self):
"""Test get_time_data"""
data = [ { 'attr1': d['y1995'],
'attr2': d['y1996'],
'attr3': d['y1997'],
'attr4': d['y1998'],
'attr5': d['y1999'],
'attr6': d['y2000'],
'attr7': d['y2001'],
'attr8': d['y2002'],
'attr9': d['y2003'],
'attr10': d['y2004'],
'attr11': d['y2005'],
'attr12': d['y2006'],
'attr13': d['y2007'],
'attr14': d['y2008'],
'attr15': d['y2009'] } for d in self.neighbors_data]
result = std.get_time_data(data, ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'])
## expected was prepared from PySAL example:
### f = ps.open(ps.examples.get_path("usjoin.csv"))
### pci = np.array([f.by_col[str(y)] for y in range(1995, 2010)]).transpose()
### rpci = pci / (pci.mean(axis = 0))
expected = np.array([[ 0.87654416, 0.863147, 0.85637567, 0.84811668, 0.8446154, 0.83271652
, 0.83786314, 0.85012593, 0.85509656, 0.86416612, 0.87119375, 0.86302631
, 0.86148267, 0.86252252, 0.86746356],
[ 0.9188951, 0.91757931, 0.92333258, 0.92517289, 0.92552388, 0.90746978
, 0.89830489, 0.89431991, 0.88924794, 0.89815176, 0.91832091, 0.91706054
, 0.90139505, 0.87897455, 0.86216858],
[ 0.82591007, 0.82548596, 0.81989793, 0.81503235, 0.81731522, 0.78964559
, 0.80584442, 0.8084998, 0.82258551, 0.82668196, 0.82373724, 0.81814804
, 0.83675961, 0.83574199, 0.84647177],
[ 1.09088176, 1.08537689, 1.08456418, 1.08415404, 1.09898841, 1.14506948
, 1.12151133, 1.11160697, 1.10888621, 1.11399806, 1.12168029, 1.13164797
, 1.12958508, 1.11371818, 1.09936775],
[ 1.10731446, 1.11373944, 1.13283638, 1.14472559, 1.15910025, 1.16898201
, 1.17212488, 1.14752303, 1.11843284, 1.11024964, 1.11943471, 1.11736468
, 1.10863242, 1.09642516, 1.07762337],
[ 1.42269757, 1.42118434, 1.44273502, 1.43577571, 1.44400684, 1.44184737
, 1.44782832, 1.41978227, 1.39092208, 1.4059372, 1.40788646, 1.44052766
, 1.45241216, 1.43306098, 1.4174431 ],
[ 1.13073885, 1.13110513, 1.11074708, 1.13364636, 1.13088149, 1.10888138
, 1.11856629, 1.13062931, 1.11944984, 1.12446239, 1.11671008, 1.10880034
, 1.08401709, 1.06959206, 1.07875225],
[ 1.04706124, 1.04516831, 1.04253372, 1.03239987, 1.02072545, 0.99854316
, 0.9880258, 0.99669587, 0.99327676, 1.01400905, 1.03176742, 1.040511
, 1.01749645, 0.9936394, 0.98279746],
[ 0.98996986, 1.00143564, 0.99491, 1.00188408, 1.00455845, 0.99127006
, 0.97925917, 0.9683482, 0.95335147, 0.93694787, 0.94308213, 0.92232874
, 0.91284091, 0.89689833, 0.88928858],
[ 0.87418391, 0.86416601, 0.84425695, 0.8404494, 0.83903044, 0.8578708
, 0.86036185, 0.86107306, 0.8500772, 0.86981998, 0.86837929, 0.87204141
, 0.86633032, 0.84946077, 0.83287146],
[ 1.14196118, 1.14660262, 1.14892712, 1.14909594, 1.14436624, 1.14450183
, 1.12349752, 1.12596664, 1.12213996, 1.1119989, 1.10257792, 1.10491258
, 1.11059842, 1.10509795, 1.10020097],
[ 0.97282463, 0.96700147, 0.96252588, 0.9653878, 0.96057687, 0.95831051
, 0.94480909, 0.94804195, 0.95430286, 0.94103989, 0.92122519, 0.91010201
, 0.89280392, 0.89298243, 0.89165385],
[ 0.94325468, 0.96436902, 0.96455242, 0.95243009, 0.94117647, 0.9480927
, 0.93539182, 0.95388718, 0.94597005, 0.96918424, 0.94781281, 0.93466815
, 0.94281559, 0.96520315, 0.96715441],
[ 0.97478408, 0.98169225, 0.98712809, 0.98474769, 0.98559897, 0.98687073
, 0.99237486, 0.98209969, 0.9877653, 0.97399471, 0.96910087, 0.98416665
, 0.98423613, 0.99823861, 0.99545704],
[ 0.85570269, 0.85575915, 0.85986132, 0.85693406, 0.8538012, 0.86191535
, 0.84981451, 0.85472102, 0.84564835, 0.83998883, 0.83478547, 0.82803648
, 0.8198736, 0.82265395, 0.8399404 ],
[ 0.87022047, 0.85996258, 0.85961813, 0.85689572, 0.83947136, 0.82785597
, 0.86008789, 0.86776298, 0.86720209, 0.8676334, 0.89179317, 0.94202108
, 0.9422231, 0.93902708, 0.94479184],
[ 0.90134907, 0.90407738, 0.90403991, 0.90201769, 0.90399238, 0.90906632
, 0.92693339, 0.93695966, 0.94242697, 0.94338265, 0.91981796, 0.91108804
, 0.90543476, 0.91737138, 0.94793657],
[ 1.1977611, 1.18222564, 1.18439158, 1.18267865, 1.19286723, 1.20172869
, 1.21328691, 1.22624778, 1.22397075, 1.23857042, 1.24419893, 1.23929384
, 1.23418676, 1.23626739, 1.26754398],
[ 1.24919678, 1.25754773, 1.26991161, 1.28020651, 1.30625667, 1.34790023
, 1.34399863, 1.32575181, 1.30795492, 1.30544841, 1.30303302, 1.32107766
, 1.32936244, 1.33001241, 1.33288462],
[ 1.06768004, 1.03799276, 1.03637303, 1.02768449, 1.03296093, 1.05059016
, 1.03405057, 1.02747623, 1.03162734, 0.9961416, 0.97356208, 0.94241549
, 0.92754547, 0.92549227, 0.92138102],
[ 1.09475614, 1.11526796, 1.11654299, 1.13103948, 1.13143264, 1.13889622
, 1.12442212, 1.13367018, 1.13982256, 1.14029944, 1.11979401, 1.10905389
, 1.10577769, 1.11166825, 1.09985155],
[ 0.76530058, 0.76612841, 0.76542451, 0.76722683, 0.76014284, 0.74480073
, 0.76098396, 0.76156903, 0.76651952, 0.76533288, 0.78205934, 0.76842416
, 0.77487118, 0.77768683, 0.78801192],
[ 0.98391336, 0.98075816, 0.98295341, 0.97386015, 0.96913803, 0.97370819
, 0.96419154, 0.97209861, 0.97441313, 0.96356162, 0.94745352, 0.93965462
, 0.93069645, 0.94020973, 0.94358232],
[ 0.83561828, 0.82298088, 0.81738502, 0.81748588, 0.80904801, 0.80071489
, 0.83358256, 0.83451613, 0.85175032, 0.85954307, 0.86790024, 0.87170334
, 0.87863799, 0.87497981, 0.87888675],
[ 0.98845573, 1.02092428, 0.99665283, 0.99141823, 0.99386619, 0.98733195
, 0.99644997, 0.99669587, 1.02559097, 1.01116651, 0.99988024, 0.97906749
, 0.99323123, 1.00204939, 0.99602148],
[ 1.14930913, 1.15241949, 1.14300962, 1.14265542, 1.13984683, 1.08312397
, 1.05192626, 1.04230892, 1.05577278, 1.08569751, 1.12443486, 1.08891079
, 1.08603695, 1.05997314, 1.02160943],
[ 1.11368269, 1.1057147, 1.11893431, 1.13778669, 1.1432272, 1.18257029
, 1.16226243, 1.16009196, 1.14467789, 1.14820235, 1.12386598, 1.12680236
, 1.12357937, 1.1159258, 1.12570828],
[ 1.30379431, 1.30752186, 1.31206366, 1.31532267, 1.30625667, 1.31210239
, 1.29989156, 1.29203193, 1.27183516, 1.26830786, 1.2617743, 1.28656675
, 1.29734097, 1.29390205, 1.29345446],
[ 0.83953719, 0.82701448, 0.82006005, 0.81188876, 0.80294864, 0.78772975
, 0.82848011, 0.8259679, 0.82435705, 0.83108634, 0.84373784, 0.83891093
, 0.84349247, 0.85637272, 0.86539395],
[ 1.23450087, 1.2426022, 1.23537935, 1.23581293, 1.24522626, 1.2256767
, 1.21126648, 1.19377804, 1.18355337, 1.19674434, 1.21536573, 1.23653297
, 1.27962009, 1.27968392, 1.25907738],
[ 0.9769662, 0.97400719, 0.98035944, 0.97581531, 0.95543282, 0.96480308
, 0.94686376, 0.93679073, 0.92540049, 0.92988835, 0.93442917, 0.92100464
, 0.91475304, 0.90249622, 0.9021363 ],
[ 0.84986886, 0.8986851, 0.84295997, 0.87280534, 0.85659368, 0.88937573
, 0.894401, 0.90448993, 0.95495898, 0.92698333, 0.94745352, 0.92562488
, 0.96635366, 1.02520312, 1.0394296 ],
[ 1.01922808, 1.00258203, 1.00974428, 1.00303417, 0.99765073, 1.00759019
, 0.99192968, 0.99747298, 0.99550759, 0.97583768, 0.9610168, 0.94779638
, 0.93759089, 0.93353431, 0.94121705],
[ 0.86367411, 0.85558932, 0.85544346, 0.85103025, 0.84336613, 0.83434854
, 0.85813595, 0.84667961, 0.84374558, 0.85951183, 0.87194227, 0.89455097
, 0.88283929, 0.90349491, 0.90600675],
[ 1.00947534, 1.00411055, 1.00698819, 0.99513687, 0.99291086, 1.00581626
, 0.98850522, 0.99291168, 0.98983209, 0.97511924, 0.96134615, 0.96382634
, 0.95011401, 0.9434686, 0.94637765],
[ 1.05712571, 1.05459419, 1.05753012, 1.04880786, 1.05103857, 1.04800023
, 1.03024941, 1.04200483, 1.0402554, 1.03296979, 1.02191682, 1.02476275
, 1.02347523, 1.02517684, 1.04359571],
[ 1.07084189, 1.06669497, 1.07937623, 1.07387988, 1.0794043, 1.0531801
, 1.07452771, 1.09383478, 1.1052447, 1.10322136, 1.09167939, 1.08772756
, 1.08859544, 1.09177338, 1.1096083 ],
[ 0.86719222, 0.86628896, 0.86675156, 0.86425632, 0.86511809, 0.86287327
, 0.85169796, 0.85411285, 0.84886336, 0.84517414, 0.84843858, 0.84488343
, 0.83374329, 0.82812044, 0.82878599],
[ 0.88389211, 0.92288667, 0.90282398, 0.91229186, 0.92023286, 0.92652175
, 0.94278865, 0.93682452, 0.98655146, 0.992237, 0.9798497, 0.93869677
, 0.96947771, 1.00362626, 0.98102351],
[ 0.97082064, 0.95320233, 0.94534081, 0.94215593, 0.93967, 0.93092109
, 0.92662519, 0.93412152, 0.93501274, 0.92879506, 0.92110542, 0.91035556
, 0.90430364, 0.89994694, 0.90073864],
[ 0.95861858, 0.95774543, 0.98254811, 0.98919472, 0.98684824, 0.98882205
, 0.97662234, 0.95601578, 0.94905385, 0.94934888, 0.97152609, 0.97163004
, 0.9700702, 0.97158948, 0.95884908],
[ 0.83980439, 0.84726737, 0.85747, 0.85467221, 0.8556751, 0.84818516
, 0.85265681, 0.84502402, 0.82645665, 0.81743586, 0.83550406, 0.83338919
, 0.83511679, 0.82136617, 0.80921874],
[ 0.95118156, 0.9466212, 0.94688098, 0.9508583, 0.9512441, 0.95440787
, 0.96364363, 0.96804412, 0.97136214, 0.97583768, 0.95571724, 0.96895368
, 0.97001634, 0.97082733, 0.98782366],
[ 1.08910044, 1.08248968, 1.08492895, 1.08656923, 1.09454249, 1.10558188
, 1.1214086, 1.12292577, 1.13021031, 1.13342735, 1.14686068, 1.14502975
, 1.14474747, 1.14084037, 1.16142926],
[ 1.06336033, 1.07365823, 1.08691496, 1.09764846, 1.11669863, 1.11856702
, 1.09764283, 1.08815849, 1.08044313, 1.09278827, 1.07003204, 1.08398066
, 1.09831768, 1.09298232, 1.09176125],
[ 0.79772065, 0.78829196, 0.78581151, 0.77615922, 0.77035744, 0.77751194
, 0.79902974, 0.81437881, 0.80788828, 0.79603865, 0.78966436, 0.79949807
, 0.80172182, 0.82168155, 0.85587911],
[ 1.0052447, 1.00007696, 1.00475899, 1.00613942, 1.00639561, 1.00162979
, 0.99860739, 1.00814981, 1.00574316, 0.99030032, 0.97682565, 0.97292596
, 0.96519561, 0.96173403, 0.95890284],
[ 0.95808419, 0.9382568, 0.9654441, 0.95561201, 0.96987289, 0.96608031
, 0.99727185, 1.00781194, 1.03484236, 1.05333619, 1.0983263, 1.1704974
, 1.17025154, 1.18730553, 1.14242645]])
self.assertTrue(np.allclose(result, expected))
self.assertTrue(type(result) == type(expected))
self.assertTrue(result.shape == expected.shape)
def test_rebin_data(self):
"""Test rebin_data"""
## sample in double the time (even case since 10 % 2 = 0):
## (0+1)/2, (2+3)/2, (4+5)/2, (6+7)/2, (8+9)/2
## = 0.5, 2.5, 4.5, 6.5, 8.5
ans_even = np.array([(i + 0.5) * np.ones(10, dtype=float)
for i in range(0, 10, 2)]).T
self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 2), ans_even))
## sample in triple the time (uneven since 10 % 3 = 1):
## (0+1+2)/3, (3+4+5)/3, (6+7+8)/3, (9)/1
## = 1, 4, 7, 9
ans_odd = np.array([i * np.ones(10, dtype=float)
for i in (1, 4, 7, 9)]).T
self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 3), ans_odd))
def test_get_prob_dist(self):
"""Test get_prob_dist"""
lag_indices = np.array([1, 2, 3, 4])
unit_indices = np.array([1, 3, 2, 4])
answer = np.array([
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]
])
result = std.get_prob_dist(self.transition_matrix, lag_indices, unit_indices)
self.assertTrue(np.array_equal(result, answer))
def test_get_prob_stats(self):
"""Test get_prob_stats"""
probs = np.array([
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]
])
unit_indices = np.array([1, 3, 2, 4])
answer_up = np.array([0.04245283, 0.03529412, 0.12376238, 0.])
answer_down = np.array([0.0754717, 0.09411765, 0.0990099, 0.02352941])
answer_trend = np.array([-0.03301887 / 0.88207547, -0.05882353 / 0.87058824, 0.02475248 / 0.77722772, -0.02352941 / 0.97647059])
answer_volatility = np.array([ 0.34221495, 0.33705421, 0.29226542, 0.38834223])
result = std.get_prob_stats(probs, unit_indices)
result_up = result[0]
result_down = result[1]
result_trend = result[2]
result_volatility = result[3]
self.assertTrue(np.allclose(result_up, answer_up))
self.assertTrue(np.allclose(result_down, answer_down))
self.assertTrue(np.allclose(result_trend, answer_trend))
self.assertTrue(np.allclose(result_volatility, answer_volatility))

View File

@@ -0,0 +1,5 @@
"""Import all modules"""
import crankshaft.random_seeds
import crankshaft.clustering
import crankshaft.space_time_dynamics
import crankshaft.segmentation

View File

@@ -0,0 +1,3 @@
"""Import all functions from for clustering"""
from moran import *
from kmeans import *

View File

@@ -0,0 +1,18 @@
from sklearn.cluster import KMeans
import plpy
def kmeans(query, no_clusters, no_init=20):
data = plpy.execute('''select array_agg(cartodb_id order by cartodb_id) as ids,
array_agg(ST_X(the_geom) order by cartodb_id) xs,
array_agg(ST_Y(the_geom) order by cartodb_id) ys from ({query}) a
where the_geom is not null
'''.format(query=query))
xs = data[0]['xs']
ys = data[0]['ys']
ids = data[0]['ids']
km = KMeans(n_clusters= no_clusters, n_init=no_init)
labels = km.fit_predict(zip(xs,ys))
return zip(ids,labels)

View File

@@ -0,0 +1,262 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of the their neighborhood
import pysal as ps
import plpy
from collections import OrderedDict
# crankshaft module
import crankshaft.pysal_utils as pu
# High level interface ---------------------------------------
def moran(subquery, attr_name,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I (global)
Implementation building neighbors with a PostGIS database and Moran's I
core clusters with PySAL.
Andy Eschbacher
"""
qvals = OrderedDict([("id_col", id_col),
("attr1", attr_name),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
attr_vals = pu.get_attributes(result)
## calculate weights
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global
moran_global = ps.esda.moran.Moran(attr_vals, weight,
permutations=permutations)
return zip([moran_global.I], [moran_global.EI])
def moran_local(subquery, attr,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = OrderedDict([("id_col", id_col),
("attr1", attr),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(5)
attr_vals = pu.get_attributes(result)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Rate (global)
Andy Eschbacher
"""
qvals = OrderedDict([("id_col", id_col),
("attr1", numerator),
("attr2", denominator)
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
plpy.notice('** Query: %s' % query)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(2)
plpy.notice('** Query returned with %d rows' % len(result))
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(2)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
## calculate moran global rate
lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
permutations=permutations)
return zip([lisa_rate.I], [lisa_rate.EI])
def moran_local_rate(subquery, numerator, denominator,
w_type, num_ngbrs, permutations, geom_col, id_col):
"""
Moran's I Local Rate
Andy Eschbacher
"""
# geometries with values that are null are ignored
# resulting in a collection of not as near neighbors
qvals = OrderedDict([("id_col", id_col),
("numerator", numerator),
("denominator", denominator),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(5)
except plpy.SPIError:
plpy.error('Error: areas of interest query failed, check input parameters')
plpy.notice('** Query failed: "%s"' % query)
plpy.notice('** Error: %s' % plpy.SPIError)
return pu.empty_zipped_array(5)
## collect attributes
numer = pu.get_attributes(result, 1)
denom = pu.get_attributes(result, 2)
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
permutations=permutations)
# find quadrants for each geometry
quads = quad_position(lisa.q)
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
def moran_local_bv(subquery, attr1, attr2,
permutations, geom_col, id_col, w_type, num_ngbrs):
"""
Moran's I (local) Bivariate (untested)
"""
plpy.notice('** Constructing query')
qvals = OrderedDict([("id_col", id_col),
("attr1", attr1),
("attr2", attr2),
("geom_col", geom_col),
("subquery", subquery),
("num_ngbrs", num_ngbrs)])
query = pu.construct_neighbor_query(w_type, qvals)
try:
result = plpy.execute(query)
# if there are no neighbors, exit
if len(result) == 0:
return pu.empty_zipped_array(4)
except plpy.SPIError:
plpy.error("Error: areas of interest query failed, " \
"check input parameters")
plpy.notice('** Query failed: "%s"' % query)
return pu.empty_zipped_array(4)
## collect attributes
attr1_vals = pu.get_attributes(result, 1)
attr2_vals = pu.get_attributes(result, 2)
# create weights
weight = pu.get_weight(result, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
permutations=permutations)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = quad_position(lisa.q)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
@param coord (int): quadrant of a specific measurement
Output:
classification (one of 'HH', 'LH', 'LL', or 'HL')
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def quad_position(quads):
"""
Produce Moran's I classification based of n
Input:
@param quads ndarray: an array of quads classified by
1-4 (PySAL default)
Output:
@param list: an array of quads classied by 'HH', 'LL', etc.
"""
return [map_quads(q) for q in quads]

View File

@@ -0,0 +1,2 @@
"""Import all functions for pysal_utils"""
from crankshaft.pysal_utils.pysal_utils import *

View File

@@ -0,0 +1,188 @@
"""
Utilities module for generic PySAL functionality, mainly centered on
translating queries into numpy arrays or PySAL weights objects
"""
import numpy as np
import pysal as ps
def construct_neighbor_query(w_type, query_vals):
"""Return query (a string) used for finding neighbors
@param w_type text: type of neighbors to calculate ('knn' or 'queen')
@param query_vals dict: values used to construct the query
"""
if w_type.lower() == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
## Build weight object
def get_weight(query_res, w_type='knn', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
@param query_res dict-like: query results with attributes and neighbors
"""
# if w_type.lower() == 'knn':
# row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
# weights = {x['id']: row_normed_weights for x in query_res}
# else:
# weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
# if len(x['neighbors']) > 0
# else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
print 'len of neighbors: %d' % len(neighbors)
built_weight = ps.W(neighbors)
built_weight.transform = 'r'
return built_weight
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes inolved in query.
@param params: dict of information used in query (column names,
table name, etc.)
"""
attr_string = ""
template = "i.\"%(col)s\"::numeric As attr%(alias_num)s, "
if 'time_cols' in params:
## if markov analysis
attrs = params['time_cols']
for idx, val in enumerate(attrs):
attr_string += template % {"col": val, "alias_num": idx + 1}
else:
## if moran's analysis
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs', 'subquery')]
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": params[val], "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Construct where conditions when building neighbors query
Create portion of WHERE clauses for weeding out NULL-valued geometries
Input: dict of params:
{'subquery': ...,
'numerator': 'data1',
'denominator': 'data2',
'': ...}
Output: 'idx_replace."data1" IS NOT NULL AND idx_replace."data2" IS NOT NULL'
Input:
{'subquery': ...,
'time_cols': ['time1', 'time2', 'time3'],
'etc': ...}
Output: 'idx_replace."time1" IS NOT NULL AND idx_replace."time2" IS NOT
NULL AND idx_replace."time3" IS NOT NULL'
"""
attr_string = []
template = "idx_replace.\"%s\" IS NOT NULL"
if 'time_cols' in params:
## markov where clauses
attrs = params['time_cols']
# add values to template
for attr in attrs:
attr_string.append(template % attr)
else:
## moran where clauses
# get keys
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs', 'subquery')])
# add values to template
for attr in attrs:
attr_string.append(template % params[attr])
if len(attrs) == 2:
attr_string.append("idx_replace.\"%s\" <> 0" % params[attrs[1]])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
@param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE " \
"i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"%(attr_where_j)s " \
"ORDER BY " \
"j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs})" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
@param params dict: information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM ({subquery}) As j " \
"WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
"ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM ({subquery}) As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_attributes(query_res, attr_num=1):
"""
@param query_res: query results with attributes and neighbors
@param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
def empty_zipped_array(num_nones):
"""
prepare return values for cases of empty weights objects (no neighbors)
Input:
@param num_nones int: number of columns (e.g., 4)
Output:
[(None, None, None, None)]
"""
return [tuple([None] * num_nones)]

View File

@@ -0,0 +1,11 @@
"""Random seed generator used for non-deterministic functions in crankshaft"""
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)

View File

@@ -0,0 +1 @@
from segmentation import *

View File

@@ -0,0 +1,176 @@
"""
Segmentation creation and prediction
"""
import sklearn
import numpy as np
import plpy
from sklearn.ensemble import GradientBoostingRegressor
from sklearn import metrics
from sklearn.cross_validation import train_test_split
# Lower level functions
#----------------------
def replace_nan_with_mean(array):
"""
Input:
@param array: an array of floats which may have null-valued entries
Output:
array with nans filled in with the mean of the dataset
"""
# returns an array of rows and column indices
indices = np.where(np.isnan(array))
# iterate through entries which have nan values
for row, col in zip(*indices):
array[row, col] = np.mean(array[~np.isnan(array[:, col]), col])
return array
def get_data(variable, feature_columns, query):
"""
Fetch data from the database, clean, and package into
numpy arrays
Input:
@param variable: name of the target variable
@param feature_columns: list of column names
@param query: subquery that data is pulled from for the packaging
Output:
prepared data, packaged into NumPy arrays
"""
columns = ','.join(['array_agg("{col}") As "{col}"'.format(col=col) for col in feature_columns])
try:
data = plpy.execute('''SELECT array_agg("{variable}") As target, {columns} FROM ({query}) As a'''.format(
variable=variable,
columns=columns,
query=query))
except Exception, e:
plpy.error('Failed to access data to build segmentation model: %s' % e)
# extract target data from plpy object
target = np.array(data[0]['target'])
# put n feature data arrays into an n x m array of arrays
features = np.column_stack([np.array(data[0][col], dtype=float) for col in feature_columns])
return replace_nan_with_mean(target), replace_nan_with_mean(features)
# High level interface
# --------------------
def create_and_predict_segment_agg(target, features, target_features, target_ids, model_parameters):
"""
Version of create_and_predict_segment that works on arrays that come stright form the SQL calling
the function.
Input:
@param target: The 1D array of lenth NSamples containing the target variable we want the model to predict
@param features: Thw 2D array of size NSamples * NFeatures that form the imput to the model
@param target_ids: A 1D array of target_ids that will be used to associate the results of the prediction with the rows which they come from
@param model_parameters: A dictionary containing parameters for the model.
"""
clean_target = replace_nan_with_mean(target)
clean_features = replace_nan_with_mean(features)
target_features = replace_nan_with_mean(target_features)
model, accuracy = train_model(clean_target, clean_features, model_parameters, 0.2)
prediction = model.predict(target_features)
accuracy_array = [accuracy]*prediction.shape[0]
return zip(target_ids, prediction, np.full(prediction.shape, accuracy_array))
def create_and_predict_segment(query, variable, target_query, model_params):
"""
generate a segment with machine learning
Stuart Lynn
"""
## fetch column names
try:
columns = plpy.execute('SELECT * FROM ({query}) As a LIMIT 1 '.format(query=query))[0].keys()
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
## extract column names to be used in building the segmentation model
feature_columns = set(columns) - set([variable, 'cartodb_id', 'the_geom', 'the_geom_webmercator'])
## get data from database
target, features = get_data(variable, feature_columns, query)
model, accuracy = train_model(target, features, model_params, 0.2)
cartodb_ids, result = predict_segment(model, feature_columns, target_query)
accuracy_array = [accuracy]*result.shape[0]
return zip(cartodb_ids, result, accuracy_array)
def train_model(target, features, model_params, test_split):
"""
Train the Gradient Boosting model on the provided data and calculate the accuracy of the model
Input:
@param target: 1D Array of the variable that the model is to be trianed to predict
@param features: 2D Array NSamples * NFeatures to use in trining the model
@param model_params: A dictionary of model parameters, the full specification can be found on the
scikit learn page for [GradientBoostingRegressor](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)
@parma test_split: The fraction of the data to be withheld for testing the model / calculating the accuray
"""
features_train, features_test, target_train, target_test = train_test_split(features, target, test_size=test_split)
model = GradientBoostingRegressor(**model_params)
model.fit(features_train, target_train)
accuracy = calculate_model_accuracy(model, features, target)
return model, accuracy
def calculate_model_accuracy(model, features, target):
"""
Calculate the mean squared error of the model prediction
Input:
@param model: model trained from input features
@param features: features to make a prediction from
@param target: target to compare prediction to
Output:
mean squared error of the model prection compared to the target
"""
prediction = model.predict(features)
return metrics.mean_squared_error(prediction, target)
def predict_segment(model, features, target_query):
"""
Use the provided model to predict the values for the new feature set
Input:
@param model: The pretrained model
@features: A list of features to use in the model prediction (list of column names)
@target_query: The query to run to obtain the data to predict on and the cartdb_ids associated with it.
"""
batch_size = 1000
joined_features = ','.join(['"{0}"::numeric'.format(a) for a in features])
try:
cursor = plpy.cursor('SELECT Array[{joined_features}] As features FROM ({target_query}) As a'.format(
joined_features=joined_features,
target_query=target_query))
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
results = []
while True:
rows = cursor.fetch(batch_size)
if not rows:
break
batch = np.row_stack([np.array(row['features'], dtype=float) for row in rows])
#Need to fix this. Should be global mean. This will cause weird effects
batch = replace_nan_with_mean(batch)
prediction = model.predict(batch)
results.append(prediction)
try:
cartodb_ids = plpy.execute('''SELECT array_agg(cartodb_id ORDER BY cartodb_id) As cartodb_ids FROM ({0}) As a'''.format(target_query))[0]['cartodb_ids']
except Exception, e:
plpy.error('Failed to build segmentation model: %s' % e)
return cartodb_ids, np.concatenate(results)

View File

@@ -0,0 +1,2 @@
"""Import all functions from clustering libraries."""
from markov import *

View File

@@ -0,0 +1,189 @@
"""
Spatial dynamics measurements using Spatial Markov
"""
import numpy as np
import pysal as ps
import plpy
import crankshaft.pysal_utils as pu
def spatial_markov_trend(subquery, time_cols, num_classes=7,
w_type='knn', num_ngbrs=5, permutations=0,
geom_col='the_geom', id_col='cartodb_id'):
"""
Predict the trends of a unit based on:
1. history of its transitions to different classes (e.g., 1st quantile -> 2nd quantile)
2. average class of its neighbors
Inputs:
@param subquery string: e.g., SELECT the_geom, cartodb_id,
interesting_time_column FROM table_name
@param time_cols list of strings: list of strings of column names
@param num_classes (optional): number of classes to break distribution
of values into. Currently uses quantile bins.
@param w_type string (optional): weight type ('knn' or 'queen')
@param num_ngbrs int (optional): number of neighbors (if knn type)
@param permutations int (optional): number of permutations for test
stats
@param geom_col string (optional): name of column which contains the
geometries
@param id_col string (optional): name of column which has the ids of
the table
Outputs:
@param trend_up float: probablity that a geom will move to a higher
class
@param trend_down float: probablity that a geom will move to a lower
class
@param trend float: (trend_up - trend_down) / trend_static
@param volatility float: a measure of the volatility based on
probability stddev(prob array)
"""
if len(time_cols) < 2:
plpy.error('More than one time column needs to be passed')
qvals = {"id_col": id_col,
"time_cols": time_cols,
"geom_col": geom_col,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
try:
query_result = plpy.execute(
pu.construct_neighbor_query(w_type, qvals)
)
if len(query_result) == 0:
return zip([None], [None], [None], [None], [None])
except plpy.SPIError, err:
plpy.debug('Query failed with exception %s: %s' % (err, pu.construct_neighbor_query(w_type, qvals)))
plpy.error('Query failed, check the input parameters')
return zip([None], [None], [None], [None], [None])
## build weight
weights = pu.get_weight(query_result, w_type)
weights.transform = 'r'
## prep time data
t_data = get_time_data(query_result, time_cols)
plpy.debug('shape of t_data %d, %d' % t_data.shape)
plpy.debug('number of weight objects: %d, %d' % (weights.sparse).shape)
plpy.debug('first num elements: %f' % t_data[0, 0])
sp_markov_result = ps.Spatial_Markov(t_data,
weights,
k=num_classes,
fixed=False,
permutations=permutations)
## get lag classes
lag_classes = ps.Quantiles(
ps.lag_spatial(weights, t_data[:, -1]),
k=num_classes).yb
## look up probablity distribution for each unit according to class and lag class
prob_dist = get_prob_dist(sp_markov_result.P,
lag_classes,
sp_markov_result.classes[:, -1])
## find the ups and down and overall distribution of each cell
trend_up, trend_down, trend, volatility = get_prob_stats(prob_dist,
sp_markov_result.classes[:, -1])
## output the results
return zip(trend, trend_up, trend_down, volatility, weights.id_order)
def get_time_data(markov_data, time_cols):
"""
Extract the time columns and bin appropriately
"""
num_attrs = len(time_cols)
return np.array([[x['attr' + str(i)] for x in markov_data]
for i in range(1, num_attrs+1)], dtype=float).transpose()
## not currently used
def rebin_data(time_data, num_time_per_bin):
"""
Convert an n x l matrix into an (n/m) x l matrix where the values are
reduced (averaged) for the intervening states:
1 2 3 4 1.5 3.5
5 6 7 8 -> 5.5 7.5
9 8 7 6 8.5 6.5
5 4 3 2 4.5 2.5
if m = 2, the 4 x 4 matrix is transformed to a 2 x 4 matrix.
This process effectively resamples the data at a longer time span n
units longer than the input data.
For cases when there is a remainder (remainder(5/3) = 2), the remaining
two columns are binned together as the last time period, while the
first three are binned together for the first period.
Input:
@param time_data n x l ndarray: measurements of an attribute at
different time intervals
@param num_time_per_bin int: number of columns to average into a new
column
Output:
ceil(n / m) x l ndarray of resampled time series
"""
if time_data.shape[1] % num_time_per_bin == 0:
## if fit is perfect, then use it
n_max = time_data.shape[1] / num_time_per_bin
else:
## fit remainders into an additional column
n_max = time_data.shape[1] / num_time_per_bin + 1
return np.array([time_data[:, num_time_per_bin * i:num_time_per_bin * (i+1)].mean(axis=1)
for i in range(n_max)]).T
def get_prob_dist(transition_matrix, lag_indices, unit_indices):
"""
Given an array of transition matrices, look up the probability
associated with the arrangements passed
Input:
@param transition_matrix ndarray[k,k,k]:
@param lag_indices ndarray:
@param unit_indices ndarray:
Output:
Array of probability distributions
"""
return np.array([transition_matrix[(lag_indices[i], unit_indices[i])]
for i in range(len(lag_indices))])
def get_prob_stats(prob_dist, unit_indices):
"""
get the statistics of the probability distributions
Outputs:
@param trend_up ndarray(float): sum of probabilities for upward
movement (relative to the unit index of that prob)
@param trend_down ndarray(float): sum of probabilities for downward
movement (relative to the unit index of that prob)
@param trend ndarray(float): difference of upward and downward
movements
"""
num_elements = len(unit_indices)
trend_up = np.empty(num_elements, dtype=float)
trend_down = np.empty(num_elements, dtype=float)
trend = np.empty(num_elements, dtype=float)
for i in range(num_elements):
trend_up[i] = prob_dist[i, (unit_indices[i]+1):].sum()
trend_down[i] = prob_dist[i, :unit_indices[i]].sum()
if prob_dist[i, unit_indices[i]] > 0.0:
trend[i] = (trend_up[i] - trend_down[i]) / prob_dist[i, unit_indices[i]]
else:
trend[i] = None
## calculate volatility of distribution
volatility = prob_dist.std(axis=1)
return trend_up, trend_down, trend, volatility

View File

@@ -0,0 +1,49 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.0',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping comunity',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
# IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
requires=['pysal', 'numpy', 'sklearn'],
test_suite='test'
)

View File

@@ -0,0 +1,49 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.0',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping comunity',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
# IMPORTANT NOTE: please don't change this line. Instead issue a ticket to systems for evaluation.
install_requires=['joblib==0.8.3', 'numpy==1.6.1', 'scipy==0.14.0', 'pysal==1.11.2', 'scikit-learn==0.14.1'],
requires=['pysal', 'numpy', 'sklearn'],
test_suite='test'
)

View File

@@ -0,0 +1 @@
[{"xs": [9.917239463463458, 9.042767302696836, 10.798929825304187, 8.763751051762995, 11.383882954810852, 11.018206993460897, 8.939526075734316, 9.636159342565252, 10.136336896960058, 11.480610059427342, 12.115011910725082, 9.173267848893428, 10.239300931201738, 8.00012512174072, 8.979962292282131, 9.318376124429575, 10.82259513754284, 10.391747171927115, 10.04904588886165, 9.96007160443463, -0.78825626804569, -0.3511819898577426, -1.2796410003764271, -0.3977049391203402, 2.4792311265774667, 1.3670311632092624, 1.2963504112955613, 2.0404844103073025, -1.6439708506073223, 0.39122885445645805, 1.026031821452462, -0.04044477160482201, -0.7442346929085072, -0.34687120826243034, -0.23420359971379054, -0.5919629143336708, -0.202903054395391, -0.1893399644841902, 1.9331834251176807, -0.12321054392851609], "ys": [8.735627063679981, 9.857615954045011, 10.81439096759407, 10.586727233537191, 9.232919976568622, 11.54281262696508, 8.392787912674466, 9.355119689665944, 9.22380703532752, 10.542142541823122, 10.111980619367035, 10.760836265570738, 8.819773453269804, 10.25325722424816, 9.802077905695608, 8.955420161552611, 9.833801181904477, 10.491684241001613, 12.076108669877556, 11.74289693140474, -0.5685725015474191, -0.5715728344759778, -0.20180907868635137, 0.38431336480089595, -0.3402202083684184, -2.4652736827783586, 0.08295159401756182, 0.8503818775816505, 0.6488691600321166, 0.5794762568230527, -0.6770063922144103, -0.6557616416449478, -1.2834289177624947, 0.1096318195532717, -0.38986922166834853, -1.6224497706950238, 0.09429787743230483, 0.4005097316394031, -0.508002811195673, -1.2473463371366507], "ids": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]}]

View File

@@ -0,0 +1 @@
[[0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 0], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 1], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 2], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 3], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 4], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 5], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 6], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 7], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 8], [0.19047619047619049, 0.16, 0.0, 0.32594478059941379, 9], [-0.23529411764705882, 0.0, 0.19047619047619047, 0.31356338348865387, 10], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 11], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 12], [0.027777777777777783, 0.11111111111111112, 0.088888888888888892, 0.30339641183779581, 13], [0.03125, 0.030303030303030304, 0.0, 0.3850273981640871, 14], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 15], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 16], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 17], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 18], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 19], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 20], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 21], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 22], [-0.16666666666666663, 0.18181818181818182, 0.27272727272727271, 0.20246415864836445, 23], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 24], [0.1875, 0.23999999999999999, 0.12, 0.23731835158706122, 25], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 26], [-0.043478260869565216, 0.0, 0.041666666666666664, 0.37950991789118999, 27], [0.22222222222222221, 0.18181818181818182, 0.0, 0.31701083225750354, 28], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 29], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 30], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 31], [0.030303030303030304, 0.078947368421052627, 0.052631578947368418, 0.33560628561957595, 32], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 33], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 34], [0.0, 0.10000000000000001, 0.10000000000000001, 0.30331501776206204, 35], [-0.054054054054054057, 0.0, 0.05128205128205128, 0.37488547451276033, 36], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 37], [-0.22222222222222224, 0.13333333333333333, 0.26666666666666666, 0.22310934040908681, 38], [-0.0625, 0.095238095238095233, 0.14285714285714285, 0.28634850244519822, 39], [0.034482758620689655, 0.0625, 0.03125, 0.35388469167230169, 40], [0.11111111111111112, 0.10000000000000001, 0.0, 0.35213633723318016, 41], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 42], [0.0, 0.0, 0.0, 0.40000000000000002, 43], [0.0, 0.065217391304347824, 0.065217391304347824, 0.33605067580764519, 44], [0.078947368421052641, 0.073170731707317083, 0.0, 0.36451788667842738, 45], [0.052631578947368425, 0.090909090909090912, 0.045454545454545456, 0.33352611505171165, 46], [-0.20512820512820512, 0.0, 0.1702127659574468, 0.32172013908826891, 47]]

View File

@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "LL"],
[0.6152779669180425, "LL"],
[-0.14657336660125297, "LH"],
[0.6967858120189607, "LL"],
[0.07949310115714454, "HH"],
[0.4703198759258987, "HH"],
[0.4421125200498064, "HH"],
[0.5724288737143592, "LL"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "LL"],
[-0.01466729201304962, "HL"],
[0.3481559372544409, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "HH"],
[0.15971286468915544, "LL"],
[1.0543588860308968, "HH"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "HH"],
[0.08438455015300014, "LL"],
[0.06547094736902978, "LL"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "LL"],
[0.795275137550483, "HH"],
[0.18562939195219, "LL"],
[0.3010757406693439, "LL"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "LL"],
[-0.07116352791516614, "HL"],
[-0.09945240794119009, "LH"],
[0.18562939195219, "LL"],
[0.1832733440191868, "LL"],
[-0.39054253768447705, "HL"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "HH"],
[0.2584386102554792, "HH"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "LH"],
[0.051367269430983485, "LL"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "LL"],
[0.04191046803899837, "LL"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "LH"],
[0.5410013139159929, "HH"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]

View File

@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)

View File

@@ -0,0 +1,52 @@
import re
class MockCursor:
def __init__(self, data):
self.cursor_pos = 0
self.data = data
def fetch(self, batch_size):
batch = self.data[self.cursor_pos : self.cursor_pos + batch_size]
self.cursor_pos += batch_size
return batch
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
self.results = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def debug(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def cursor(self, query):
data = self.execute(query)
return MockCursor(data)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []

View File

@@ -0,0 +1,38 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import numpy as np
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class KMeansTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.cluster_data = json.loads(open(fixture_file('kmeans.json')).read())
self.params = {"subquery": "select * from table",
"no_clusters": "10"
}
def test_kmeans(self):
data = self.cluster_data
plpy._define_result('select' ,data)
clusters = cc.kmeans('subquery', 2)
labels = [a[1] for a in clusters]
c1 = [a for a in clusters if a[1]==0]
c2 = [a for a in clusters if a[1]==1]
self.assertEqual(len(np.unique(labels)),2)
self.assertEqual(len(c1),20)
self.assertEqual(len(c2),20)

View File

@@ -0,0 +1,88 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.params_markov = {"id_col": "cartodb_id",
"time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads"""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_quad_position(self):
"""Test lisa_sig_vals"""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('subquery', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None? ', result == None
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
def test_moran(self):
"""Test Moran's I global"""
data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1235)
result = cc.moran('table', 'value', 'knn', 5, 99, 'the_geom', 'cartodb_id')
print 'result == None?', result == None
result_moran = result[0][0]
expected_moran = np.array([row[0] for row in self.moran_data]).mean()
self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)

View File

@@ -0,0 +1,142 @@
import unittest
import crankshaft.pysal_utils as pu
from crankshaft import random_seeds
class PysalUtilsTest(unittest.TestCase):
"""Testing class for utility functions related to PySAL integrations"""
def setUp(self):
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.params_array = {"id_col": "cartodb_id",
"time_cols": ["_2013_dec", "_2014_jan", "_2014_feb"],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
def test_query_attr_select(self):
"""Test query_attr_select"""
ans = "i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, "
ans_array = "i.\"_2013_dec\"::numeric As attr1, " \
"i.\"_2014_jan\"::numeric As attr2, " \
"i.\"_2014_feb\"::numeric As attr3, "
self.assertEqual(pu.query_attr_select(self.params), ans)
self.assertEqual(pu.query_attr_select(self.params_array), ans_array)
def test_query_attr_where(self):
"""Test pu.query_attr_where"""
ans = "idx_replace.\"andy\" IS NOT NULL AND " \
"idx_replace.\"jay_z\" IS NOT NULL AND " \
"idx_replace.\"jay_z\" <> 0"
ans_array = "idx_replace.\"_2013_dec\" IS NOT NULL AND " \
"idx_replace.\"_2014_jan\" IS NOT NULL AND " \
"idx_replace.\"_2014_feb\" IS NOT NULL"
self.assertEqual(pu.query_attr_where(self.params), ans)
self.assertEqual(pu.query_attr_where(self.params_array), ans_array)
def test_knn(self):
"""Test knn neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0 " \
"ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
ans_array = "SELECT i.\"cartodb_id\" As id, " \
"i.\"_2013_dec\"::numeric As attr1, " \
"i.\"_2014_jan\"::numeric As attr2, " \
"i.\"_2014_feb\"::numeric As attr3, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"j.\"_2013_dec\" IS NOT NULL AND " \
"j.\"_2014_jan\" IS NOT NULL AND " \
"j.\"_2014_feb\" IS NOT NULL " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC " \
"LIMIT 321)) As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"_2013_dec\" IS NOT NULL AND " \
"i.\"_2014_jan\" IS NOT NULL AND " \
"i.\"_2014_feb\" IS NOT NULL "\
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.knn(self.params), ans)
self.assertEqual(pu.knn(self.params_array), ans_array)
def test_queen(self):
"""Test queen neighbors constructor"""
ans = "SELECT i.\"cartodb_id\" As id, " \
"i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, " \
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM (SELECT * FROM a_list) As j " \
"WHERE " \
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
"ST_Touches(i.\"the_geom\", " \
"j.\"the_geom\") AND " \
"j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND " \
"j.\"jay_z\" <> 0)" \
") As neighbors " \
"FROM (SELECT * FROM a_list) As i " \
"WHERE i.\"andy\" IS NOT NULL AND " \
"i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 " \
"ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(pu.queen(self.params), ans)
def test_construct_neighbor_query(self):
"""Test construct_neighbor_query"""
# Compare to raw knn query
self.assertEqual(pu.construct_neighbor_query('knn', self.params),
pu.knn(self.params))
def test_get_attributes(self):
"""Test get_attributes"""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight"""
self.assertEqual(True, True)
def test_empty_zipped_array(self):
"""Test empty_zipped_array"""
ans2 = [(None, None)]
ans4 = [(None, None, None, None)]
self.assertEqual(pu.empty_zipped_array(2), ans2)
self.assertEqual(pu.empty_zipped_array(4), ans4)

View File

@@ -0,0 +1,64 @@
import unittest
import numpy as np
from helper import plpy, fixture_file
import crankshaft.segmentation as segmentation
import json
class SegmentationTest(unittest.TestCase):
"""Testing class for Moran's I functions"""
def setUp(self):
plpy._reset()
def generate_random_data(self,n_samples,random_state, row_type=False):
x1 = random_state.uniform(size=n_samples)
x2 = random_state.uniform(size=n_samples)
x3 = random_state.randint(0, 4, size=n_samples)
y = x1+x2*x2+x3
cartodb_id = range(len(x1))
if row_type:
return [ {'features': vals} for vals in zip(x1,x2,x3)], y
else:
return [dict( zip(['x1','x2','x3','target', 'cartodb_id'],[x1,x2,x3,y,cartodb_id]))]
def test_replace_nan_with_mean(self):
test_array = np.array([1.2, np.nan, 3.2, np.nan, np.nan])
def test_create_and_predict_segment(self):
n_samples = 1000
random_state_train = np.random.RandomState(13)
random_state_test = np.random.RandomState(134)
training_data = self.generate_random_data(n_samples, random_state_train)
test_data, test_y = self.generate_random_data(n_samples, random_state_test, row_type=True)
ids = [{'cartodb_ids': range(len(test_data))}]
rows = [{'x1': 0,'x2':0,'x3':0,'y':0,'cartodb_id':0}]
plpy._define_result('select \* from \(select \* from training\) a limit 1',rows)
plpy._define_result('.*from \(select \* from training\) as a' ,training_data)
plpy._define_result('select array_agg\(cartodb\_id order by cartodb\_id\) as cartodb_ids from \(.*\) a',ids)
plpy._define_result('.*select \* from test.*' ,test_data)
model_parameters = {'n_estimators': 1200,
'max_depth': 3,
'subsample' : 0.5,
'learning_rate': 0.01,
'min_samples_leaf': 1}
result = segmentation.create_and_predict_segment(
'select * from training',
'target',
'select * from test',
model_parameters)
prediction = [r[1] for r in result]
accuracy =np.sqrt(np.mean( np.square( np.array(prediction) - np.array(test_y))))
self.assertEqual(len(result),len(test_data))
self.assertTrue( result[0][2] < 0.01)
self.assertTrue( accuracy < 0.5*np.mean(test_y) )

View File

@@ -0,0 +1,324 @@
import unittest
import numpy as np
import unittest
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.space_time_dynamics as std
from crankshaft import random_seeds
import json
class SpaceTimeTests(unittest.TestCase):
"""Testing class for Markov Functions."""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"time_cols": ['dec_2013', 'jan_2014', 'feb_2014'],
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors_markov.json')).read())
self.markov_data = json.loads(open(fixture_file('markov.json')).read())
self.time_data = np.array([i * np.ones(10, dtype=float) for i in range(10)]).T
self.transition_matrix = np.array([
[[ 0.96341463, 0.0304878 , 0.00609756, 0. , 0. ],
[ 0.06040268, 0.83221477, 0.10738255, 0. , 0. ],
[ 0. , 0.14 , 0.74 , 0.12 , 0. ],
[ 0. , 0.03571429, 0.32142857, 0.57142857, 0.07142857],
[ 0. , 0. , 0. , 0.16666667, 0.83333333]],
[[ 0.79831933, 0.16806723, 0.03361345, 0. , 0. ],
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0.00537634, 0.06989247, 0.8655914 , 0.05913978, 0. ],
[ 0. , 0. , 0.06372549, 0.90196078, 0.03431373],
[ 0. , 0. , 0. , 0.19444444, 0.80555556]],
[[ 0.84693878, 0.15306122, 0. , 0. , 0. ],
[ 0.08133971, 0.78947368, 0.1291866 , 0. , 0. ],
[ 0.00518135, 0.0984456 , 0.79274611, 0.0984456 , 0.00518135],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0. , 0. , 0. , 0.10204082, 0.89795918]],
[[ 0.8852459 , 0.09836066, 0. , 0.01639344, 0. ],
[ 0.03875969, 0.81395349, 0.13953488, 0. , 0.00775194],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0.02339181, 0.12865497, 0.75438596, 0.09356725],
[ 0. , 0. , 0. , 0.09661836, 0.90338164]],
[[ 0.33333333, 0.66666667, 0. , 0. , 0. ],
[ 0.0483871 , 0.77419355, 0.16129032, 0.01612903, 0. ],
[ 0.01149425, 0.16091954, 0.74712644, 0.08045977, 0. ],
[ 0. , 0.01036269, 0.06217617, 0.89637306, 0.03108808],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]]]
)
def test_spatial_markov(self):
"""Test Spatial Markov."""
data = [ { 'id': d['id'],
'attr1': d['y1995'],
'attr2': d['y1996'],
'attr3': d['y1997'],
'attr4': d['y1998'],
'attr5': d['y1999'],
'attr6': d['y2000'],
'attr7': d['y2001'],
'attr8': d['y2002'],
'attr9': d['y2003'],
'attr10': d['y2004'],
'attr11': d['y2005'],
'attr12': d['y2006'],
'attr13': d['y2007'],
'attr14': d['y2008'],
'attr15': d['y2009'],
'neighbors': d['neighbors'] } for d in self.neighbors_data]
print(str(data[0]))
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = std.spatial_markov_trend('subquery', ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'], 5, 'knn', 5, 0, 'the_geom', 'cartodb_id')
self.assertTrue(result != None)
result = [(row[0], row[1], row[2], row[3], row[4]) for row in result]
print result[0]
expected = self.markov_data
for ([res_trend, res_up, res_down, res_vol, res_id],
[exp_trend, exp_up, exp_down, exp_vol, exp_id]
) in zip(result, expected):
self.assertAlmostEqual(res_trend, exp_trend)
def test_get_time_data(self):
"""Test get_time_data"""
data = [ { 'attr1': d['y1995'],
'attr2': d['y1996'],
'attr3': d['y1997'],
'attr4': d['y1998'],
'attr5': d['y1999'],
'attr6': d['y2000'],
'attr7': d['y2001'],
'attr8': d['y2002'],
'attr9': d['y2003'],
'attr10': d['y2004'],
'attr11': d['y2005'],
'attr12': d['y2006'],
'attr13': d['y2007'],
'attr14': d['y2008'],
'attr15': d['y2009'] } for d in self.neighbors_data]
result = std.get_time_data(data, ['y1995', 'y1996', 'y1997', 'y1998', 'y1999', 'y2000', 'y2001', 'y2002', 'y2003', 'y2004', 'y2005', 'y2006', 'y2007', 'y2008', 'y2009'])
## expected was prepared from PySAL example:
### f = ps.open(ps.examples.get_path("usjoin.csv"))
### pci = np.array([f.by_col[str(y)] for y in range(1995, 2010)]).transpose()
### rpci = pci / (pci.mean(axis = 0))
expected = np.array([[ 0.87654416, 0.863147, 0.85637567, 0.84811668, 0.8446154, 0.83271652
, 0.83786314, 0.85012593, 0.85509656, 0.86416612, 0.87119375, 0.86302631
, 0.86148267, 0.86252252, 0.86746356],
[ 0.9188951, 0.91757931, 0.92333258, 0.92517289, 0.92552388, 0.90746978
, 0.89830489, 0.89431991, 0.88924794, 0.89815176, 0.91832091, 0.91706054
, 0.90139505, 0.87897455, 0.86216858],
[ 0.82591007, 0.82548596, 0.81989793, 0.81503235, 0.81731522, 0.78964559
, 0.80584442, 0.8084998, 0.82258551, 0.82668196, 0.82373724, 0.81814804
, 0.83675961, 0.83574199, 0.84647177],
[ 1.09088176, 1.08537689, 1.08456418, 1.08415404, 1.09898841, 1.14506948
, 1.12151133, 1.11160697, 1.10888621, 1.11399806, 1.12168029, 1.13164797
, 1.12958508, 1.11371818, 1.09936775],
[ 1.10731446, 1.11373944, 1.13283638, 1.14472559, 1.15910025, 1.16898201
, 1.17212488, 1.14752303, 1.11843284, 1.11024964, 1.11943471, 1.11736468
, 1.10863242, 1.09642516, 1.07762337],
[ 1.42269757, 1.42118434, 1.44273502, 1.43577571, 1.44400684, 1.44184737
, 1.44782832, 1.41978227, 1.39092208, 1.4059372, 1.40788646, 1.44052766
, 1.45241216, 1.43306098, 1.4174431 ],
[ 1.13073885, 1.13110513, 1.11074708, 1.13364636, 1.13088149, 1.10888138
, 1.11856629, 1.13062931, 1.11944984, 1.12446239, 1.11671008, 1.10880034
, 1.08401709, 1.06959206, 1.07875225],
[ 1.04706124, 1.04516831, 1.04253372, 1.03239987, 1.02072545, 0.99854316
, 0.9880258, 0.99669587, 0.99327676, 1.01400905, 1.03176742, 1.040511
, 1.01749645, 0.9936394, 0.98279746],
[ 0.98996986, 1.00143564, 0.99491, 1.00188408, 1.00455845, 0.99127006
, 0.97925917, 0.9683482, 0.95335147, 0.93694787, 0.94308213, 0.92232874
, 0.91284091, 0.89689833, 0.88928858],
[ 0.87418391, 0.86416601, 0.84425695, 0.8404494, 0.83903044, 0.8578708
, 0.86036185, 0.86107306, 0.8500772, 0.86981998, 0.86837929, 0.87204141
, 0.86633032, 0.84946077, 0.83287146],
[ 1.14196118, 1.14660262, 1.14892712, 1.14909594, 1.14436624, 1.14450183
, 1.12349752, 1.12596664, 1.12213996, 1.1119989, 1.10257792, 1.10491258
, 1.11059842, 1.10509795, 1.10020097],
[ 0.97282463, 0.96700147, 0.96252588, 0.9653878, 0.96057687, 0.95831051
, 0.94480909, 0.94804195, 0.95430286, 0.94103989, 0.92122519, 0.91010201
, 0.89280392, 0.89298243, 0.89165385],
[ 0.94325468, 0.96436902, 0.96455242, 0.95243009, 0.94117647, 0.9480927
, 0.93539182, 0.95388718, 0.94597005, 0.96918424, 0.94781281, 0.93466815
, 0.94281559, 0.96520315, 0.96715441],
[ 0.97478408, 0.98169225, 0.98712809, 0.98474769, 0.98559897, 0.98687073
, 0.99237486, 0.98209969, 0.9877653, 0.97399471, 0.96910087, 0.98416665
, 0.98423613, 0.99823861, 0.99545704],
[ 0.85570269, 0.85575915, 0.85986132, 0.85693406, 0.8538012, 0.86191535
, 0.84981451, 0.85472102, 0.84564835, 0.83998883, 0.83478547, 0.82803648
, 0.8198736, 0.82265395, 0.8399404 ],
[ 0.87022047, 0.85996258, 0.85961813, 0.85689572, 0.83947136, 0.82785597
, 0.86008789, 0.86776298, 0.86720209, 0.8676334, 0.89179317, 0.94202108
, 0.9422231, 0.93902708, 0.94479184],
[ 0.90134907, 0.90407738, 0.90403991, 0.90201769, 0.90399238, 0.90906632
, 0.92693339, 0.93695966, 0.94242697, 0.94338265, 0.91981796, 0.91108804
, 0.90543476, 0.91737138, 0.94793657],
[ 1.1977611, 1.18222564, 1.18439158, 1.18267865, 1.19286723, 1.20172869
, 1.21328691, 1.22624778, 1.22397075, 1.23857042, 1.24419893, 1.23929384
, 1.23418676, 1.23626739, 1.26754398],
[ 1.24919678, 1.25754773, 1.26991161, 1.28020651, 1.30625667, 1.34790023
, 1.34399863, 1.32575181, 1.30795492, 1.30544841, 1.30303302, 1.32107766
, 1.32936244, 1.33001241, 1.33288462],
[ 1.06768004, 1.03799276, 1.03637303, 1.02768449, 1.03296093, 1.05059016
, 1.03405057, 1.02747623, 1.03162734, 0.9961416, 0.97356208, 0.94241549
, 0.92754547, 0.92549227, 0.92138102],
[ 1.09475614, 1.11526796, 1.11654299, 1.13103948, 1.13143264, 1.13889622
, 1.12442212, 1.13367018, 1.13982256, 1.14029944, 1.11979401, 1.10905389
, 1.10577769, 1.11166825, 1.09985155],
[ 0.76530058, 0.76612841, 0.76542451, 0.76722683, 0.76014284, 0.74480073
, 0.76098396, 0.76156903, 0.76651952, 0.76533288, 0.78205934, 0.76842416
, 0.77487118, 0.77768683, 0.78801192],
[ 0.98391336, 0.98075816, 0.98295341, 0.97386015, 0.96913803, 0.97370819
, 0.96419154, 0.97209861, 0.97441313, 0.96356162, 0.94745352, 0.93965462
, 0.93069645, 0.94020973, 0.94358232],
[ 0.83561828, 0.82298088, 0.81738502, 0.81748588, 0.80904801, 0.80071489
, 0.83358256, 0.83451613, 0.85175032, 0.85954307, 0.86790024, 0.87170334
, 0.87863799, 0.87497981, 0.87888675],
[ 0.98845573, 1.02092428, 0.99665283, 0.99141823, 0.99386619, 0.98733195
, 0.99644997, 0.99669587, 1.02559097, 1.01116651, 0.99988024, 0.97906749
, 0.99323123, 1.00204939, 0.99602148],
[ 1.14930913, 1.15241949, 1.14300962, 1.14265542, 1.13984683, 1.08312397
, 1.05192626, 1.04230892, 1.05577278, 1.08569751, 1.12443486, 1.08891079
, 1.08603695, 1.05997314, 1.02160943],
[ 1.11368269, 1.1057147, 1.11893431, 1.13778669, 1.1432272, 1.18257029
, 1.16226243, 1.16009196, 1.14467789, 1.14820235, 1.12386598, 1.12680236
, 1.12357937, 1.1159258, 1.12570828],
[ 1.30379431, 1.30752186, 1.31206366, 1.31532267, 1.30625667, 1.31210239
, 1.29989156, 1.29203193, 1.27183516, 1.26830786, 1.2617743, 1.28656675
, 1.29734097, 1.29390205, 1.29345446],
[ 0.83953719, 0.82701448, 0.82006005, 0.81188876, 0.80294864, 0.78772975
, 0.82848011, 0.8259679, 0.82435705, 0.83108634, 0.84373784, 0.83891093
, 0.84349247, 0.85637272, 0.86539395],
[ 1.23450087, 1.2426022, 1.23537935, 1.23581293, 1.24522626, 1.2256767
, 1.21126648, 1.19377804, 1.18355337, 1.19674434, 1.21536573, 1.23653297
, 1.27962009, 1.27968392, 1.25907738],
[ 0.9769662, 0.97400719, 0.98035944, 0.97581531, 0.95543282, 0.96480308
, 0.94686376, 0.93679073, 0.92540049, 0.92988835, 0.93442917, 0.92100464
, 0.91475304, 0.90249622, 0.9021363 ],
[ 0.84986886, 0.8986851, 0.84295997, 0.87280534, 0.85659368, 0.88937573
, 0.894401, 0.90448993, 0.95495898, 0.92698333, 0.94745352, 0.92562488
, 0.96635366, 1.02520312, 1.0394296 ],
[ 1.01922808, 1.00258203, 1.00974428, 1.00303417, 0.99765073, 1.00759019
, 0.99192968, 0.99747298, 0.99550759, 0.97583768, 0.9610168, 0.94779638
, 0.93759089, 0.93353431, 0.94121705],
[ 0.86367411, 0.85558932, 0.85544346, 0.85103025, 0.84336613, 0.83434854
, 0.85813595, 0.84667961, 0.84374558, 0.85951183, 0.87194227, 0.89455097
, 0.88283929, 0.90349491, 0.90600675],
[ 1.00947534, 1.00411055, 1.00698819, 0.99513687, 0.99291086, 1.00581626
, 0.98850522, 0.99291168, 0.98983209, 0.97511924, 0.96134615, 0.96382634
, 0.95011401, 0.9434686, 0.94637765],
[ 1.05712571, 1.05459419, 1.05753012, 1.04880786, 1.05103857, 1.04800023
, 1.03024941, 1.04200483, 1.0402554, 1.03296979, 1.02191682, 1.02476275
, 1.02347523, 1.02517684, 1.04359571],
[ 1.07084189, 1.06669497, 1.07937623, 1.07387988, 1.0794043, 1.0531801
, 1.07452771, 1.09383478, 1.1052447, 1.10322136, 1.09167939, 1.08772756
, 1.08859544, 1.09177338, 1.1096083 ],
[ 0.86719222, 0.86628896, 0.86675156, 0.86425632, 0.86511809, 0.86287327
, 0.85169796, 0.85411285, 0.84886336, 0.84517414, 0.84843858, 0.84488343
, 0.83374329, 0.82812044, 0.82878599],
[ 0.88389211, 0.92288667, 0.90282398, 0.91229186, 0.92023286, 0.92652175
, 0.94278865, 0.93682452, 0.98655146, 0.992237, 0.9798497, 0.93869677
, 0.96947771, 1.00362626, 0.98102351],
[ 0.97082064, 0.95320233, 0.94534081, 0.94215593, 0.93967, 0.93092109
, 0.92662519, 0.93412152, 0.93501274, 0.92879506, 0.92110542, 0.91035556
, 0.90430364, 0.89994694, 0.90073864],
[ 0.95861858, 0.95774543, 0.98254811, 0.98919472, 0.98684824, 0.98882205
, 0.97662234, 0.95601578, 0.94905385, 0.94934888, 0.97152609, 0.97163004
, 0.9700702, 0.97158948, 0.95884908],
[ 0.83980439, 0.84726737, 0.85747, 0.85467221, 0.8556751, 0.84818516
, 0.85265681, 0.84502402, 0.82645665, 0.81743586, 0.83550406, 0.83338919
, 0.83511679, 0.82136617, 0.80921874],
[ 0.95118156, 0.9466212, 0.94688098, 0.9508583, 0.9512441, 0.95440787
, 0.96364363, 0.96804412, 0.97136214, 0.97583768, 0.95571724, 0.96895368
, 0.97001634, 0.97082733, 0.98782366],
[ 1.08910044, 1.08248968, 1.08492895, 1.08656923, 1.09454249, 1.10558188
, 1.1214086, 1.12292577, 1.13021031, 1.13342735, 1.14686068, 1.14502975
, 1.14474747, 1.14084037, 1.16142926],
[ 1.06336033, 1.07365823, 1.08691496, 1.09764846, 1.11669863, 1.11856702
, 1.09764283, 1.08815849, 1.08044313, 1.09278827, 1.07003204, 1.08398066
, 1.09831768, 1.09298232, 1.09176125],
[ 0.79772065, 0.78829196, 0.78581151, 0.77615922, 0.77035744, 0.77751194
, 0.79902974, 0.81437881, 0.80788828, 0.79603865, 0.78966436, 0.79949807
, 0.80172182, 0.82168155, 0.85587911],
[ 1.0052447, 1.00007696, 1.00475899, 1.00613942, 1.00639561, 1.00162979
, 0.99860739, 1.00814981, 1.00574316, 0.99030032, 0.97682565, 0.97292596
, 0.96519561, 0.96173403, 0.95890284],
[ 0.95808419, 0.9382568, 0.9654441, 0.95561201, 0.96987289, 0.96608031
, 0.99727185, 1.00781194, 1.03484236, 1.05333619, 1.0983263, 1.1704974
, 1.17025154, 1.18730553, 1.14242645]])
self.assertTrue(np.allclose(result, expected))
self.assertTrue(type(result) == type(expected))
self.assertTrue(result.shape == expected.shape)
def test_rebin_data(self):
"""Test rebin_data"""
## sample in double the time (even case since 10 % 2 = 0):
## (0+1)/2, (2+3)/2, (4+5)/2, (6+7)/2, (8+9)/2
## = 0.5, 2.5, 4.5, 6.5, 8.5
ans_even = np.array([(i + 0.5) * np.ones(10, dtype=float)
for i in range(0, 10, 2)]).T
self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 2), ans_even))
## sample in triple the time (uneven since 10 % 3 = 1):
## (0+1+2)/3, (3+4+5)/3, (6+7+8)/3, (9)/1
## = 1, 4, 7, 9
ans_odd = np.array([i * np.ones(10, dtype=float)
for i in (1, 4, 7, 9)]).T
self.assertTrue(np.array_equal(std.rebin_data(self.time_data, 3), ans_odd))
def test_get_prob_dist(self):
"""Test get_prob_dist"""
lag_indices = np.array([1, 2, 3, 4])
unit_indices = np.array([1, 3, 2, 4])
answer = np.array([
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]
])
result = std.get_prob_dist(self.transition_matrix, lag_indices, unit_indices)
self.assertTrue(np.array_equal(result, answer))
def test_get_prob_stats(self):
"""Test get_prob_stats"""
probs = np.array([
[ 0.0754717 , 0.88207547, 0.04245283, 0. , 0. ],
[ 0. , 0. , 0.09411765, 0.87058824, 0.03529412],
[ 0.0049505 , 0.09405941, 0.77722772, 0.11881188, 0.0049505 ],
[ 0. , 0. , 0. , 0.02352941, 0.97647059]
])
unit_indices = np.array([1, 3, 2, 4])
answer_up = np.array([0.04245283, 0.03529412, 0.12376238, 0.])
answer_down = np.array([0.0754717, 0.09411765, 0.0990099, 0.02352941])
answer_trend = np.array([-0.03301887 / 0.88207547, -0.05882353 / 0.87058824, 0.02475248 / 0.77722772, -0.02352941 / 0.97647059])
answer_volatility = np.array([ 0.34221495, 0.33705421, 0.29226542, 0.38834223])
result = std.get_prob_stats(probs, unit_indices)
result_up = result[0]
result_down = result[1]
result_trend = result[2]
result_volatility = result[3]
self.assertTrue(np.allclose(result_up, answer_up))
self.assertTrue(np.allclose(result_down, answer_down))
self.assertTrue(np.allclose(result_trend, answer_trend))
self.assertTrue(np.allclose(result_volatility, answer_volatility))

View File

@@ -1,5 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
default_version = '0.1.0'
default_version = '0.3.1'
requires = 'plpythonu, postgis'
superuser = true
schema = cdb_crankshaft

View File

@@ -74,11 +74,11 @@ BEGIN
SELECT array_agg(v) INTO vertex FROM a;
-- retrieve the value of each vertex
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
WITH a AS(SELECT unnest(geomin) as geo, unnest(colin) as c)
SELECT c INTO va FROM a WHERE ST_Equals(geo, vertex[1]);
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
WITH a AS(SELECT unnest(geomin) as geo, unnest(colin) as c)
SELECT c INTO vb FROM a WHERE ST_Equals(geo, vertex[2]);
WITH a AS(SELECT unnest(vertex) as geo, unnest(colin) as c)
WITH a AS(SELECT unnest(geomin) as geo, unnest(colin) as c)
SELECT c INTO vc FROM a WHERE ST_Equals(geo, vertex[3]);
SELECT ST_area(g), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point, vertex[2], vertex[3], point]))), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point, vertex[1], vertex[3], point]))), ST_area(ST_MakePolygon(ST_MakeLine(ARRAY[point,vertex[1],vertex[2], point]))) INTO sg, sa, sb, sc;

236
src/pg/sql/09_voronoi.sql Normal file
View File

@@ -0,0 +1,236 @@
-- =============================================================================================
--
-- CDB_Voronoi
--
-- =============================================================================================
CREATE OR REPLACE FUNCTION CDB_voronoi(
IN geomin geometry[],
IN buffer numeric DEFAULT 0.5,
IN tolerance numeric DEFAULT 1e-9
)
RETURNS geometry AS $$
DECLARE
geomout geometry;
BEGIN
-- we need to make the geometry calculations in (pseudo)meters!!!
with a as (
SELECT unnest(geomin) as g1
),
b as(
SELECT st_transform(g1, 3857) g2 from a
)
SELECT array_agg(g2) INTO geomin from b;
WITH
convexhull_1 as (
SELECT
ST_ConvexHull(ST_Collect(geomin)) as g,
buffer * |/ (st_area(ST_ConvexHull(ST_Collect(geomin)))/PI()) as r
),
clipper as(
SELECT
st_buffer(ST_MinimumBoundingCircle(a.g), buffer*a.r) as g
FROM convexhull_1 a
),
env0 as (
SELECT
(st_dumppoints(st_expand(a.g, buffer*a.r))).geom as e
FROM convexhull_1 a
),
env as (
SELECT
array_agg(env0.e) as e
FROM env0
),
sample AS (
SELECT
ST_Collect(geomin || env.e) as geom
FROM env
),
convexhull as (
SELECT
ST_ConvexHull(ST_Collect(geomin)) as cg
),
tin as (
SELECT
ST_Dump(ST_DelaunayTriangles(geom, tolerance, 0)) as gd
FROM
sample
),
tin_polygons as (
SELECT
(gd).Path as id,
(gd).Geom as pg,
ST_Centroid(ST_MinimumBoundingCircle((gd).Geom, 180)) as ct
FROM tin
),
tin_lines as (
SELECT
id,
ST_ExteriorRing(pg) as lg
FROM tin_polygons
),
tin_nodes as (
SELECT
id,
ST_PointN(lg,1) p1,
ST_PointN(lg,2) p2,
ST_PointN(lg,3) p3
FROM tin_lines
),
tin_edges AS (
SELECT
p.id,
UNNEST(ARRAY[
ST_MakeLine(n.p1,n.p2) ,
ST_MakeLine(n.p2,n.p3) ,
ST_MakeLine(n.p3,n.p1)]) as Edge,
ST_Force2D(cdb_crankshaft._Find_Circle(n.p1,n.p2,n.p3)) as ct,
CASE WHEN st_distance(p.ct, ST_ExteriorRing(p.pg)) < tolerance THEN
TRUE
ELSE FALSE END AS ctx,
p.pg,
ST_within(p.ct, convexhull.cg) as ctin
FROM
tin_polygons p,
tin_nodes n,
convexhull
WHERE p.id = n.id
),
voro_nodes as (
SELECT
CASE WHEN x.ctx = TRUE THEN
ST_Centroid(x.edge)
ELSE
x.ct
END as xct,
CASE WHEN y.id is null THEN
CASE WHEN x.ctin = TRUE THEN
ST_SetSRID(ST_MakePoint(
ST_X(x.ct) + ((ST_X(ST_Centroid(x.edge)) - ST_X(x.ct)) * (1+buffer)),
ST_Y(x.ct) + ((ST_Y(ST_Centroid(x.edge)) - ST_Y(x.ct)) * (1+buffer))
), ST_SRID(x.ct))
END
ELSE
y.ct
END as yct
FROM
tin_edges x
LEFT OUTER JOIN
tin_edges y
ON x.id <> y.id AND ST_Equals(x.edge, y.edge)
),
voro_edges as(
SELECT
ST_LineMerge(ST_Collect(ST_MakeLine(xct, yct))) as v
FROM
voro_nodes
),
voro_cells as(
SELECT
ST_Polygonize(
ST_Node(
ST_LineMerge(
ST_Union(v, ST_ExteriorRing(
ST_Convexhull(v)
)
)
)
)
) as g
FROM
voro_edges
),
voro_set as(
SELECT
(st_dump(v.g)).geom as g
FROM voro_cells v
),
clipped_voro as(
SELECT
ST_intersection(c.g, v.g) as g
FROM
voro_set v,
clipper c
WHERE
ST_GeometryType(v.g) = 'ST_Polygon'
)
SELECT
st_collect(
ST_Transform(
ST_ConvexHull(g),
4326
)
)
INTO geomout
FROM
clipped_voro;
RETURN geomout;
END;
$$ language plpgsql IMMUTABLE;
/** ----------------------------------------------------------------------------------------
* @function : FindCircle
* @precis : Function that determines if three points form a circle. If so a table containing
* centre and radius is returned. If not, a null table is returned.
* @version : 1.0.1
* @param : p_pt1 : First point in curve
* @param : p_pt2 : Second point in curve
* @param : p_pt3 : Third point in curve
* @return : geometry : In which X,Y ordinates are the centre X, Y and the Z being the radius of found circle
* or NULL if three points do not form a circle.
* @history : Simon Greener - Feb 2012 - Original coding.
* Rafa de la Torre - Aug 2016 - Small fix for type checking
* @copyright : Simon Greener @ 2012
* Licensed under a Creative Commons Attribution-Share Alike 2.5 Australia License. (http://creativecommons.org/licenses/by-sa/2.5/au/)
**/
CREATE OR REPLACE FUNCTION _Find_Circle(
IN p_pt1 geometry,
IN p_pt2 geometry,
IN p_pt3 geometry)
RETURNS geometry AS
$BODY$
DECLARE
v_Centre geometry;
v_radius NUMERIC;
v_CX NUMERIC;
v_CY NUMERIC;
v_dA NUMERIC;
v_dB NUMERIC;
v_dC NUMERIC;
v_dD NUMERIC;
v_dE NUMERIC;
v_dF NUMERIC;
v_dG NUMERIC;
BEGIN
IF ( p_pt1 IS NULL OR p_pt2 IS NULL OR p_pt3 IS NULL ) THEN
RAISE EXCEPTION 'All supplied points must be not null.';
RETURN NULL;
END IF;
IF ( ST_GeometryType(p_pt1) <> 'ST_Point' OR
ST_GeometryType(p_pt2) <> 'ST_Point' OR
ST_GeometryType(p_pt3) <> 'ST_Point' ) THEN
RAISE EXCEPTION 'All supplied geometries must be points.';
RETURN NULL;
END IF;
v_dA := ST_X(p_pt2) - ST_X(p_pt1);
v_dB := ST_Y(p_pt2) - ST_Y(p_pt1);
v_dC := ST_X(p_pt3) - ST_X(p_pt1);
v_dD := ST_Y(p_pt3) - ST_Y(p_pt1);
v_dE := v_dA * (ST_X(p_pt1) + ST_X(p_pt2)) + v_dB * (ST_Y(p_pt1) + ST_Y(p_pt2));
v_dF := v_dC * (ST_X(p_pt1) + ST_X(p_pt3)) + v_dD * (ST_Y(p_pt1) + ST_Y(p_pt3));
v_dG := 2.0 * (v_dA * (ST_Y(p_pt3) - ST_Y(p_pt2)) - v_dB * (ST_X(p_pt3) - ST_X(p_pt2)));
-- If v_dG is zero then the three points are collinear and no finite-radius
-- circle through them exists.
IF ( v_dG = 0 ) THEN
RETURN NULL;
ELSE
v_CX := (v_dD * v_dE - v_dB * v_dF) / v_dG;
v_CY := (v_dA * v_dF - v_dC * v_dE) / v_dG;
v_Radius := SQRT(POWER(ST_X(p_pt1) - v_CX,2) + POWER(ST_Y(p_pt1) - v_CY,2) );
END IF;
RETURN ST_SetSRID(ST_MakePoint(v_CX, v_CY, v_radius),ST_Srid(p_pt1));
END;
$BODY$
LANGUAGE plpgsql VOLATILE STRICT;

View File

@@ -1,5 +1,7 @@
SET client_min_messages TO WARNING;
\set ECHO none
cdb_spatialinterpolation
t
nn | nni | idw
-----+--------------------------+-----------------
200 | 238.41059602632179224595 | 341.46260750526
(1 row)

View File

@@ -0,0 +1,7 @@
SET client_min_messages TO WARNING;
\set ECHO none
avg_area
----------------------
0.000178661700690617
(1 row)

View File

@@ -1,10 +1,12 @@
SET client_min_messages TO WARNING;
\set ECHO none
\pset format unaligned
WITH a AS (
SELECT
ARRAY[800, 700, 600, 500, 400, 300, 200, 100] AS vals,
ARRAY[ST_GeomFromText('POINT(2.1744 41.403)'),ST_GeomFromText('POINT(2.1228 41.380)'),ST_GeomFromText('POINT(2.1511 41.374)'),ST_GeomFromText('POINT(2.1528 41.413)'),ST_GeomFromText('POINT(2.165 41.391)'),ST_GeomFromText('POINT(2.1498 41.371)'),ST_GeomFromText('POINT(2.1533 41.368)'),ST_GeomFromText('POINT(2.131386 41.41399)')] AS g
)
SELECT (cdb_crankshaft.CDB_SpatialInterpolation(g, vals, ST_GeomFromText('POINT(2.154 41.37)'), 1) - 780.79470198683925288365) / 780.79470198683925288365 < 0.001 As cdb_spatialinterpolation FROM a;
SELECT
cdb_crankshaft.CDB_SpatialInterpolation(g, vals, ST_GeomFromText('POINT(2.154 41.37)'),0) as NN,
cdb_crankshaft.CDB_SpatialInterpolation(g, vals, ST_GeomFromText('POINT(2.154 41.37)'),1) as NNI,
cdb_crankshaft.CDB_SpatialInterpolation(g, vals, ST_GeomFromText('POINT(2.154 41.37)'),2) as IDW
FROM a;

View File

@@ -0,0 +1,15 @@
SET client_min_messages TO WARNING;
\set ECHO none
WITH a AS (
SELECT
ARRAY[ST_GeomFromText('POINT(2.1744 41.403)', 4326),ST_GeomFromText('POINT(2.1228 41.380)', 4326),ST_GeomFromText('POINT(2.1511 41.374)', 4326),ST_GeomFromText('POINT(2.1528 41.413)', 4326),ST_GeomFromText('POINT(2.165 41.391)', 4326),ST_GeomFromText('POINT(2.1498 41.371)', 4326),ST_GeomFromText('POINT(2.1533 41.368)', 4326),ST_GeomFromText('POINT(2.131386 41.41399)', 4326)] AS geomin
),
b as(
SELECT
(st_dump(cdb_crankshaft.CDB_voronoi(geomin, 0.2, 1e-9))).geom as result
FROM a
)
SELECT
avg(st_area(result)) as avg_area
FROM b;