Compare commits
58 Commits
docker
...
similarity
| Author | SHA1 | Date | |
|---|---|---|---|
| | 97b4949f84 | | |
| | df09d03de6 | | |
| | b3c55614e3 | | |
| | 1ddc338f3f | | |
| | d7424b02e5 | | |
| | 45705f3a16 | | |
| | 1995721921 | | |
| | 4630d6b549 | | |
| | 0fca6c3c1a | | |
| | fe22464b75 | | |
| | cc4a35ebd9 | | |
| | 633b63bccc | | |
| | ea02f36235 | | |
| | 22b6aed7c1 | | |
| | f6e8524669 | | |
| | 02b74813ac | | |
| | 4c243bf1d3 | | |
| | b0150d4fec | | |
| | 6bb4f36df5 | | |
| | 5a46f65e59 | | |
| | e56519f599 | | |
| | 8dd8ab37a5 | | |
| | 06f5cf9951 | | |
| | bc67ae8f69 | | |
| | 00579cd838 | | |
| | 3f20275d3d | | |
| | eecbe39547 | | |
| | 1578b17eb8 | | |
| | 3eda8ecd16 | | |
| | 0aa4d0a50e | | |
| | 3b31da783a | | |
| | 8762f6ca1c | | |
| | 58c141d217 | | |
| | 5a7d3178dd | | |
| | 4903af6cdc | | |
| | 692014d694 | | |
| | 47e0253652 | | |
| | 9f03a9b075 | | |
| | b5281d0681 | | |
| | 689ec8a925 | | |
| | a7e42e93cc | | |
| | bad09ffd7b | | |
| | 4706442a1d | | |
| | 935c7f9963 | | |
| | ef3bcaeee8 | | |
| | 4ffb2c9664 | | |
| | dea6e2f1a7 | | |
| | d13f167d47 | | |
| | a518034e65 | | |
| | 24e4037995 | | |
| | 82a738fe40 | | |
| | e801c9cb60 | | |
| | 0206cc6c44 | | |
| | b754ffe42a | | |
| | 0056f411b5 | | |
| | 1810f02242 | | |
| | 8e972128eb | | |
| | cdd2d9e722 | | |
3
.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
|
||||
envs/
|
||||
*.pyc
|
||||
.DS_Store
|
||||
142
CONTRIBUTING.md
@@ -1,84 +1,94 @@
|
||||
# Contributing guide
|
||||
# Development process
|
||||
|
||||
## How to add new functions
|
||||
Please read the Working Process/Quickstart Guide in [README.md](https://github.com/CartoDB/crankshaft/blob/master/README.md) first.
|
||||
|
||||
Try to put as little logic in the SQL extension as possible and
|
||||
just use it as a wrapper to the Python module functionality.
|
||||
For any modification of crankshaft, such as adding new features,
|
||||
refactoring or bug-fixing, a topic branch must be created out of the `develop`
|
||||
branch and be used for the development process.
|
||||
|
||||
Once a function is defined it should never change its signature in subsequent
|
||||
versions. To change a function's signature a new function with a different
|
||||
name must be created.
|
||||
Modifications are done inside `src/pg/sql` and `src/py/crankshaft`.
|
||||
|
||||
### Version numbers
|
||||
Take into account:
|
||||
|
||||
The version of both the SQL extension and the Python package shall
|
||||
follow the [Semantic Versioning 2.0](http://semver.org/) guidelines:
|
||||
* Tests must be added for any new functionality
|
||||
(inside `src/pg/test`, `src/py/crankshaft/test`) as well as to
|
||||
detect any bugs that are being fixed.
|
||||
* Add or modify the corresponding documentation files in the `doc` folder.
|
||||
Since we expect to have highly technical functions here, an extensive
|
||||
background explanation would be of great help to users of this extension.
|
||||
* Convention: snake case (i.e. `snake_case` and not `CamelCase`)
|
||||
shall be used for all function names.
|
||||
Prefix function names intended for public use with `cdb_`
|
||||
and private functions (to be used only internally inside
|
||||
the extension) with `_cdb_`.
|
||||
|
||||
* When backwards incompatibility is introduced the major number is incremented
|
||||
* When functionality is added (in a backwards-compatible manner) the minor number
|
||||
is incremented
|
||||
* When only fixes are introduced (backwards-compatible) the patch number is
|
||||
incremented
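
The version-number rules above can be illustrated with a minimal Python sketch. The function name `bump_version` and the `change` labels are hypothetical and not part of crankshaft; the sketch only assumes plain `X.Y.Z` version strings.

```python
def bump_version(version, change):
    """Return the next semantic version for a given kind of change.

    `change` is a hypothetical label: 'major' for backwards-incompatible
    changes, 'minor' for backwards-compatible functionality, 'patch' for
    backwards-compatible fixes.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Lower-order numbers are reset when a higher one is incremented
        return "%d.0.0" % (major + 1)
    if change == "minor":
        return "%d.%d.0" % (major, minor + 1)
    if change == "patch":
        return "%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("unknown change type: %r" % change)
```

For instance, `bump_version("0.0.2", "minor")` yields `"0.1.0"`.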
|
||||
Once the code is ready to be tested, update the local development installation
|
||||
with `sudo make install`.
|
||||
This will update the 'dev' version of the extension in `src/pg/` and
|
||||
make it available to PostgreSQL.
|
||||
It will also install the python package (crankshaft) in a virtual
|
||||
environment `envs/dev`.
|
||||
|
||||
### Python Package
|
||||
The version number of the Python package, defined in
|
||||
`src/pg/crankshaft/setup.py` will be overridden when
|
||||
the package is released and always match the extension version number,
|
||||
but for development it shall be kept as '0.0.0'.
|
||||
|
||||
...
|
||||
Run the tests with `make test`.
|
||||
|
||||
### SQL Extension
|
||||
|
||||
* Generate a **new subfolder version** for `sql` and `test` folders to define
|
||||
the new functions and tests
|
||||
- Use symlinks to avoid file duplication between versions that don't update them
|
||||
- Add new files or modify copies of the old files to add new functions or
|
||||
modify existing functions (remember to rename a function if the signature
|
||||
changes)
|
||||
- Add or modify the corresponding documentation files in the `doc` folder.
|
||||
Since we expect to have highly technical functions here, an extensive
|
||||
background explanation would be of great help to users of this extension.
|
||||
- Create tests for the new functions/behaviour
|
||||
|
||||
* Generate the **upgrade and downgrade files** for the extension
|
||||
|
||||
* Update the control file and the Makefile to generate the complete SQL
|
||||
file for the new created version. After running `make` a new
|
||||
file `crankshaft--X.Y.Z.sql` will be created for the current version.
|
||||
Additional files for migrating to/from the previous version A.B.C should be
|
||||
created:
|
||||
- `crankshaft--X.Y.Z--A.B.C.sql`
|
||||
- `crankshaft--A.B.C--X.Y.Z.sql`
|
||||
All these new files must be added to git and pushed.
|
||||
|
||||
* Update the public docs! ;-)
|
||||
|
||||
## Conventions
|
||||
|
||||
# SQL
|
||||
|
||||
Use snake case (i.e. `snake_case` and not `CamelCase`) for all
|
||||
functions. Prefix functions intended for public use with `cdb_`
|
||||
and private functions (to be used only internally inside
|
||||
the extension) with `_cdb_`.
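
The naming convention above could be checked mechanically. The following is a minimal Python sketch, not part of crankshaft: the helper name `classify_function_name` and the exact regular expression are assumptions made for illustration.

```python
import re

# Assumed pattern: lowercase snake_case words, prefixed with `cdb_`
# (public) or `_cdb_` (private).
_NAME_RE = re.compile(r"^(_?)cdb_[a-z0-9]+(?:_[a-z0-9]+)*$")


def classify_function_name(name):
    """Return 'public', 'private', or None if the name breaks the convention."""
    match = _NAME_RE.match(name)
    if match is None:
        return None
    # A leading underscore marks a function as internal to the extension
    return "private" if match.group(1) else "public"
```

With this sketch, `cdb_moran_local` is classified as public, `_cdb_helper` as private, and a camel-case name such as `cdbMoranLocal` is rejected.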
|
||||
|
||||
# Python
|
||||
|
||||
...
|
||||
|
||||
## Testing
|
||||
|
||||
Running just the Python tests:
|
||||
To use the python extension for custom tests, activate the virtual
|
||||
environment with:
|
||||
|
||||
```
|
||||
(cd python && make test)
|
||||
source envs/dev/bin/activate
|
||||
```
|
||||
|
||||
Installing the Extension and running just the PostgreSQL tests:
|
||||
Update extension in a working database with:
|
||||
|
||||
* `ALTER EXTENSION crankshaft UPDATE TO 'current';`
|
||||
`ALTER EXTENSION crankshaft UPDATE TO 'dev';`
|
||||
|
||||
Note: we keep the current development version install as 'dev' always;
|
||||
we update through the 'current' alias to allow changing the extension
|
||||
contents but not the version identifier. This will fail if the
|
||||
changes involve incompatible function changes such as a different
|
||||
return type; in that case the offending function (or the whole extension)
|
||||
should be dropped manually before the update.
|
||||
|
||||
If the extension has not previously been installed in a database,
|
||||
it can be installed directly with:
|
||||
|
||||
* `CREATE EXTENSION IF NOT EXISTS plpythonu;`
|
||||
`CREATE EXTENSION IF NOT EXISTS postgis;`
|
||||
`CREATE EXTENSION IF NOT EXISTS cartodb;`
|
||||
`CREATE EXTENSION crankshaft WITH VERSION 'dev';`
|
||||
|
||||
Note: the development extension uses the development python virtual
|
||||
environment automatically.
|
||||
|
||||
Before proceeding to the release process, peer review of the code is
|
||||
a must.
|
||||
|
||||
Once the feature or bugfix is completed and all the tests are passing
|
||||
a Pull-Request shall be created on the topic branch, reviewed by a peer
|
||||
and then merged back into the `develop` branch when all CI tests pass.
|
||||
|
||||
When the changes in the `develop` branch are to be released in a new
|
||||
version of the extension, a PR must be created on the `develop` branch.
|
||||
|
||||
The release manager will take over the PR at this point to proceed
|
||||
to the release process for a new revision of the extension.
|
||||
|
||||
## Relevant development tasks available in the Makefile
|
||||
|
||||
```
|
||||
(cd pg && sudo make install && PGUSER=postgres make installcheck)
|
||||
```
|
||||
* `make help` show a short description of the available targets
|
||||
|
||||
Installing and testing everything:
|
||||
* `sudo make install` will generate the extension scripts for the development
|
||||
version ('dev'/'current') and install the python package into the
|
||||
development virtual environment `envs/dev`.
|
||||
Intended for use by developers.
|
||||
|
||||
```
|
||||
sudo make install && PGUSER=postgres make testinstalled
|
||||
* `make test` will run the tests for the installed development extension.
|
||||
Intended for use by developers.
|
||||
```
|
||||
|
||||
43
DEPLOYING.md
@@ -1,43 +0,0 @@
|
||||
# Workflow
|
||||
|
||||
... (branching/merging flow)
|
||||
|
||||
# Deployment
|
||||
|
||||
...
|
||||
|
||||
Deployment to db servers: the next command will install both the Python
|
||||
package and the extension.
|
||||
|
||||
```
|
||||
sudo make install
|
||||
```
|
||||
|
||||
Installing only the Python package:
|
||||
|
||||
```
|
||||
sudo pip install python/crankshaft --upgrade
|
||||
```
|
||||
|
||||
Caveat: note that `pip install ./crankshaft` will install
|
||||
from local files, but `pip install crankshaft` will not.
|
||||
|
||||
CI: Install and run the tests on the installed extension and package:
|
||||
|
||||
```
|
||||
(sudo make install && PGUSER=postgres make testinstalled)
|
||||
```
|
||||
|
||||
Installing the extension in user databases:
|
||||
Once installed in a server, the extension can be added
|
||||
to a database with the next SQL command:
|
||||
|
||||
```
|
||||
CREATE EXTENSION crankshaft;
|
||||
```
|
||||
|
||||
To upgrade the extension to a specific version X.Y.Z:
|
||||
|
||||
```
|
||||
ALTER EXTENSION crankshaft UPDATE TO 'X.Y.Z';
|
||||
```
|
||||
69
Makefile
@@ -1,13 +1,70 @@
|
||||
EXT_DIR = pg
|
||||
PYP_DIR = python
|
||||
include ./Makefile.global
|
||||
|
||||
EXT_DIR = src/pg
|
||||
PYP_DIR = src/py
|
||||
|
||||
.PHONY: install
|
||||
.PHONY: run_tests
|
||||
.PHONY: release
|
||||
.PHONY: deploy
|
||||
|
||||
install:
|
||||
# Generate and install development versions of the extension
|
||||
# and python package.
|
||||
# The extension is named 'dev' with a 'current' alias for easily upgrading.
|
||||
# The Python package is installed in a virtual environment envs/dev/
|
||||
# Requires sudo.
|
||||
install: ## Generate and install development version of the extension; requires sudo.
|
||||
$(MAKE) -C $(PYP_DIR) install
|
||||
$(MAKE) -C $(EXT_DIR) install
|
||||
|
||||
testinstalled:
|
||||
$(MAKE) -C $(PYP_DIR) testinstalled
|
||||
$(MAKE) -C $(EXT_DIR) installcheck
|
||||
# Run the tests for the installed development extension and
|
||||
# python package
|
||||
test: ## Run the tests for the development version of the extension
|
||||
$(MAKE) -C $(PYP_DIR) test
|
||||
$(MAKE) -C $(EXT_DIR) test
|
||||
|
||||
# Generate a new release into release
|
||||
release: ## Generate a new release of the extension. Only for release manager
|
||||
$(MAKE) -C $(EXT_DIR) release
|
||||
$(MAKE) -C $(PYP_DIR) release
|
||||
|
||||
# Install the current release.
|
||||
# The Python package is installed in a virtual environment envs/X.Y.Z/
|
||||
# Requires sudo.
|
||||
# Use the RELEASE_VERSION environment variable to deploy a specific version:
|
||||
# sudo make deploy RELEASE_VERSION=1.0.0
|
||||
deploy: ## Deploy a released extension. Only for release manager. Requires sudo.
|
||||
$(MAKE) -C $(EXT_DIR) deploy
|
||||
$(MAKE) -C $(PYP_DIR) deploy
|
||||
|
||||
# Cleanup development extension script files
|
||||
clean-dev: ## clean up development extension script files
|
||||
rm -f src/pg/$(EXTENSION)--*.sql
|
||||
|
||||
# Cleanup all releases
|
||||
clean-releases: ## clean up all releases
|
||||
rm -rf release/python/*
|
||||
rm -f release/$(EXTENSION)--*.sql
|
||||
rm -f release/$(EXTENSION).control
|
||||
|
||||
# Cleanup current/specific version
|
||||
clean-release: ## clean up current release
|
||||
rm -rf release/python/$(RELEASE_VERSION)
|
||||
rm -f release/$(RELEASE_VERSION)--*.sql
|
||||
|
||||
# Cleanup all virtual environments
|
||||
clean-environments: ## clean up all virtual environments
|
||||
rm -rf envs/*
|
||||
|
||||
clean-all: clean-dev clean-release clean-environments
|
||||
|
||||
help:
|
||||
@IFS=$$'\n' ; \
|
||||
help_lines=(`fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//'`); \
|
||||
for help_line in $${help_lines[@]}; do \
|
||||
IFS=$$'#' ; \
|
||||
help_split=($$help_line) ; \
|
||||
help_command=`echo $${help_split[0]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \
|
||||
help_info=`echo $${help_split[2]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \
|
||||
printf "%-30s %s\n" $$help_command $$help_info ; \
|
||||
done
|
||||
|
||||
6
Makefile.global
Normal file
@@ -0,0 +1,6 @@
|
||||
SELF_DIR := $(dir $(lastword $(MAKEFILE_LIST)))
|
||||
EXTENSION = crankshaft
|
||||
PACKAGE = crankshaft
|
||||
EXTVERSION = $(shell grep default_version $(SELF_DIR)/src/pg/$(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
|
||||
RELEASE_VERSION ?= $(EXTVERSION)
|
||||
SED = sed
|
||||
7
NEWS.md
Normal file
@@ -0,0 +1,7 @@
|
||||
0.0.2 (2016-03-16)
|
||||
------------------
|
||||
* New versioning approach using per-version Python virtual environments
|
||||
|
||||
0.0.1 (2016-02-22)
|
||||
------------------
|
||||
* Preliminary release
|
||||
65
README.md
@@ -4,9 +4,68 @@ CartoDB Spatial Analysis extension for PostgreSQL.
|
||||
|
||||
## Code organization
|
||||
|
||||
* *pg* contains the PostgreSQL extension source code
|
||||
* *python* Python module
|
||||
* *doc* documentation
|
||||
* *src* source code
|
||||
* - *src/pg* contains the PostgreSQL extension source code
|
||||
* - *src/py* Python module source code
|
||||
* *release* released versions
|
||||
* *envs* base directory for Python virtual environments
|
||||
|
||||
## Requirements
|
||||
|
||||
* pip
|
||||
* pip, virtualenv, PostgreSQL
|
||||
* python-scipy system package (see [src/py/README.md](https://github.com/CartoDB/crankshaft/blob/master/src/py/README.md))
|
||||
|
||||
# Working Process -- Quickstart Guide
|
||||
|
||||
We distinguish two roles regarding the development cycle of crankshaft:
|
||||
|
||||
* *developers* will implement new functionality and bugfixes into
|
||||
the codebase and will request new releases of the extension.
|
||||
* A *release manager* will attend these requests and will handle
|
||||
the release process. The release process is sequential:
|
||||
no concurrent releases will ever be in the works.
|
||||
|
||||
We use the default `develop` branch as the basis for development.
|
||||
The `master` branch is used to merge and tag releases to be
|
||||
deployed in production.
|
||||
|
||||
Developers shall create a new topic branch from `develop` for any new feature
|
||||
or bugfix and commit their changes to it and eventually merge back into
|
||||
the `develop` branch. When a new release is required a Pull Request
|
||||
will be opened against the `develop` branch.
|
||||
|
||||
The `develop` pull requests will be handled by the release manager,
|
||||
who will merge into master where new releases are prepared and tagged.
|
||||
The `master` branch is the sole responsibility of the release manager
|
||||
and developers must not commit or merge into it.
|
||||
|
||||
## Development Guidelines
|
||||
|
||||
For a detailed description of the development process please see
|
||||
the [CONTRIBUTING.md](https://github.com/CartoDB/crankshaft/blob/master/CONTRIBUTING.md) guide.
|
||||
|
||||
Any modification to the source code (`src/pg/sql` for the SQL extension,
|
||||
`src/py/crankshaft` for the Python package) shall always be done
|
||||
in a topic branch created from the `develop` branch.
|
||||
|
||||
Tests, documentation and peer code reviewing are required for all
|
||||
modifications.
|
||||
|
||||
The tests (both for SQL and Python) are executed by running,
|
||||
from the top directory:
|
||||
|
||||
```
|
||||
sudo make install
|
||||
make test
|
||||
```
|
||||
|
||||
To request a new release, which will be handled by the
|
||||
release manager, a Pull Request must be created in the `develop`
|
||||
branch.
|
||||
|
||||
## Release
|
||||
|
||||
The release and deployment process is described in the
|
||||
[RELEASE.md](https://github.com/CartoDB/crankshaft/blob/master/RELEASE.md) guide and it is the responsibility of the designated
|
||||
release manager.
|
||||
|
||||
93
RELEASE.md
Normal file
@@ -0,0 +1,93 @@
|
||||
# Release & Deployment Process
|
||||
|
||||
Please read the Working Process/Quickstart Guide in README.md
|
||||
and the Development guidelines in CONTRIBUTING.md.
|
||||
|
||||
The release process of a new version of the extension
|
||||
shall be performed by the designated *Release Manager*.
|
||||
|
||||
Note that we expect to gradually automate more of this process.
|
||||
|
||||
Once the PR to be released has been checked, it shall be
|
||||
merged back into the `master` branch to prepare the new release.
|
||||
|
||||
The version number in `src/pg/crankshaft.control` must first be updated.
|
||||
When doing so, follow the [Semantic Versioning 2.0](http://semver.org/) guidelines.
|
||||
|
||||
The `NEWS.md` file must also be updated.
|
||||
|
||||
We will now explain the process for the case of backwards-compatible
|
||||
releases (updating the minor or patch version numbers).
|
||||
|
||||
TODO: document the complex case of major releases.
|
||||
|
||||
The next command must be executed to produce the main installation
|
||||
script for the new release, `release/crankshaft--X.Y.Z.sql` and
|
||||
also to copy the python package to `release/python/X.Y.Z/crankshaft`.
|
||||
|
||||
```
|
||||
make release
|
||||
```
|
||||
|
||||
Then, the release manager shall produce upgrade and downgrade scripts
|
||||
to migrate to/from the previous release. In the case of minor/patch
|
||||
releases this simply consists of extracting the functions that have changed
|
||||
and placing them in the proper `release/crankshaft--X.Y.Z--A.B.C.sql`
|
||||
file.
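
As a rough illustration of the migration script names involved, here is a minimal Python sketch. It relies on the PostgreSQL convention that an update script is named `extension--old_version--target_version.sql`; the helper name `migration_script_names` is hypothetical and not part of the crankshaft tooling.

```python
def migration_script_names(extension, new_version, previous_version):
    """Return (upgrade, downgrade) script file names for a release.

    PostgreSQL update scripts are named `<ext>--<from>--<to>.sql`, so the
    upgrade script goes from the previous version to the new one, and the
    downgrade script goes the other way.
    """
    upgrade = "%s--%s--%s.sql" % (extension, previous_version, new_version)
    downgrade = "%s--%s--%s.sql" % (extension, new_version, previous_version)
    return upgrade, downgrade
```

For a release X.Y.Z whose previous release is A.B.C, this gives `crankshaft--A.B.C--X.Y.Z.sql` for the upgrade and `crankshaft--X.Y.Z--A.B.C.sql` for the downgrade.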
|
||||
|
||||
The new release can be deployed for staging/smoke tests with this command:
|
||||
|
||||
```
|
||||
sudo make deploy
|
||||
```
|
||||
|
||||
This will copy the current 'X.Y.Z' released version of the extension to
|
||||
PostgreSQL. The corresponding Python extension will be installed in a
|
||||
virtual environment in `envs/X.Y.Z`.
|
||||
|
||||
It can be activated with:
|
||||
|
||||
```
|
||||
source envs/X.Y.Z/bin/activate
|
||||
```
|
||||
|
||||
But note that this is needed only for using the package directly;
|
||||
the 'X.Y.Z' version of the extension will automatically use the
|
||||
python package from this virtual environment.
|
||||
|
||||
The `sudo make deploy` operation can also be used for installing
|
||||
the new version after it has been released.
|
||||
|
||||
To install a specific version 'X.Y.Z' different from the current one
|
||||
(which must be present in `release/`) you can:
|
||||
|
||||
```
|
||||
sudo make deploy RELEASE_VERSION=X.Y.Z
|
||||
```
|
||||
|
||||
TODO: testing procedure for the new release.
|
||||
|
||||
TODO: procedure for staging deployment.
|
||||
|
||||
TODO: procedure for merging to master, tagging and deploying
|
||||
in production.
|
||||
|
||||
## Relevant release & deployment tasks available in the Makefile
|
||||
|
||||
```
|
||||
* `make help` show a short description of the available targets
|
||||
|
||||
* `make release` will generate a new release (version number defined in
|
||||
`src/pg/crankshaft.control`) into `release/`.
|
||||
Intended for use by the release manager.
|
||||
|
||||
* `sudo make deploy` will install the current release X.Y.Z from the
|
||||
`release/` files into PostgreSQL and a Python virtual environment
|
||||
`envs/X.Y.Z`.
|
||||
Intended for use by the release manager and deployment jobs.
|
||||
|
||||
* `sudo make deploy RELEASE_VERSION=X.Y.Z` will install the specified version
|
||||
previously generated in `release/`
|
||||
into PostgreSQL and a Python virtual environment `envs/X.Y.Z`.
|
||||
Intended for use by the release manager and deployment jobs.
|
||||
```
|
||||
9
TODO.md
@@ -1,9 +0,0 @@
|
||||
* [x] Support versioning
|
||||
* [x] Test use of `plpy` from python Package
|
||||
* [x] Add `pysal` etc. dependencies
|
||||
* [x] Define documentation practices (general, per extension/package?)
|
||||
* [x] Add initial function set (WIP)
|
||||
* Unify style of function comments
|
||||
* [x] Add integration tests
|
||||
* Make target to open a new version development (create symlinks, etc.)
|
||||
* [x] Should add cartodb ext. as a dependency?
|
||||
169
doc/02_moran.md
Normal file
@@ -0,0 +1,169 @@
|
||||
## Name
|
||||
|
||||
CDB_AreasOfInterest -- returns a table with a cluster/outlier classification, the significance of a classification, an autocorrelation statistic (Local Moran's I), and the geometry id for each geometry in the original dataset.
|
||||
|
||||
## Synopsis
|
||||
|
||||
```sql
|
||||
table(numeric moran_val, text quadrant, numeric significance, int ids, numeric column_values) CDB_AreasOfInterest(text query, text column_name)
|
||||
|
||||
table(numeric moran_val, text quadrant, numeric significance, int ids, numeric column_values) CDB_AreasOfInterest(text query, text column_name, int permutations, text geom_column, text id_column, text weight_type, int num_ngbrs)
|
||||
```
|
||||
|
||||
## Description
|
||||
|
||||
CDB_AreasOfInterest is a table-returning function that classifies the geometries in a table by an attribute and gives a significance for that classification. This information can be used to find "Areas of Interest" by using the correlation of a geometry's attribute with that of its neighbors. Areas can be clusters, outliers, or neither (depending on which significance value is used).
|
||||
|
||||
Inputs:
|
||||
|
||||
* `query` (required): an arbitrary query against tables you have access to (e.g., in your account, shared in your organization, or through the Data Observatory). This query must return the following columns: an id `INT` (e.g., `cartodb_id`), a geometry (e.g., `the_geom`), and the numeric attribute specified in `column_name`
|
||||
* `column_name` (required): column on which to perform the area of interest analysis. The data must be numeric (e.g., `float`, `int`, etc.)
|
||||
* `permutations` (optional): used to calculate the significance of a classification. Defaults to 99, which is sufficient in most situations.
|
||||
* `geom_column` (optional): the name of the geometry column. Data must be of type `geometry`.
|
||||
* `id_column` (optional): the name of the id column (e.g., `cartodb_id`). Data must be of type `int` or `bigint` and have a unique constraint.
|
||||
* `weight_type` (optional): the type of weight used for determining what defines a neighborhood. Options are `knn` or `queen`.
|
||||
* `num_ngbrs` (optional): the number of neighbors in a neighborhood around a geometry. Only used if `knn` is chosen above.
|
||||
|
||||
Outputs:
|
||||
|
||||
* `moran_val`: underlying correlation statistic used in analysis
|
||||
* `quadrant`: human-readable interpretation of classification
|
||||
* `significance`: significance of classification (closer to 0 is more significant)
|
||||
* `ids`: id of original geometry (used for joining against original table if desired -- see examples)
|
||||
* `column_values`: original column values from `column_name`
|
||||
|
||||
Availability: crankshaft v0.0.1 and above
|
||||
|
||||
## Examples
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
t.the_geom_webmercator,
|
||||
t.cartodb_id,
|
||||
aoi.significance,
|
||||
aoi.quadrant As aoi_quadrant
|
||||
FROM
|
||||
observatory.acs2013 As t
|
||||
JOIN
|
||||
crankshaft.CDB_AreasOfInterest('SELECT * FROM observatory.acs2013',
|
||||
'gini_index') As aoi
ON t.cartodb_id = aoi.ids
|
||||
```
|
||||
|
||||
## API Usage
|
||||
|
||||
Example
|
||||
|
||||
```text
|
||||
http://eschbacher.cartodb.com/api/v2/sql?q=SELECT * FROM crankshaft.CDB_AreasOfInterest('SELECT * FROM observatory.acs2013','gini_index')
|
||||
```
|
||||
|
||||
Result
|
||||
```json
|
||||
{
|
||||
time: 0.120,
|
||||
total_rows: 100,
|
||||
rows: [{
|
||||
moran_vals: 0.7213,
|
||||
quadrant: 'High area',
|
||||
significance: 0.03,
|
||||
ids: 1,
|
||||
column_value: 0.22
|
||||
},
|
||||
{
|
||||
moran_vals: -0.7213,
|
||||
quadrant: 'Low outlier',
|
||||
significance: 0.13,
|
||||
ids: 2,
|
||||
column_value: 0.03
|
||||
},
|
||||
...
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
## See Also
|
||||
|
||||
crankshaft's areas of interest functions:
|
||||
|
||||
* [CDB_AreasOfInterest_Global]()
|
||||
* [CDB_AreasOfInterest_Rate_Local]()
|
||||
* [CDB_AreasOfInterest_Rate_Global]()
|
||||
|
||||
|
||||
PostGIS clustering functions:
|
||||
|
||||
* [ST_ClusterIntersecting](http://postgis.net/docs/manual-2.2/ST_ClusterIntersecting.html)
|
||||
* [ST_ClusterWithin](http://postgis.net/docs/manual-2.2/ST_ClusterWithin.html)
|
||||
|
||||
|
||||
-- removing below, working into above
|
||||
|
||||
#### What is Moran's I and why is it significant for CartoDB?
|
||||
|
||||
Moran's I is a geostatistical calculation which gives a measure of the global
|
||||
clustering and presence of outliers within the geographies in a map. Here global
|
||||
means over all of the geographies in a dataset. Imagine mapping the incidence
|
||||
rates of cancer in neighborhoods of a city. If there were areas covering several
|
||||
neighborhoods with abnormally low rates of cancer, those areas are positively
|
||||
spatially correlated with one another and would be considered a cluster. If
|
||||
there was a single neighborhood with a high rate but with all neighbors on
|
||||
average having a low rate, it would be considered a spatial outlier.
|
||||
|
||||
While Moran's I gives a global snapshot, there are local indicators for
|
||||
clustering called Local Indicators of Spatial Autocorrelation. Clustering is a
|
||||
process related to autocorrelation -- i.e., a process that compares a
|
||||
geography's attribute to the attribute in neighbor geographies.
|
||||
|
||||
For the example of cancer rates in neighborhoods, since these neighborhoods have
|
||||
a high value for rate of cancer, and all of their neighbors do as well, they are
|
||||
designated as "High High" or simply **HH**. For areas with multiple neighborhoods
|
||||
with low rates of cancer, they are designated as "Low Low" or **LL**. HH and LL
|
||||
naturally fit into the concept of clustering and are in the correlated
|
||||
variables.
|
||||
|
||||
"Anticorrelated" geogs are in **LH** and **HL** regions -- that is, regions
|
||||
where a geog has a high value and its neighbors, on average, have a low value
|
||||
(or vice versa). An example of this is a "gated community" or placement of a
|
||||
city housing project in a rich region. These deliberate developments have
|
||||
opposite median income as compared to the neighbors around them. They have a
|
||||
high (or low) value while their neighbors have a low (or high) value. They exist
|
||||
typically as islands, and in rare circumstances can extend as chains dividing
|
||||
**LL** or **HH**.
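
The quadrant labels described above can be sketched in a few lines of Python. This is a simplified illustration and not the crankshaft implementation: the function name `local_moran_quadrants` and its inputs are hypothetical, weights are assumed to be row-standardized (each neighbor counts equally), every geometry is assumed to have at least one neighbor, the values are assumed not to be all equal, and no permutation-based significance is computed.

```python
def local_moran_quadrants(values, neighbors):
    """Compute a local Moran statistic and quadrant label per geometry.

    values:    dict mapping id -> numeric attribute value
    neighbors: dict mapping id -> list of neighbor ids (hypothetical input)
    Returns a dict mapping id -> (moran_value, quadrant).
    """
    n = len(values)
    mean = sum(values.values()) / float(n)
    # Deviation of each value from the overall mean
    z = {i: v - mean for i, v in values.items()}
    # Second moment, used to scale the statistic
    m2 = sum(d * d for d in z.values()) / float(n)
    result = {}
    for i, z_i in z.items():
        ngbrs = neighbors[i]
        # Spatial lag: average deviation among the neighbors
        lag = sum(z[j] for j in ngbrs) / float(len(ngbrs))
        moran = z_i * lag / m2
        # First letter: the geometry itself; second letter: its neighbors.
        # Values exactly at the mean are labelled 'L' in this sketch.
        quad = ("H" if z_i > 0 else "L") + ("H" if lag > 0 else "L")
        result[i] = (moran, quad)
    return result
```

A geometry with a high value surrounded by low values gets the label `HL` and a negative statistic, while geometries in a cluster of similar values get `HH` or `LL` and a positive statistic.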
|
||||
|
||||
Strong policies such as rent stabilization (probably) tend to prevent the
|
||||
clustering of high rent areas as they integrate middle class incomes. Luxury
|
||||
apartment buildings, which are a kind of gated community, probably tend to skew
|
||||
an area's median income upwards while housing projects have the opposite effect.
|
||||
What are the nuggets in the analysis?
|
||||
|
||||
Two functions are available to compute Moran I statistics:
|
||||
|
||||
* `cdb_moran_local` computes Moran I measures, quad classification and
|
||||
significance values from numerical values associated to geometry entities
|
||||
in an input table. The geometries should be contiguous polygons when
|
||||
the `queen` `w_type` is used.
|
||||
* `cdb_moran_local_rate` computes the same statistics using a ratio between
|
||||
numerator and denominator columns of a table.
|
||||
|
||||
The parameters for `cdb_moran_local` are:
|
||||
|
||||
* `table` name of the table that contains the data values
|
||||
* `attr` name of the column
|
||||
* `significance` significance threshold for the quads values
|
||||
* `num_ngbrs` number of neighbors to consider (default: 5)
|
||||
* `permutations` number of random permutations for calculation of
|
||||
pseudo-p values (default: 99)
|
||||
* `geom_column` name of the geometry column (default: "the_geom")
|
||||
* `id_col` PK column of the table (default: "cartodb_id")
|
||||
* `w_type` Weight types: can be "knn" for k-nearest neighbor weights
|
||||
or "queen" for contiguity based weights.
|
||||
|
||||
The function returns a table with the following columns:
|
||||
|
||||
* `moran` Moran's value
|
||||
* `quads` quad classification ('HH', 'LL', 'HL', 'LH' or 'Not significant')
|
||||
* `significance` significance value
|
||||
* `ids` id of the corresponding record in the input table
|
||||
|
||||
Function `cdb_moran_local_rate` only differs in that the `attr` input
|
||||
parameter is substituted by `numerator` and `denominator`.
|
||||
24
doc/docs_template.md
Normal file
@@ -0,0 +1,24 @@
|
||||
|
||||
## Name
|
||||
|
||||
## Synopsis
|
||||
|
||||
## Description
|
||||
|
||||
Availability: v...
|
||||
|
||||
## Examples
|
||||
|
||||
```SQL
|
||||
-- example of the function in use
|
||||
SELECT cdb_awesome_function(the_geom, 'total_pop')
|
||||
FROM table_name
|
||||
```
|
||||
|
||||
## API Usage
|
||||
|
||||
_asdf_
|
||||
|
||||
## See Also
|
||||
|
||||
_Other function pages_
|
||||
3
pg/.gitignore
vendored
@@ -1,3 +0,0 @@
|
||||
regression.diffs
|
||||
regression.out
|
||||
results/
|
||||
33
pg/Makefile
@@ -1,33 +0,0 @@
|
||||
# Makefile to generate the extension out of separate sql source files.
|
||||
# Once a version is released, it is not meant to be changed. E.g: once version 0.0.1 is out, it SHALL NOT be changed.
|
||||
|
||||
EXTENSION = crankshaft
|
||||
EXTVERSION = $(shell grep default_version $(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
|
||||
|
||||
# The new version to be generated from templates
|
||||
NEW_EXTENSION_ARTIFACT = $(EXTENSION)--$(EXTVERSION).sql
|
||||
|
||||
# DATA is a special variable used by postgres build infrastructure
|
||||
# These are the files to be installed in the server shared dir,
|
||||
# for installation from scratch, upgrades and downgrades.
|
||||
# @see http://www.postgresql.org/docs/current/static/extend-pgxs.html
|
||||
DATA = $(NEW_EXTENSION_ARTIFACT)
|
||||
|
||||
SOURCES_DATA_DIR = sql/$(EXTVERSION)
|
||||
SOURCES_DATA = $(wildcard sql/$(EXTVERSION)/*.sql)
|
||||
|
||||
# The extension installation artifacts are stored in the base subdirectory
|
||||
$(NEW_EXTENSION_ARTIFACT): $(SOURCES_DATA)
|
||||
rm -f $@
|
||||
cat $(SOURCES_DATA_DIR)/*.sql >> $@
|
||||
|
||||
REGRESS = $(notdir $(basename $(wildcard test/$(EXTVERSION)/sql/*test.sql)))
|
||||
TEST_DIR = test/$(EXTVERSION)
|
||||
REGRESS_OPTS = --inputdir='$(TEST_DIR)' --outputdir='$(TEST_DIR)'
|
||||
|
||||
PG_CONFIG = pg_config
|
||||
PGXS := $(shell $(PG_CONFIG) --pgxs)
|
||||
include $(PGXS)
|
||||
|
||||
# This seems to be needed at least for PG 9.3.11
|
||||
all: $(DATA)
|
||||
@@ -1,7 +0,0 @@
|
||||
|
||||
# Running the tests:
|
||||
|
||||
```
|
||||
sudo make install
|
||||
PGUSER=postgres make installcheck
|
||||
```
|
||||
@@ -1,71 +0,0 @@
|
||||
### Moran's I

#### What is Moran's I and why is it significant for CartoDB?

Moran's I is a geostatistical calculation which gives a measure of the global
clustering and presence of outliers within the geographies in a map. Here, global
means over all of the geographies in a dataset. Imagine mapping the incidence
rates of cancer in neighborhoods of a city. If there were areas covering several
neighborhoods with abnormally low rates of cancer, those areas are positively
spatially correlated with one another and would be considered a cluster. If
there was a single neighborhood with a high rate but with all neighbors on
average having a low rate, it would be considered a spatial outlier.

While Moran's I gives a global snapshot, there are local indicators for
clustering called Local Indicators of Spatial Autocorrelation (LISA). Clustering is a
process related to autocorrelation -- i.e., a process that compares a
geography's attribute to the attribute in neighbor geographies.

In the cancer-rate example, neighborhoods that have a high rate of cancer and
whose neighbors also have high rates are designated as "High High" or simply
**HH**. Areas of multiple neighborhoods with low rates of cancer are designated
as "Low Low" or **LL**. HH and LL naturally fit into the concept of clustering:
they are the positively correlated cases.

"Anticorrelated" geographies are in **LH** and **HL** regions -- that is, regions
where a geography has a high value and its neighbors, on average, have a low value
(or vice versa). An example of this is a "gated community" or the placement of a
city housing project in a rich region. These deliberate developments have the
opposite median income compared to the neighbors around them. They have a
high (or low) value while their neighbors have a low (or high) value. They exist
typically as islands, and in rare circumstances can extend as chains dividing
**LL** or **HH**.

Strong policies such as rent stabilization (probably) tend to prevent the
clustering of high rent areas as they integrate middle class incomes. Luxury
apartment buildings, which are a kind of gated community, probably tend to skew
an area's median income upwards while housing projects have the opposite effect.
What are the nuggets in the analysis?

Two functions are available to compute Moran's I statistics:

* `cdb_moran_local` computes Moran's I measures, quad classification and
  significance values from numerical values associated with geometry entities
  in an input table. The geometries should be contiguous polygons when
  the `queen` `w_type` is used.
* `cdb_moran_local_rate` computes the same statistics using a ratio between
  numerator and denominator columns of a table.

The parameters for `cdb_moran_local` are:

* `t` name of the table that contains the data values
* `attr` name of the column
* `significance` significance threshold for the quads values (default: 0.05)
* `num_ngbrs` number of neighbors to consider (default: 5)
* `permutations` number of random permutations for calculation of
  pseudo-p values (default: 99)
* `geom_column` name of the geometry column (default: "the_geom")
* `id_col` PK column of the table (default: "cartodb_id")
* `w_type` Weight types: can be "knn" for k-nearest neighbor weights
  or "queen" for contiguity based weights (default: "knn").

The function returns a table with the following columns:

* `moran` Moran's value
* `quads` quad classification ('HH', 'LL', 'HL', 'LH' or 'Not significant')
* `significance` significance value
* `ids` id of the corresponding record in the input table

Function `cdb_moran_local_rate` only differs in that the `attr` input
parameter is substituted by `numerator` and `denominator`.
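
The quad classification described above can be illustrated with a small Python sketch. This is not the crankshaft or PySAL implementation: `classify_quads` and its dictionary inputs are hypothetical names chosen for illustration, and the significance test based on random permutations is left out. It only shows how a value and the average of its neighbors' values, both measured against the overall mean, map to HH, LH, LL or HL.

```python
def classify_quads(values, neighbors):
    """Label each observation as HH, LH, LL or HL.

    values:    dict mapping id -> numeric value (hypothetical input shape)
    neighbors: dict mapping id -> list of neighbor ids
    """
    mean = sum(values.values()) / float(len(values))
    labels = {}
    for key, value in values.items():
        z = value - mean
        ngbrs = neighbors[key]
        # Spatial lag: average deviation from the mean among the neighbors
        if ngbrs:
            lag = sum(values[n] - mean for n in ngbrs) / float(len(ngbrs))
        else:
            lag = 0.0
        if z > 0 and lag > 0:
            labels[key] = 'HH'   # high value, high neighbors
        elif z <= 0 and lag > 0:
            labels[key] = 'LH'   # low value, high neighbors
        elif z <= 0 and lag <= 0:
            labels[key] = 'LL'   # low value, low neighbors
        else:
            labels[key] = 'HL'   # high value, low neighbors
    return labels


# Made-up rates: 1 and 2 form a high cluster, 3 and 4 a low cluster,
# and 5 is a high-valued outlier surrounded by low neighbors.
rates = {1: 10, 2: 9, 3: 1, 4: 2, 5: 10}
adjacency = {1: [2], 2: [1], 3: [4], 4: [3], 5: [3, 4]}
print(classify_quads(rates, adjacency))
```

In the real functions a label is only reported when the pseudo p-value is at or below the `significance` threshold; otherwise the row is marked 'Not significant'.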
|
||||
@@ -1,6 +0,0 @@
|
||||
-- Install dependencies
|
||||
CREATE EXTENSION plpythonu;
|
||||
CREATE EXTENSION postgis;
|
||||
CREATE EXTENSION cartodb;
|
||||
-- Install the extension
|
||||
CREATE EXTENSION crankshaft;
|
||||
1
python/.gitignore
vendored
1
python/.gitignore
vendored
@@ -1 +0,0 @@
|
||||
*.pyc
|
||||
@@ -1,11 +0,0 @@
|
||||
# Install the package (needs root privileges)
install:
	pip install ./crankshaft --upgrade

# Test from source code
test:
	(cd crankshaft && nosetests test/)

# Test currently installed package
testinstalled:
	nosetests crankshaft/test/
|
||||
@@ -1,9 +0,0 @@
|
||||
# Crankshaft Python Package

...
### Run the tests

```bash
cd crankshaft
nosetests test/
```
|
||||
0
release/.gitignore
vendored
Normal file
0
release/.gitignore
vendored
Normal file
74
release/crankshaft--0.0.1--0.0.2.sql
Normal file
74
release/crankshaft--0.0.1--0.0.2.sql
Normal file
@@ -0,0 +1,74 @@
|
||||
CREATE OR REPLACE FUNCTION cdb_crankshaft.cdb_crankshaft_version()
|
||||
RETURNS text AS $$
|
||||
SELECT '0.0.2'::text;
|
||||
$$ language 'sql' STABLE STRICT;
|
||||
|
||||
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_internal_version()
|
||||
RETURNS text AS $$
|
||||
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
|
||||
$$ language 'sql' STABLE STRICT;
|
||||
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_virtualenvs_path()
|
||||
RETURNS text
|
||||
AS $$
|
||||
BEGIN
|
||||
RETURN '/home/ubuntu/crankshaft/envs';
|
||||
END;
|
||||
$$ language plpgsql IMMUTABLE STRICT;
|
||||
|
||||
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_activate_py()
|
||||
RETURNS VOID
|
||||
AS $$
|
||||
import os
|
||||
# plpy.notice('%',str(os.environ))
|
||||
# activate virtualenv
|
||||
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
|
||||
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
|
||||
default_venv_path = os.path.join(base_path, crankshaft_version)
|
||||
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
|
||||
activate_path = venv_path + '/bin/activate_this.py'
|
||||
exec(open(activate_path).read(), dict(__file__=activate_path))
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
CREATE OR REPLACE FUNCTION
|
||||
cdb_crankshaft._cdb_random_seeds (seed_value INTEGER) RETURNS VOID
|
||||
AS $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft import random_seeds
|
||||
random_seeds.set_random_seeds(seed_value)
|
||||
$$ LANGUAGE plpythonu;
|
||||
-- Moran's I
|
||||
CREATE OR REPLACE FUNCTION
|
||||
cdb_crankshaft.cdb_moran_local (
|
||||
t TEXT,
|
||||
attr TEXT,
|
||||
significance float DEFAULT 0.05,
|
||||
num_ngbrs INT DEFAULT 5,
|
||||
permutations INT DEFAULT 99,
|
||||
geom_column TEXT DEFAULT 'the_geom',
|
||||
id_col TEXT DEFAULT 'cartodb_id',
|
||||
w_type TEXT DEFAULT 'knn')
|
||||
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
|
||||
AS $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft.clustering import moran_local
|
||||
# TODO: use named parameters or a dictionary
|
||||
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
CREATE OR REPLACE FUNCTION
|
||||
cdb_crankshaft.cdb_moran_local_rate(t TEXT,
|
||||
numerator TEXT,
|
||||
denominator TEXT,
|
||||
significance FLOAT DEFAULT 0.05,
|
||||
num_ngbrs INT DEFAULT 5,
|
||||
permutations INT DEFAULT 99,
|
||||
geom_column TEXT DEFAULT 'the_geom',
|
||||
id_col TEXT DEFAULT 'cartodb_id',
|
||||
w_type TEXT DEFAULT 'knn')
|
||||
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
|
||||
AS $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft.clustering import moran_local_rate
|
||||
# TODO: use named parameters or a dictionary
|
||||
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
|
||||
$$ LANGUAGE plpythonu;
|
||||
@@ -1,6 +1,12 @@
|
||||
-- Moran's I
|
||||
CREATE OR REPLACE FUNCTION
|
||||
cdb_moran_local (
|
||||
cdb_crankshaft._cdb_random_seeds (seed_value INTEGER) RETURNS VOID
|
||||
AS $$
|
||||
from crankshaft import random_seeds
|
||||
random_seeds.set_random_seeds(seed_value)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
CREATE OR REPLACE FUNCTION
|
||||
cdb_crankshaft.cdb_moran_local (
|
||||
t TEXT,
|
||||
attr TEXT,
|
||||
significance float DEFAULT 0.05,
|
||||
@@ -16,9 +22,8 @@ AS $$
|
||||
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
-- Moran's I Local Rate
|
||||
CREATE OR REPLACE FUNCTION
|
||||
cdb_moran_local_rate(t TEXT,
|
||||
cdb_crankshaft.cdb_moran_local_rate(t TEXT,
|
||||
numerator TEXT,
|
||||
denominator TEXT,
|
||||
significance FLOAT DEFAULT 0.05,
|
||||
@@ -33,3 +38,7 @@ AS $$
|
||||
# TODO: use named parameters or a dictionary
|
||||
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
DROP FUNCTION IF EXISTS cdb_crankshaft.cdb_crankshaft_version();
|
||||
DROP FUNCTION IF EXISTS cdb_crankshaft._cdb_crankshaft_internal_version();
|
||||
DROP FUNCTION IF EXISTS cdb_crankshaft._cdb_crankshaft_activate_py();
|
||||
186
release/crankshaft--0.0.2.sql
Normal file
186
release/crankshaft--0.0.2.sql
Normal file
@@ -0,0 +1,186 @@
|
||||
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
|
||||
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
|
||||
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
|
||||
-- Version number of the extension release
|
||||
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
|
||||
RETURNS text AS $$
|
||||
SELECT '0.0.2'::text;
|
||||
$$ language 'sql' STABLE STRICT;
|
||||
|
||||
-- Internal identifier of the installed extension instance
|
||||
-- e.g. 'dev' for current development version
|
||||
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
|
||||
RETURNS text AS $$
|
||||
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
|
||||
$$ language 'sql' STABLE STRICT;
|
||||
CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
|
||||
RETURNS text
|
||||
AS $$
|
||||
BEGIN
|
||||
-- RETURN '/opt/virtualenvs/crankshaft';
|
||||
RETURN '/home/ubuntu/crankshaft/envs';
|
||||
END;
|
||||
$$ language plpgsql IMMUTABLE STRICT;
|
||||
|
||||
-- Use the crankshaft python module
|
||||
CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
|
||||
RETURNS VOID
|
||||
AS $$
|
||||
import os
|
||||
# plpy.notice('%',str(os.environ))
|
||||
# activate virtualenv
|
||||
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
|
||||
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
|
||||
default_venv_path = os.path.join(base_path, crankshaft_version)
|
||||
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
|
||||
activate_path = venv_path + '/bin/activate_this.py'
|
||||
exec(open(activate_path).read(), dict(__file__=activate_path))
|
||||
$$ LANGUAGE plpythonu;
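
The path resolution performed by `_cdb_crankshaft_activate_py` can be read in isolation as the following Python sketch. The helper names `resolve_venv_path` and `activation_script` are hypothetical and exist only for this illustration; the `CRANKSHAFT_VENV` variable, the base path and the `bin/activate_this.py` location are taken from the function above. `posixpath` is used here on the assumption that the server paths are POSIX paths.

```python
import os
import posixpath


def resolve_venv_path(base_path, version, environ=None):
    """Return the virtualenv directory for a given extension version.

    The CRANKSHAFT_VENV environment variable, when set, overrides the
    default of <base_path>/<version>.
    """
    environ = os.environ if environ is None else environ
    default_venv_path = posixpath.join(base_path, version)
    return environ.get('CRANKSHAFT_VENV', default_venv_path)


def activation_script(venv_path):
    """Return the path of the script that activates the virtualenv."""
    return posixpath.join(venv_path, 'bin', 'activate_this.py')


venv = resolve_venv_path('/home/ubuntu/crankshaft/envs', '0.0.2', environ={})
print(activation_script(venv))
```

Passing the environment as an argument keeps the sketch easy to exercise without touching the process environment; the SQL function reads `os.environ` directly.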
|
||||
-- Internal function.
|
||||
-- Set the seeds of the RNGs (Random Number Generators)
|
||||
-- used internally.
|
||||
CREATE OR REPLACE FUNCTION
|
||||
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
|
||||
AS $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft import random_seeds
|
||||
random_seeds.set_random_seeds(seed_value)
|
||||
$$ LANGUAGE plpythonu;
|
||||
-- Moran's I
|
||||
CREATE OR REPLACE FUNCTION
|
||||
cdb_moran_local (
|
||||
t TEXT,
|
||||
attr TEXT,
|
||||
significance float DEFAULT 0.05,
|
||||
num_ngbrs INT DEFAULT 5,
|
||||
permutations INT DEFAULT 99,
|
||||
geom_column TEXT DEFAULT 'the_geom',
|
||||
id_col TEXT DEFAULT 'cartodb_id',
|
||||
w_type TEXT DEFAULT 'knn')
|
||||
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
|
||||
AS $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft.clustering import moran_local
|
||||
# TODO: use named parameters or a dictionary
|
||||
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
-- Moran's I Local Rate
|
||||
CREATE OR REPLACE FUNCTION
|
||||
cdb_moran_local_rate(t TEXT,
|
||||
numerator TEXT,
|
||||
denominator TEXT,
|
||||
significance FLOAT DEFAULT 0.05,
|
||||
num_ngbrs INT DEFAULT 5,
|
||||
permutations INT DEFAULT 99,
|
||||
geom_column TEXT DEFAULT 'the_geom',
|
||||
id_col TEXT DEFAULT 'cartodb_id',
|
||||
w_type TEXT DEFAULT 'knn')
|
||||
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
|
||||
AS $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft.clustering import moran_local_rate
|
||||
# TODO: use named parameters or a dictionary
|
||||
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
|
||||
$$ LANGUAGE plpythonu;
|
||||
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Areal weighting is a very simple form of areal interpolation.
--
-- Parameters:
--   * geom a Polygon geometry which defines the area where a value will be
--     estimated as the area-weighted sum of a given table/column
--   * target_table_name table name of the table that provides the values
--   * target_column column name of the column that provides the values
--   * schema_name optional parameter to define the schema the target table
--     belongs to, which is necessary if it's not in the search_path.
--     Note that target_table_name should never include the schema in it.
-- Return value:
--   Areal-weighted interpolation of the column values over the geometry
|
||||
CREATE OR REPLACE
|
||||
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
|
||||
RETURNS numeric AS
|
||||
$$
|
||||
DECLARE
|
||||
result numeric;
|
||||
qualified_name text;
|
||||
BEGIN
|
||||
IF schema_name IS NULL THEN
|
||||
qualified_name := Format('%I', target_table_name);
|
||||
ELSE
|
||||
qualified_name := Format('%I.%I', schema_name, target_table_name);
|
||||
END IF;
|
||||
EXECUTE Format('
|
||||
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
|
||||
FROM %s AS a
|
||||
WHERE $1 && a.the_geom
|
||||
', target_column, qualified_name)
|
||||
USING geom
|
||||
INTO result;
|
||||
RETURN result;
|
||||
END;
|
||||
$$ LANGUAGE plpgsql;
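
The area-weighted sum computed by `cdb_overlap_sum` can be sketched in plain Python. To stay self-contained, this illustration swaps PostGIS geometries for axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` tuples; the names `rect_area`, `rect_intersection` and `overlap_sum` are hypothetical. Each source polygon contributes its value multiplied by the fraction of its own area that falls inside the query geometry.

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle; empty rectangles have area 0."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def rect_intersection(a, b):
    """Intersection of two rectangles (may be empty, i.e. have area 0)."""
    return (max(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), min(a[3], b[3]))


def overlap_sum(query_rect, rows):
    """Sum of value * (overlapped area / polygon area) over all rows.

    rows: iterable of (rect, value) pairs standing in for the target table.
    """
    total = 0.0
    for rect, value in rows:
        overlap = rect_area(rect_intersection(query_rect, rect))
        total += value * overlap / rect_area(rect)
    return total


# Two adjacent 2x2 cells; the query rectangle covers half of each.
cells = [((0, 0, 2, 2), 100.0), ((2, 0, 4, 2), 50.0)]
print(overlap_sum((1, 0, 3, 2), cells))
```

One difference from the SQL function is worth noting: when nothing overlaps, the SQL `sum` yields NULL, while this sketch returns 0.0.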
|
||||
--
-- Creates N points randomly distributed around the polygon
--
-- @param geom - the geometry to be turned into points
--
-- @param no_points - the number of points to generate
--
-- @param max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the function accepts before giving up.
--
-- Returns: Multipoint with the requested points
|
||||
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
|
||||
RETURNS GEOMETRY AS $$
|
||||
DECLARE
|
||||
extent GEOMETRY;
|
||||
test_point Geometry;
|
||||
width NUMERIC;
|
||||
height NUMERIC;
|
||||
x0 NUMERIC;
|
||||
y0 NUMERIC;
|
||||
xp NUMERIC;
|
||||
yp NUMERIC;
|
||||
no_left INTEGER;
|
||||
remaining_iterations INTEGER;
|
||||
points GEOMETRY[];
|
||||
bbox_line GEOMETRY;
|
||||
intersection_line GEOMETRY;
|
||||
BEGIN
|
||||
extent := ST_Envelope(geom);
|
||||
width := ST_XMax(extent) - ST_XMIN(extent);
|
||||
height := ST_YMax(extent) - ST_YMIN(extent);
|
||||
x0 := ST_XMin(extent);
|
||||
y0 := ST_YMin(extent);
|
||||
no_left := no_points;
|
||||
|
||||
LOOP
|
||||
if(no_left=0) THEN
|
||||
EXIT;
|
||||
END IF;
|
||||
yp = y0 + height*random();
|
||||
-- horizontal line at y = yp spanning the bounding box (ST_MakePoint takes x, then y)
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(x0, yp),4326),
ST_SetSRID(ST_MakePoint(x0+width, yp),4326)
);
|
||||
intersection_line = ST_Intersection(bbox_line,geom);
|
||||
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
|
||||
points := points || test_point;
|
||||
no_left = no_left - 1 ;
|
||||
END LOOP;
|
||||
RETURN ST_Collect(points);
|
||||
END;
|
||||
$$
|
||||
LANGUAGE plpgsql VOLATILE;
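
The header comment of `cdb_dot_density` describes a rejection-sampling scheme (draw points in the bounding box, discard misses, give up after `max_iter_per_point` misses per point), whereas the function body above intersects a random horizontal line with the polygon. The Python sketch below illustrates only the scheme from the comment, under simplifying assumptions: the polygon is a single ring of `(x, y)` vertices without holes, and `point_in_polygon` and `dot_density` are hypothetical helper names, not part of crankshaft.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge cross the horizontal ray starting at (x, y)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / float(y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dot_density(ring, no_points, max_iter_per_point=1000, rng=None):
    """Return up to no_points random points lying inside the ring."""
    rng = rng or random.Random()
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    x0, y0 = min(xs), min(ys)
    width, height = max(xs) - x0, max(ys) - y0

    points = []
    remaining = no_points * max_iter_per_point  # overall attempt budget
    while len(points) < no_points and remaining > 0:
        remaining -= 1
        x = x0 + width * rng.random()
        y = y0 + height * rng.random()
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


triangle = [(0, 0), (4, 0), (0, 4)]
print(len(dot_density(triangle, 20, rng=random.Random(42))))
```

Rejection sampling gives points that are uniform over the polygon's area, at the cost of wasted draws for thin or sparse shapes; the attempt budget keeps the loop from running forever on degenerate input.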
|
||||
-- Make sure by default there are no permissions for publicuser
|
||||
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
|
||||
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
|
||||
|
||||
-- Grant permissions on the schema to publicuser (but just the schema)
|
||||
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
|
||||
|
||||
-- Revoke execute permissions on all functions in the schema by default
|
||||
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;
|
||||
@@ -1,5 +1,5 @@
|
||||
comment = 'CartoDB Spatial Analysis extension'
|
||||
default_version = '0.0.1'
|
||||
default_version = '0.0.2'
|
||||
requires = 'plpythonu, postgis, cartodb'
|
||||
superuser = true
|
||||
schema = cdb_crankshaft
|
||||
0
release/python/.gitignore
vendored
Normal file
0
release/python/.gitignore
vendored
Normal file
@@ -10,7 +10,7 @@ from setuptools import setup, find_packages
|
||||
setup(
|
||||
name='crankshaft',
|
||||
|
||||
version='0.0.1',
|
||||
version='0.0.01',
|
||||
|
||||
description='CartoDB Spatial Analysis Python Library',
|
||||
|
||||
2
release/python/0.0.2/crankshaft/crankshaft/__init__.py
Normal file
2
release/python/0.0.2/crankshaft/crankshaft/__init__.py
Normal file
@@ -0,0 +1,2 @@
|
||||
import random_seeds
|
||||
import clustering
|
||||
@@ -0,0 +1 @@
|
||||
from moran import *
|
||||
321
release/python/0.0.2/crankshaft/crankshaft/clustering/moran.py
Normal file
321
release/python/0.0.2/crankshaft/crankshaft/clustering/moran.py
Normal file
@@ -0,0 +1,321 @@
|
||||
"""
|
||||
Moran's I geostatistics (global clustering & outliers presence)
|
||||
"""
|
||||
|
||||
# TODO: Fill in local neighbors which have null/NoneType values with the
#       average of their neighborhood
|
||||
|
||||
import numpy as np
|
||||
import pysal as ps
|
||||
import plpy
|
||||
|
||||
# High level interface ---------------------------------------
|
||||
|
||||
def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
    """
    Moran's I implementation for PL/Python
    Andy Eschbacher
    """
    # TODO: ensure that the significance output can be smaller than 1e-3 (0.001)
    # TODO: make a wishlist of output features (zscores, pvalues, raw local lisa, what else?)

    plpy.notice('** Constructing query')

    # geometries with attributes that are null are ignored
    # resulting in a collection of not as near neighbors

    qvals = {"id_col": id_col,
             "attr1": attr,
             "geom_col": geom_column,
             "table": t,
             "num_ngbrs": num_ngbrs}

    q = get_query(w_type, qvals)

    try:
        r = plpy.execute(q)
        plpy.notice('** Query returned with %d rows' % len(r))
    except plpy.SPIError:
        plpy.notice('** Query failed: "%s"' % q)
        plpy.notice('** Exiting function')
        return zip([None], [None], [None], [None])

    y = get_attributes(r, 1)
    # pass num_ngbrs so the knn weights match the number of neighbors queried
    w = get_weight(r, w_type, num_ngbrs)

    # calculate LISA values, honoring the requested number of permutations
    lisa = ps.Moran_Local(y, w, permutations=permutations)

    # find units of significance
    lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)

    plpy.notice('** Finished calculations')

    return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
|
||||
|
||||
|
||||
def moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
    """
    Moran's I Local Rate
    Andy Eschbacher
    """

    plpy.notice('** Constructing query')

    # geometries with attributes that are null are ignored
    # resulting in a collection of not as near neighbors

    # The query builders alias attribute keys in sorted order (attr1, attr2, ...),
    # so the keys are named explicitly to keep the numerator first.
    qvals = {"id_col": id_col,
             "attr1": numerator,
             "attr2": denominator,
             "geom_col": geom_column,
             "table": t,
             "num_ngbrs": num_ngbrs}

    q = get_query(w_type, qvals)

    try:
        r = plpy.execute(q)
        plpy.notice('** Query returned with %d rows' % len(r))
    except plpy.SPIError as err:
        plpy.notice('** Query failed: "%s"' % q)
        plpy.notice('** Error: %s' % err)
        plpy.notice('** Exiting function')
        # one None per column of the declared return table
        return zip([None], [None], [None], [None], [None])

    plpy.notice('r.nrows() = %d' % r.nrows())

    ## collect attributes
    numer = get_attributes(r, 1)
    denom = get_attributes(r, 2)

    w = get_weight(r, w_type, num_ngbrs)

    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, w, permutations=permutations)

    # find units of significance
    lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)

    plpy.notice('** Finished calculations')

    ## TODO: Decide on which return values here
    return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order, lisa.y)
|
||||
|
||||
def moran_local_bv(t, attr1, attr2, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
    plpy.notice('** Constructing query')

    qvals = {"num_ngbrs": num_ngbrs,
             "attr1": attr1,
             "attr2": attr2,
             "table": t,
             "geom_col": geom_column,
             "id_col": id_col}

    q = get_query(w_type, qvals)

    try:
        r = plpy.execute(q)
        plpy.notice('** Query returned with %d rows' % len(r))
    except plpy.SPIError as err:
        plpy.notice('** Query failed: "%s"' % q)
        plpy.notice('** Error: %s' % err)
        plpy.notice('** Exiting function')
        return zip([None], [None], [None], [None])

    ## collect attributes
    attr1_vals = get_attributes(r, 1)
    attr2_vals = get_attributes(r, 2)

    # create weights
    w = get_weight(r, w_type, num_ngbrs)

    # calculate LISA values
    lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, w)

    plpy.notice("len of Is: %d" % len(lisa.Is))

    # find clustering of significance
    lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)

    plpy.notice('** Finished calculations')

    return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
|
||||
|
||||
|
||||
# Low level functions ----------------------------------------
|
||||
|
||||
def map_quads(coord):
    """
        Map a quadrant number to Moran's I designation
        HH=1, LH=2, LL=3, HL=4
        Input:
        :param coord (int): quadrant of a specific measurement
    """
    if coord == 1:
        return 'HH'
    elif coord == 2:
        return 'LH'
    elif coord == 3:
        return 'LL'
    elif coord == 4:
        return 'HL'
    else:
        return None
|
||||
|
||||
def query_attr_select(params):
    """
        Create portion of SELECT statement for attributes involved in query.
        :param params: dict of information used in query (column names,
                       table name, etc.)
    """

    attrs = [k for k in params
             if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')]

    template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "

    attr_string = ""

    for idx, val in enumerate(sorted(attrs)):
        attr_string += template % {"col": val, "alias_num": idx + 1}

    return attr_string
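
The two-stage templating used by `query_attr_select` is easy to misread, so here is a standalone Python sketch of it. The first pass (`%` formatting) writes the *key name* inside braces; the second pass (`str.format`, applied later by the query builders) replaces each braced key with the column name stored under it. The function name `attr_select` and the table and column names (`tracts`, `cases`, `population`) are made up for this example.

```python
def attr_select(params):
    """First templating pass: emit one aliased placeholder per attribute key."""
    attrs = sorted(k for k in params
                   if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs'))
    template = 'i."{%(col)s}"::numeric As attr%(alias_num)s, '
    return ''.join(template % {'col': key, 'alias_num': idx + 1}
                   for idx, key in enumerate(attrs))


params = {'id_col': 'cartodb_id', 'geom_col': 'the_geom',
          'table': 'tracts', 'num_ngbrs': 5,
          'attr1': 'cases', 'attr2': 'population'}

stage_one = attr_select(params)
# Second pass: fill the braced keys with the actual column names
stage_two = stage_one.format(**params)

print(stage_one)
print(stage_two)
```

Because the keys are sorted alphabetically before being numbered, the `attrN` alias follows the key name rather than the order in which the caller listed the keys.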
|
||||
|
||||
def query_attr_where(params):
    """
        Create portion of WHERE clauses for weeding out NULL-valued geometries
    """
    attrs = sorted([k for k in params
                    if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')])

    attr_string = []

    for attr in attrs:
        attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)

    if len(attrs) == 2:
        attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])

    out = " AND ".join(attr_string)

    return out
|
||||
|
||||
def knn(params):
    """SQL query for k-nearest neighbors.
        :param params: dict of values to fill template
    """

    attr_select = query_attr_select(params)
    attr_where = query_attr_where(params)

    replacements = {"attr_select": attr_select,
                    "attr_where_i": attr_where.replace("idx_replace", "i"),
                    "attr_where_j": attr_where.replace("idx_replace", "j")}

    query = "SELECT " \
                "i.\"{id_col}\" As id, " \
                "%(attr_select)s" \
                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
                              "FROM \"{table}\" As j " \
                              "WHERE %(attr_where_j)s " \
                              "ORDER BY j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
                              "LIMIT {num_ngbrs} OFFSET 1 ) " \
                ") As neighbors " \
            "FROM \"{table}\" As i " \
            "WHERE " \
                "%(attr_where_i)s " \
            "ORDER BY i.\"{id_col}\" ASC;" % replacements

    return query.format(**params)
|
||||
|
||||
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
    """SQL query for queen neighbors.
        :param params: dict of information to fill query
    """
    attr_select = query_attr_select(params)
    attr_where = query_attr_where(params)

    replacements = {"attr_select": attr_select,
                    "attr_where_i": attr_where.replace("idx_replace", "i"),
                    "attr_where_j": attr_where.replace("idx_replace", "j")}

    query = "SELECT " \
                "i.\"{id_col}\" As id, " \
                "%(attr_select)s" \
                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
                 "FROM \"{table}\" As j " \
                 "WHERE ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
                 "%(attr_where_j)s)" \
                ") As neighbors " \
            "FROM \"{table}\" As i " \
            "WHERE " \
                "%(attr_where_i)s " \
            "ORDER BY i.\"{id_col}\" ASC;" % replacements

    return query.format(**params)
|
||||
|
||||
## to add more weight methods open a ticket or pull request
|
||||
|
||||
def get_query(w_type, query_vals):
    """Return requested query.
        :param w_type: type of neighbors to calculate (knn or queen)
        :param query_vals: values used to construct the query
    """

    if w_type == 'knn':
        return knn(query_vals)
    else:
        return queen(query_vals)


def get_attributes(query_res, attr_num):
    """
        :param query_res: query results with attributes and neighbors
        :param attr_num: attribute number (1, 2, ...)
    """
    return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=float)
|
||||
|
||||
## Build weight object
def get_weight(query_res, w_type='queen', num_ngbrs=5):
    """
        Construct PySAL weight from return value of query
        :param query_res: query results with attributes and neighbors
    """
    if w_type == 'knn':
        row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
        weights = {x['id']: row_normed_weights for x in query_res}
    else:
        # any other w_type is treated as queen, matching get_query
        weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
                            if len(x['neighbors']) > 0
                            else [] for x in query_res}

    neighbors = {x['id']: x['neighbors'] for x in query_res}

    return ps.W(neighbors, weights)
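
Row standardization, as done in `get_weight`, gives every neighbor of a geometry the same weight so that each row of weights sums to 1. The sketch below shows the idea in plain Python without PySAL; `row_standardized_weights` is a hypothetical helper. Unlike the `knn` branch above, which assumes every row has exactly `num_ngbrs` neighbors, this version derives the weight from the actual length of each neighbor list.

```python
def row_standardized_weights(neighbors):
    """Build equal weights per neighbor so that each row sums to 1.

    neighbors: dict mapping id -> list of neighbor ids
    returns:   dict mapping id -> list of weights (empty if no neighbors)
    """
    weights = {}
    for key, ngbrs in neighbors.items():
        if ngbrs:
            weights[key] = [1.0 / len(ngbrs)] * len(ngbrs)
        else:
            weights[key] = []  # an island: no neighbors, no weights
    return weights


print(row_standardized_weights({1: [2, 3], 2: [1], 3: []}))
```

Deriving the weight from the list length keeps the weights consistent with the neighbor lists even when a query returns fewer neighbors than requested, for example in a table with fewer than `num_ngbrs + 1` usable rows.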
|
||||
|
||||
def quad_position(quads):
    """
        Map an array of quadrant numbers to Moran's I classifications
    """

    lisa_sig = np.array([map_quads(q) for q in quads])

    return lisa_sig
|
||||
|
||||
def lisa_sig_vals(pvals, quads, threshold):
    """
        Produce Moran's I classification, keeping the quadrant label only
        where the pseudo p-value is at or below the threshold
    """

    sig = (pvals <= threshold)

    lisa_sig = np.empty(len(sig), np.chararray)

    for idx, val in enumerate(sig):
        if val:
            lisa_sig[idx] = map_quads(quads[idx])
        else:
            lisa_sig[idx] = 'Not significant'

    return lisa_sig
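
For readers who want to try the significance masking without NumPy or PySAL, the logic of `lisa_sig_vals` and `map_quads` can be condensed into a few lines of plain Python. This is a sketch for illustration: `QUAD_LABELS` and `significant_quads` are hypothetical names, and the p-values in the example are made up.

```python
# Quadrant numbering as documented in map_quads: HH=1, LH=2, LL=3, HL=4
QUAD_LABELS = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}


def significant_quads(pvals, quads, threshold):
    """Keep the quadrant label where p <= threshold, else 'Not significant'."""
    return [QUAD_LABELS.get(quad) if pval <= threshold else 'Not significant'
            for pval, quad in zip(pvals, quads)]


print(significant_quads([0.01, 0.2, 0.05], [1, 3, 4], 0.05))
```

As in the original function, the comparison is inclusive, so a pseudo p-value exactly equal to the threshold counts as significant.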
|
||||
10
release/python/0.0.2/crankshaft/crankshaft/random_seeds.py
Normal file
10
release/python/0.0.2/crankshaft/crankshaft/random_seeds.py
Normal file
@@ -0,0 +1,10 @@
|
||||
import random
import numpy

def set_random_seeds(value):
    """
    Set the seeds of the RNGs (Random Number Generators)
    used internally.
    """
    random.seed(value)
    numpy.random.seed(value)
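
Seeding matters here because the pseudo p-values come from random permutations, so two runs only agree when the generators start from the same state. The sketch below shows that effect using only the standard library `random` module; it leaves out `numpy.random`, which `set_random_seeds` also seeds, and the helper name `draws` is hypothetical.

```python
import random


def draws(seed, n=3):
    """Seed the global RNG, then return n pseudo-random numbers."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draws(1234)
second = draws(1234)
print(first == second)
```

Re-seeding with the same value before each run is what makes regression tests on permutation-based results repeatable.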
|
||||
48
release/python/0.0.2/crankshaft/setup.py
Normal file
48
release/python/0.0.2/crankshaft/setup.py
Normal file
@@ -0,0 +1,48 @@
|
||||
|
||||
"""
|
||||
CartoDB Spatial Analysis Python Library
|
||||
See:
|
||||
https://github.com/CartoDB/crankshaft
|
||||
"""
|
||||
|
||||
from setuptools import setup, find_packages
|
||||
|
||||
setup(
|
||||
name='crankshaft',
|
||||
|
||||
version='0.0.2',
|
||||
|
||||
description='CartoDB Spatial Analysis Python Library',
|
||||
|
||||
url='https://github.com/CartoDB/crankshaft',
|
||||
|
||||
author='Data Services Team - CartoDB',
|
||||
author_email='dataservices@cartodb.com',
|
||||
|
||||
license='MIT',
|
||||
|
||||
classifiers=[
|
||||
'Development Status :: 3 - Alpha',
|
||||
'Intended Audience :: Mapping community',
|
||||
'Topic :: Maps :: Mapping Tools',
|
||||
'License :: OSI Approved :: MIT License',
|
||||
'Programming Language :: Python :: 2.7',
|
||||
],
|
||||
|
||||
keywords='maps mapping tools spatial analysis geostatistics',
|
||||
|
||||
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
|
||||
|
||||
extras_require={
|
||||
'dev': ['unittest'],
|
||||
'test': ['unittest', 'nose', 'mock'],
|
||||
},
|
||||
|
||||
# The choice of component versions is dictated by what's
|
||||
# provisioned in the production servers.
|
||||
install_requires=['pysal==1.9.1'],
|
||||
|
||||
requires=['pysal', 'numpy' ],
|
||||
|
||||
test_suite='test'
|
||||
)
|
||||
52
release/python/0.0.2/crankshaft/test/fixtures/moran.json
vendored
Normal file
52
release/python/0.0.2/crankshaft/test/fixtures/moran.json
vendored
Normal file
@@ -0,0 +1,52 @@
|
||||
[[0.9319096128346788, "HH"],
|
||||
[-1.135787401862846, "HL"],
|
||||
[0.11732030672508517, "Not significant"],
|
||||
[0.6152779669180425, "Not significant"],
|
||||
[-0.14657336660125297, "Not significant"],
|
||||
[0.6967858120189607, "Not significant"],
|
||||
[0.07949310115714454, "Not significant"],
|
||||
[0.4703198759258987, "Not significant"],
|
||||
[0.4421125200498064, "Not significant"],
|
||||
[0.5724288737143592, "Not significant"],
|
||||
[0.8970743435692062, "LL"],
|
||||
[0.18327334401918674, "Not significant"],
|
||||
[-0.01466729201304962, "Not significant"],
|
||||
[0.3481559372544409, "Not significant"],
|
||||
[0.06547094736902978, "Not significant"],
|
||||
[0.15482141569329988, "HH"],
|
||||
[0.4373841193538136, "Not significant"],
|
||||
[0.15971286468915544, "Not significant"],
|
||||
[1.0543588860308968, "Not significant"],
|
||||
[1.7372866900020818, "HH"],
|
||||
[1.091998586053999, "LL"],
|
||||
[0.1171572584252222, "Not significant"],
|
||||
[0.08438455015300014, "Not significant"],
|
||||
[0.06547094736902978, "Not significant"],
|
||||
[0.15482141569329985, "HH"],
|
||||
[1.1627044812890683, "HH"],
|
||||
[0.06547094736902978, "Not significant"],
|
||||
[0.795275137550483, "Not significant"],
|
||||
[0.18562939195219, "LL"],
|
||||
[0.3010757406693439, "Not significant"],
|
||||
[2.8205795942839376, "HH"],
|
||||
[0.11259190602909264, "Not significant"],
|
||||
[-0.07116352791516614, "Not significant"],
|
||||
[-0.09945240794119009, "Not significant"],
|
||||
[0.18562939195219, "LL"],
|
||||
[0.1832733440191868, "Not significant"],
|
||||
[-0.39054253768447705, "Not significant"],
|
||||
[-0.1672071289487642, "HL"],
|
||||
[0.3337669247916343, "Not significant"],
|
||||
[0.2584386102554792, "Not significant"],
|
||||
[-0.19733845476322634, "HL"],
|
||||
[-0.9379282899805409, "LH"],
|
||||
[-0.028770969951095866, "Not significant"],
|
||||
[0.051367269430983485, "Not significant"],
|
||||
[-0.2172548045913472, "LH"],
|
||||
[0.05136726943098351, "Not significant"],
|
||||
[0.04191046803899837, "Not significant"],
|
||||
[0.7482357030403517, "HH"],
|
||||
[-0.014585767863118111, "Not significant"],
|
||||
[0.5410013139159929, "Not significant"],
|
||||
[1.0223932668429925, "LL"],
|
||||
[1.4179402898927476, "LL"]]
|
||||
54
release/python/0.0.2/crankshaft/test/fixtures/neighbors.json
vendored
Normal file
@@ -0,0 +1,54 @@
|
||||
[
|
||||
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
|
||||
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
|
||||
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
|
||||
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
|
||||
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
|
||||
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
|
||||
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
|
||||
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
|
||||
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
|
||||
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
|
||||
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
|
||||
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
|
||||
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
|
||||
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
|
||||
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
|
||||
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
|
||||
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
|
||||
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
|
||||
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
|
||||
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
|
||||
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
|
||||
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
|
||||
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
|
||||
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
|
||||
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
|
||||
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
|
||||
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
|
||||
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
|
||||
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
|
||||
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
|
||||
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
|
||||
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
|
||||
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
|
||||
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
|
||||
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
|
||||
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
|
||||
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
|
||||
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
|
||||
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
|
||||
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
|
||||
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
|
||||
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
|
||||
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
|
||||
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
|
||||
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
|
||||
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
|
||||
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
|
||||
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
|
||||
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
|
||||
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
|
||||
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
|
||||
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
|
||||
]
|
||||
13
release/python/0.0.2/crankshaft/test/helper.py
Normal file
@@ -0,0 +1,13 @@
|
||||
import unittest
|
||||
|
||||
from mock_plpy import MockPlPy
|
||||
plpy = MockPlPy()
|
||||
|
||||
import sys
|
||||
sys.modules['plpy'] = plpy
|
||||
|
||||
import os
|
||||
|
||||
def fixture_file(name):
|
||||
dir = os.path.dirname(os.path.realpath(__file__))
|
||||
return os.path.join(dir, 'fixtures', name)
|
||||
34
release/python/0.0.2/crankshaft/test/mock_plpy.py
Normal file
@@ -0,0 +1,34 @@
|
||||
import re
|
||||
|
||||
class MockPlPy:
|
||||
def __init__(self):
|
||||
self._reset()
|
||||
|
||||
def _reset(self):
|
||||
self.infos = []
|
||||
self.notices = []
|
||||
self.debugs = []
|
||||
self.logs = []
|
||||
self.warnings = []
|
||||
self.errors = []
|
||||
self.fatals = []
|
||||
self.executes = []
|
||||
self.results = []
|
||||
self.prepares = []
|
||||
|
||||
def _define_result(self, query, result):
|
||||
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
|
||||
self.results.append([pattern, result])
|
||||
|
||||
def notice(self, msg):
|
||||
self.notices.append(msg)
|
||||
|
||||
def info(self, msg):
|
||||
self.infos.append(msg)
|
||||
|
||||
def execute(self, query): # TODO: additional arguments
|
||||
for result in self.results:
|
||||
if result[0].match(query):
|
||||
return result[1]
|
||||
return []
|
||||
144
release/python/0.0.2/crankshaft/test/test_clustering_moran.py
Normal file
@@ -0,0 +1,144 @@
|
||||
import unittest
|
||||
import numpy as np
|
||||
|
||||
|
||||
|
||||
# from mock_plpy import MockPlPy
|
||||
# plpy = MockPlPy()
|
||||
#
|
||||
# import sys
|
||||
# sys.modules['plpy'] = plpy
|
||||
from helper import plpy, fixture_file
|
||||
|
||||
import crankshaft.clustering as cc
|
||||
from crankshaft import random_seeds
|
||||
import json
|
||||
|
||||
class MoranTest(unittest.TestCase):
|
||||
"""Testing class for Moran's I functions."""
|
||||
|
||||
def setUp(self):
|
||||
plpy._reset()
|
||||
self.params = {"id_col": "cartodb_id",
|
||||
"attr1": "andy",
|
||||
"attr2": "jay_z",
|
||||
"table": "a_list",
|
||||
"geom_col": "the_geom",
|
||||
"num_ngbrs": 321}
|
||||
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
|
||||
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
|
||||
|
||||
def test_map_quads(self):
|
||||
"""Test map_quads."""
|
||||
self.assertEqual(cc.map_quads(1), 'HH')
|
||||
self.assertEqual(cc.map_quads(2), 'LH')
|
||||
self.assertEqual(cc.map_quads(3), 'LL')
|
||||
self.assertEqual(cc.map_quads(4), 'HL')
|
||||
self.assertEqual(cc.map_quads(33), None)
|
||||
self.assertEqual(cc.map_quads('andy'), None)
|
||||
|
||||
def test_query_attr_select(self):
|
||||
"""Test query_attr_select."""
|
||||
|
||||
ans = "i.\"{attr1}\"::numeric As attr1, " \
|
||||
"i.\"{attr2}\"::numeric As attr2, "
|
||||
|
||||
self.assertEqual(cc.query_attr_select(self.params), ans)
|
||||
|
||||
def test_query_attr_where(self):
|
||||
"""Test query_attr_where."""
|
||||
|
||||
ans = "idx_replace.\"{attr1}\" IS NOT NULL AND "\
|
||||
"idx_replace.\"{attr2}\" IS NOT NULL AND "\
|
||||
"idx_replace.\"{attr2}\" <> 0"
|
||||
|
||||
self.assertEqual(cc.query_attr_where(self.params), ans)
|
||||
|
||||
def test_knn(self):
|
||||
"""Test knn function."""
|
||||
|
||||
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
|
||||
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT j.\"cartodb_id\" " \
|
||||
"FROM \"a_list\" As j WHERE j.\"andy\" IS NOT NULL AND " \
|
||||
"j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 ORDER BY " \
|
||||
"j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 OFFSET 1 ) ) " \
|
||||
"As neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT " \
|
||||
"NULL AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER " \
|
||||
"BY i.\"cartodb_id\" ASC;"
|
||||
|
||||
self.assertEqual(cc.knn(self.params), ans)
|
||||
|
||||
def test_queen(self):
|
||||
"""Test queen neighbors function."""
|
||||
|
||||
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
|
||||
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
|
||||
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE ST_Touches(" \
|
||||
"i.\"the_geom\", j.\"the_geom\") AND j.\"andy\" IS NOT NULL " \
|
||||
"AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0)) As " \
|
||||
"neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT NULL " \
|
||||
"AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER BY " \
|
||||
"i.\"cartodb_id\" ASC;"
|
||||
|
||||
self.assertEqual(cc.queen(self.params), ans)
|
||||
|
||||
def test_get_query(self):
|
||||
"""Test get_query."""
|
||||
|
||||
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
|
||||
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
|
||||
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE j.\"andy\" IS " \
|
||||
"NOT NULL AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 " \
|
||||
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 " \
|
||||
"OFFSET 1 ) ) As neighbors FROM \"a_list\" As i WHERE " \
|
||||
"i.\"andy\" IS NOT NULL AND i.\"jay_z\" IS NOT NULL AND " \
|
||||
"i.\"jay_z\" <> 0 ORDER BY i.\"cartodb_id\" ASC;"
|
||||
|
||||
self.assertEqual(cc.get_query('knn', self.params), ans)
|
||||
|
||||
def test_get_attributes(self):
|
||||
"""Test get_attributes."""
|
||||
|
||||
## need to add tests
|
||||
|
||||
self.assertEqual(True, True)
|
||||
|
||||
def test_get_weight(self):
|
||||
"""Test get_weight."""
|
||||
|
||||
self.assertEqual(True, True)
|
||||
|
||||
|
||||
def test_quad_position(self):
|
||||
"""Test quad_position."""
|
||||
|
||||
quads = np.array([1, 2, 3, 4], np.int)
|
||||
|
||||
ans = np.array(['HH', 'LH', 'LL', 'HL'])
|
||||
test_ans = cc.quad_position(quads)
|
||||
|
||||
self.assertTrue((test_ans == ans).all())
|
||||
|
||||
def test_moran_local(self):
|
||||
"""Test Moran's I local"""
|
||||
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
|
||||
plpy._define_result('select', data)
|
||||
random_seeds.set_random_seeds(1234)
|
||||
result = cc.moran_local('table', 'value', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
|
||||
result = [(row[0], row[1]) for row in result]
|
||||
expected = self.moran_data
|
||||
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
|
||||
self.assertAlmostEqual(res_val, exp_val)
|
||||
self.assertEqual(res_quad, exp_quad)
|
||||
|
||||
def test_moran_local_rate(self):
|
||||
"""Test Moran's I rate"""
|
||||
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
|
||||
plpy._define_result('select', data)
|
||||
random_seeds.set_random_seeds(1234)
|
||||
result = cc.moran_local_rate('table', 'numerator', 'denominator', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
|
||||
result = [(row[0], row[1]) for row in result]
|
||||
expected = self.moran_data
|
||||
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
|
||||
self.assertAlmostEqual(res_val, exp_val)
|
||||
6
src/pg/.gitignore
vendored
Normal file
@@ -0,0 +1,6 @@
|
||||
regression.diffs
|
||||
regression.out
|
||||
results/
|
||||
crankshaft--dev.sql
|
||||
crankshaft--dev--current.sql
|
||||
crankshaft--current--dev.sql
|
||||
60
src/pg/Makefile
Normal file
@@ -0,0 +1,60 @@
|
||||
include ../../Makefile.global
|
||||
|
||||
# Development tasks:
|
||||
#
|
||||
# * install generates the control & script files into src/pg/
|
||||
# and installs them into the PostgreSQL extensions directory;
|
||||
# requires sudo. In addition to the current development version
|
||||
# named 'dev', an alias 'current' is generated for ease of
|
||||
# update (upgrade to 'current', then to 'dev').
|
||||
# the python module is installed in a virtualenv in envs/dev/
|
||||
# * test runs the tests for the currently generated Development
|
||||
# extension.
|
||||
|
||||
DATA = $(EXTENSION)--dev.sql \
|
||||
$(EXTENSION)--current--dev.sql \
|
||||
$(EXTENSION)--dev--current.sql
|
||||
|
||||
SOURCES_DATA_DIR = sql
|
||||
SOURCES_DATA = $(wildcard $(SOURCES_DATA_DIR)/*.sql)
|
||||
|
||||
VIRTUALENV_PATH = $(realpath ../../envs)
|
||||
ESC_VIRVIRTUALENV_PATH = $(subst /,\/,$(VIRTUALENV_PATH))
|
||||
|
||||
REPLACEMENTS = -e 's/@@VERSION@@/$(EXTVERSION)/g' \
|
||||
-e 's/@@VIRTUALENV_PATH@@/$(ESC_VIRVIRTUALENV_PATH)/g'
|
||||
|
||||
$(DATA): $(SOURCES_DATA)
|
||||
$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > $@
|
||||
|
||||
TEST_DIR = test
|
||||
REGRESS = $(notdir $(basename $(wildcard $(TEST_DIR)/sql/*test.sql)))
|
||||
REGRESS_OPTS = --inputdir='$(TEST_DIR)' --outputdir='$(TEST_DIR)'
|
||||
|
||||
PG_CONFIG = pg_config
|
||||
PGXS := $(shell $(PG_CONFIG) --pgxs)
|
||||
include $(PGXS)
|
||||
|
||||
# This seems to be needed at least for PG 9.3.11
|
||||
all: $(DATA)
|
||||
|
||||
test: export PGUSER=postgres
|
||||
test: installcheck
|
||||
|
||||
# Release tasks
|
||||
|
||||
../../release/$(EXTENSION).control: $(EXTENSION).control
|
||||
cp $< $@
|
||||
|
||||
# Prepare new release from the currently installed development version,
|
||||
# for the current version X.Y.Z (defined in the control file)
|
||||
# producing the extension script and control files in release/
|
||||
# and the python package in release/python/X.Y.Z/crankshaft/
|
||||
release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
|
||||
$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > ../../release/$(EXTENSION)--$(EXTVERSION).sql
|
||||
|
||||
# Install the current release into the PostgreSQL extensions directory
|
||||
# and the Python package in a virtual environment envs/X.Y.Z
|
||||
deploy:
|
||||
$(INSTALL_DATA) ../../release/$(EXTENSION).control '$(DESTDIR)$(datadir)/extension/'
|
||||
$(INSTALL_DATA) ../../release/*.sql '$(DESTDIR)$(datadir)/extension/'
|
||||
5
src/pg/crankshaft.control
Normal file
@@ -0,0 +1,5 @@
|
||||
comment = 'CartoDB Spatial Analysis extension'
|
||||
default_version = '0.0.2'
|
||||
requires = 'plpythonu, postgis, cartodb'
|
||||
superuser = true
|
||||
schema = cdb_crankshaft
|
||||
12
src/pg/sql/01_version.sql
Normal file
@@ -0,0 +1,12 @@
|
||||
-- Version number of the extension release
|
||||
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
|
||||
RETURNS text AS $$
|
||||
SELECT '@@VERSION@@'::text;
|
||||
$$ language 'sql' STABLE STRICT;
|
||||
|
||||
-- Internal identifier of the installed extension instance
|
||||
-- e.g. 'dev' for current development version
|
||||
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
|
||||
RETURNS text AS $$
|
||||
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
|
||||
$$ language 'sql' STABLE STRICT;
|
||||
23
src/pg/sql/02_py.sql
Normal file
@@ -0,0 +1,23 @@
|
||||
CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
|
||||
RETURNS text
|
||||
AS $$
|
||||
BEGIN
|
||||
-- RETURN '/opt/virtualenvs/crankshaft';
|
||||
RETURN '@@VIRTUALENV_PATH@@';
|
||||
END;
|
||||
$$ language plpgsql IMMUTABLE STRICT;
|
||||
|
||||
-- Use the crankshaft python module
|
||||
CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
|
||||
RETURNS VOID
|
||||
AS $$
|
||||
import os
|
||||
# plpy.notice('%',str(os.environ))
|
||||
# activate virtualenv
|
||||
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
|
||||
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
|
||||
default_venv_path = os.path.join(base_path, crankshaft_version)
|
||||
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
|
||||
activate_path = venv_path + '/bin/activate_this.py'
|
||||
exec(open(activate_path).read(), dict(__file__=activate_path))
|
||||
$$ LANGUAGE plpythonu;
|
||||
@@ -4,6 +4,7 @@
|
||||
CREATE OR REPLACE FUNCTION
|
||||
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
|
||||
AS $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft import random_seeds
|
||||
random_seeds.set_random_seeds(seed_value)
|
||||
$$ LANGUAGE plpythonu;
|
||||
89
src/pg/sql/10_moran.sql
Normal file
@@ -0,0 +1,89 @@
|
||||
-- Moran's I (global)
|
||||
CREATE OR REPLACE FUNCTION
|
||||
CDB_AreasOfInterest_Global (
|
||||
subquery TEXT,
|
||||
attr_name TEXT,
|
||||
permutations INT DEFAULT 99,
|
||||
geom_col TEXT DEFAULT 'the_geom',
|
||||
id_col TEXT DEFAULT 'cartodb_id',
|
||||
w_type TEXT DEFAULT 'knn',
|
||||
num_ngbrs INT DEFAULT 5)
|
||||
RETURNS TABLE (moran NUMERIC, significance NUMERIC)
|
||||
AS $$
|
||||
from crankshaft.clustering import moran
# TODO: use named parameters or a dictionary
return moran(subquery, attr_name, num_ngbrs, permutations, geom_col, id_col, w_type)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
-- Moran's I Local
|
||||
CREATE OR REPLACE FUNCTION
|
||||
CDB_AreasOfInterest_Local(
|
||||
subquery TEXT,
|
||||
attr TEXT,
|
||||
permutations INT DEFAULT 99,
|
||||
geom_col TEXT DEFAULT 'the_geom',
|
||||
id_col TEXT DEFAULT 'cartodb_id',
|
||||
w_type TEXT DEFAULT 'knn',
|
||||
num_ngbrs INT DEFAULT 5)
|
||||
RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, ids INT, y NUMERIC)
|
||||
AS $$
|
||||
from crankshaft.clustering import moran_local
|
||||
# TODO: use named parameters or a dictionary
|
||||
return moran_local(subquery, attr, permutations, geom_col, id_col, w_type, num_ngbrs)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
-- Moran's I Rate (global)
|
||||
CREATE OR REPLACE FUNCTION
|
||||
CDB_AreasOfInterest_Global_Rate(
|
||||
subquery TEXT,
|
||||
numerator TEXT,
|
||||
denominator TEXT,
|
||||
permutations INT DEFAULT 99,
|
||||
geom_col TEXT DEFAULT 'the_geom',
|
||||
id_col TEXT DEFAULT 'cartodb_id',
|
||||
w_type TEXT DEFAULT 'knn',
|
||||
num_ngbrs INT DEFAULT 5)
|
||||
RETURNS TABLE (moran FLOAT, significance FLOAT)
|
||||
AS $$
|
||||
from crankshaft.clustering import moran_rate
|
||||
# TODO: use named parameters or a dictionary
|
||||
return moran_rate(subquery, numerator, denominator, permutations, geom_col, id_col, w_type, num_ngbrs)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
|
||||
-- Moran's I Local Rate
|
||||
CREATE OR REPLACE FUNCTION
|
||||
CDB_AreasOfInterest_Local_Rate(
|
||||
subquery TEXT,
|
||||
numerator TEXT,
|
||||
denominator TEXT,
|
||||
permutations INT DEFAULT 99,
|
||||
geom_col TEXT DEFAULT 'the_geom',
|
||||
id_col TEXT DEFAULT 'cartodb_id',
|
||||
w_type TEXT DEFAULT 'knn',
|
||||
num_ngbrs INT DEFAULT 5)
|
||||
RETURNS
|
||||
TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, ids INT, y NUMERIC)
|
||||
AS $$
|
||||
from crankshaft.clustering import moran_local_rate
|
||||
# TODO: use named parameters or a dictionary
|
||||
return moran_local_rate(subquery, numerator, denominator, permutations, geom_col, id_col, w_type, num_ngbrs)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
-- -- Moran's I Local Bivariate
|
||||
-- CREATE OR REPLACE FUNCTION
|
||||
-- cdb_moran_local_bv(
|
||||
-- subquery TEXT,
|
||||
-- attr1 TEXT,
|
||||
-- attr2 TEXT,
|
||||
-- permutations INT DEFAULT 99,
|
||||
-- geom_col TEXT DEFAULT 'the_geom',
|
||||
-- id_col TEXT DEFAULT 'cartodb_id',
|
||||
-- w_type TEXT DEFAULT 'knn',
|
||||
-- num_ngbrs INT DEFAULT 5)
|
||||
-- RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
|
||||
-- AS $$
|
||||
-- from crankshaft.clustering import moran_local_bv
|
||||
-- # TODO: use named parameters or a dictionary
|
||||
-- return moran_local_bv(t, attr1, attr2, permutations, geom_col, id_col, w_type, num_ngbrs)
|
||||
-- $$ LANGUAGE plpythonu;
|
||||
15
src/pg/sql/80_similarity_rank.sql
Normal file
@@ -0,0 +1,15 @@
|
||||
CREATE OR REPLACE FUNCTION cdb_SimilarityRank(cartodb_id numeric, query text)
|
||||
returns TABLE (cartodb_id NUMERIC, similarity NUMERIC)
|
||||
as $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft.similarity import similarity_rank
|
||||
return similarity_rank(cartodb_id, query)
|
||||
$$ LANGUAGE plpythonu;
|
||||
|
||||
CREATE OR REPLACE FUNCTION cdb_MostSimilar(cartodb_id numeric, query text ,matches numeric)
|
||||
returns TABLE (cartodb_id NUMERIC, similarity NUMERIC)
|
||||
as $$
|
||||
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
|
||||
from crankshaft.similarity import most_similar
|
||||
return most_similar(matches, query)
|
||||
$$ LANGUAGE plpythonu;
|
||||
@@ -3,4 +3,4 @@ CREATE EXTENSION plpythonu;
|
||||
CREATE EXTENSION postgis;
|
||||
CREATE EXTENSION cartodb;
|
||||
-- Install the extension
|
||||
CREATE EXTENSION crankshaft;
|
||||
CREATE EXTENSION crankshaft VERSION 'dev';
|
||||
@@ -110,7 +110,7 @@ INSERT INTO ppoints2 VALUES
|
||||
(24,'0101000020E61000009C5F91C5095C17C0C78784B15A4F4540'::geometry,'24','07',0.3, 1.0),
|
||||
(29,'0101000020E6100000C34D4A5B48E712C092E680892C684240'::geometry,'29','01',0.3, 1.0),
|
||||
(52,'0101000020E6100000406A545EB29A07C04E5F0BDA39A54140'::geometry,'52','19',0.0, 1.01)
|
||||
-- Moral functions perform some nondeterministic computations
|
||||
-- Areas of Interest functions perform some nondeterministic computations
|
||||
-- (to estimate the significance); we will set the seeds for the RNGs
|
||||
-- that affect those results to have repeateble results
|
||||
SELECT cdb_crankshaft._cdb_random_seeds(1234);
|
||||
@@ -121,67 +121,61 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
|
||||
|
||||
SELECT ppoints.code, m.quads
|
||||
FROM ppoints
|
||||
JOIN cdb_crankshaft.cdb_moran_local('ppoints', 'value') m
|
||||
JOIN cdb_crankshaft.CDB_AreasOfInterest_Local('SELECT * FROM ppoints', 'value') m
|
||||
ON ppoints.cartodb_id = m.ids
|
||||
ORDER BY ppoints.code;
|
||||
NOTICE: ** Constructing query
|
||||
CONTEXT: PL/Python function "cdb_moran_local"
|
||||
NOTICE: ** Query returned with 52 rows
|
||||
CONTEXT: PL/Python function "cdb_moran_local"
|
||||
NOTICE: ** Finished calculations
|
||||
CONTEXT: PL/Python function "cdb_moran_local"
|
||||
code | quads
|
||||
------+-----------------
|
||||
code | quads
|
||||
------+-------
|
||||
01 | HH
|
||||
02 | HL
|
||||
03 | Not significant
|
||||
04 | Not significant
|
||||
05 | Not significant
|
||||
06 | Not significant
|
||||
07 | Not significant
|
||||
08 | Not significant
|
||||
09 | Not significant
|
||||
10 | Not significant
|
||||
03 | LL
|
||||
04 | LL
|
||||
05 | LH
|
||||
06 | LL
|
||||
07 | HH
|
||||
08 | HH
|
||||
09 | HH
|
||||
10 | LL
|
||||
11 | LL
|
||||
12 | Not significant
|
||||
13 | Not significant
|
||||
14 | Not significant
|
||||
15 | Not significant
|
||||
12 | LL
|
||||
13 | HL
|
||||
14 | LL
|
||||
15 | LL
|
||||
16 | HH
|
||||
17 | Not significant
|
||||
18 | Not significant
|
||||
19 | Not significant
|
||||
17 | HH
|
||||
18 | LL
|
||||
19 | HH
|
||||
20 | HH
|
||||
21 | LL
|
||||
22 | Not significant
|
||||
23 | Not significant
|
||||
24 | Not significant
|
||||
22 | HH
|
||||
23 | LL
|
||||
24 | LL
|
||||
25 | HH
|
||||
26 | HH
|
||||
27 | Not significant
|
||||
28 | Not significant
|
||||
27 | LL
|
||||
28 | HH
|
||||
29 | LL
|
||||
30 | Not significant
|
||||
30 | LL
|
||||
31 | HH
|
||||
32 | Not significant
|
||||
33 | Not significant
|
||||
34 | Not significant
|
||||
32 | LL
|
||||
33 | HL
|
||||
34 | LH
|
||||
35 | LL
|
||||
36 | Not significant
|
||||
37 | Not significant
|
||||
36 | LL
|
||||
37 | HL
|
||||
38 | HL
|
||||
39 | Not significant
|
||||
40 | Not significant
|
||||
39 | HH
|
||||
40 | HH
|
||||
41 | HL
|
||||
42 | LH
|
||||
43 | Not significant
|
||||
44 | Not significant
|
||||
43 | LH
|
||||
44 | LL
|
||||
45 | LH
|
||||
46 | Not significant
|
||||
47 | Not significant
|
||||
46 | LL
|
||||
47 | LL
|
||||
48 | HH
|
||||
49 | Not significant
|
||||
50 | Not significant
|
||||
49 | LH
|
||||
50 | HH
|
||||
51 | LL
|
||||
52 | LL
|
||||
(52 rows)
|
||||
@@ -194,67 +188,61 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
|
||||
|
||||
SELECT ppoints2.code, m.quads
|
||||
FROM ppoints2
|
||||
JOIN cdb_crankshaft.cdb_moran_local_rate('ppoints2', 'numerator', 'denominator') m
|
||||
JOIN cdb_crankshaft.CDB_AreasOfInterest_Local_Rate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
|
||||
ON ppoints2.cartodb_id = m.ids
|
||||
ORDER BY ppoints2.code;
|
||||
NOTICE: ** Constructing query
|
||||
CONTEXT: PL/Python function "cdb_moran_local_rate"
|
||||
NOTICE: ** Query returned with 51 rows
|
||||
CONTEXT: PL/Python function "cdb_moran_local_rate"
|
||||
NOTICE: ** Finished calculations
|
||||
CONTEXT: PL/Python function "cdb_moran_local_rate"
|
||||
code | quads
|
||||
------+-----------------
|
||||
code | quads
|
||||
------+-------
|
||||
01 | LL
|
||||
02 | Not significant
|
||||
03 | Not significant
|
||||
04 | Not significant
|
||||
05 | Not significant
|
||||
06 | Not significant
|
||||
07 | Not significant
|
||||
08 | Not significant
|
||||
02 | LH
|
||||
03 | HH
|
||||
04 | HH
|
||||
05 | LL
|
||||
06 | HH
|
||||
07 | LL
|
||||
08 | LL
|
||||
09 | LL
|
||||
10 | Not significant
|
||||
10 | HH
|
||||
11 | HH
|
||||
12 | Not significant
|
||||
13 | Not significant
|
||||
14 | Not significant
|
||||
15 | Not significant
|
||||
16 | Not significant
|
||||
12 | HL
|
||||
13 | LL
|
||||
14 | HH
|
||||
15 | LL
|
||||
16 | LL
|
||||
17 | LL
|
||||
18 | Not significant
|
||||
19 | Not significant
|
||||
18 | LH
|
||||
19 | LL
|
||||
20 | LL
|
||||
21 | Not significant
|
||||
22 | Not significant
|
||||
23 | Not significant
|
||||
24 | Not significant
|
||||
21 | HH
|
||||
22 | LL
|
||||
23 | HL
|
||||
24 | LL
|
||||
25 | LL
|
||||
26 | LL
|
||||
27 | Not significant
|
||||
28 | Not significant
|
||||
27 | LL
|
||||
28 | LL
|
||||
29 | LH
|
||||
30 | Not significant
|
||||
30 | HH
|
||||
31 | LL
|
||||
32 | Not significant
|
||||
33 | Not significant
|
||||
34 | Not significant
|
||||
32 | LL
|
||||
33 | LL
|
||||
34 | LL
|
||||
35 | LH
|
||||
36 | Not significant
|
||||
37 | Not significant
|
||||
36 | HL
|
||||
37 | LH
|
||||
38 | LH
|
||||
39 | Not significant
|
||||
40 | Not significant
|
||||
39 | LL
|
||||
40 | LL
|
||||
41 | LH
|
||||
42 | HL
|
||||
43 | Not significant
|
||||
44 | Not significant
|
||||
43 | LL
|
||||
44 | HL
|
||||
45 | LL
|
||||
46 | Not significant
|
||||
47 | Not significant
|
||||
46 | HL
|
||||
47 | LL
|
||||
48 | LL
|
||||
49 | Not significant
|
||||
50 | Not significant
|
||||
51 | Not significant
|
||||
49 | HL
|
||||
50 | LL
|
||||
51 | HH
|
||||
(51 rows)
|
||||
|
||||
@@ -4,4 +4,4 @@ CREATE EXTENSION postgis;
|
||||
CREATE EXTENSION cartodb;
|
||||
|
||||
-- Install the extension
|
||||
CREATE EXTENSION crankshaft;
|
||||
CREATE EXTENSION crankshaft VERSION 'dev';
|
||||
@@ -1,14 +1,14 @@
|
||||
\i test/fixtures/ppoints.sql
|
||||
\i test/fixtures/ppoints2.sql
|
||||
|
||||
-- Moral functions perform some nondeterministic computations
|
||||
-- Areas of Interest functions perform some nondeterministic computations
|
||||
-- (to estimate the significance); we will set the seeds for the RNGs
|
||||
-- that affect those results to have repeateble results
|
||||
SELECT cdb_crankshaft._cdb_random_seeds(1234);
|
||||
|
||||
SELECT ppoints.code, m.quads
|
||||
FROM ppoints
|
||||
JOIN cdb_crankshaft.cdb_moran_local('ppoints', 'value') m
|
||||
JOIN cdb_crankshaft.CDB_AreasOfInterest_Local('SELECT * FROM ppoints', 'value') m
|
||||
ON ppoints.cartodb_id = m.ids
|
||||
ORDER BY ppoints.code;
|
||||
|
||||
@@ -16,6 +16,6 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
|
||||
|
||||
SELECT ppoints2.code, m.quads
|
||||
FROM ppoints2
|
||||
JOIN cdb_crankshaft.cdb_moran_local_rate('ppoints2', 'numerator', 'denominator') m
|
||||
JOIN cdb_crankshaft.CDB_AreasOfInterest_Local_Rate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
|
||||
ON ppoints2.cartodb_id = m.ids
|
||||
ORDER BY ppoints2.code;
|
||||
@@ -9,7 +9,7 @@ SET search_path TO public,cartodb,cdb_crankshaft;
|
||||
-- Exercise public functions
|
||||
SELECT ppoints.code, m.quads
|
||||
FROM ppoints
|
||||
JOIN cdb_moran_local('ppoints', 'value') m
|
||||
JOIN CDB_AreasOfInterest_Local('ppoints', 'value') m
|
||||
ON ppoints.cartodb_id = m.ids
|
||||
ORDER BY ppoints.code;
|
||||
SELECT round(cdb_overlap_sum(
|
||||
22
src/py/Makefile
Normal file
@@ -0,0 +1,22 @@
|
||||
include ../../Makefile.global
|
||||
|
||||
# Install the package locally for development
|
||||
install:
|
||||
virtualenv --system-site-packages ../../envs/dev
|
||||
# source ../../envs/dev/bin/activate
|
||||
../../envs/dev/bin/pip install -I ./crankshaft
|
||||
../../envs/dev/bin/pip install -I nose
|
||||
|
||||
# Test development install
|
||||
test:
|
||||
../../envs/dev/bin/nosetests crankshaft/test/
|
||||
|
||||
release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
|
||||
mkdir -p ../../release/python/$(EXTVERSION)
|
||||
cp -r ./$(PACKAGE) ../../release/python/$(EXTVERSION)/
|
||||
$(SED) -i -r 's/version='"'"'[0-9]+\.[0-9]+\.[0-9]+'"'"'/version='"'"'$(EXTVERSION)'"'"'/g' ../../release/python/$(EXTVERSION)/$(PACKAGE)/setup.py
|
||||
|
||||
deploy:
|
||||
virtualenv --system-site-packages $(VIRTUALENV_PATH)/$(RELEASE_VERSION)
|
||||
$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I -U ../../release/python/$(RELEASE_VERSION)/$(PACKAGE)
|
||||
$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I nose
|
||||
88
src/py/README.md
Normal file
@@ -0,0 +1,88 @@
|
||||
# Crankshaft Python Package
|
||||
|
||||
...
|
||||
### Run the tests
|
||||
|
||||
```bash
|
||||
cd crankshaft
|
||||
nosetests test/
|
||||
```
|
||||
|
||||
## Notes about Python dependencies
|
||||
* This extension is targeted at production databases. Therefore certain restrictions must be assumed about the production environment vs other experimental environments.
|
||||
* We're using `pip` and `virtualenv` to generate a suitable isolated environment for python code that has all the dependencies.
|
||||
* Every dependency should be:
|
||||
- Added to the `setup.py` file
|
||||
- Installed through it
|
||||
- Tested, when they have a test suite.
|
||||
- Fixed in the `requirements.txt`
|
||||
* At present we use Python version 2.7.3
|
||||
|
||||
---
|
||||
|
||||
To avoid troublesome compilations/linkings we will use
|
||||
the available system package `python-scipy`.
|
||||
This package and its dependencies provide numpy 1.6.1
|
||||
and scipy 0.9.0. To be able to use these versions we cannot use
|
||||
PySAL 1.10 or later, so we'll stick to 1.9.1.
|
||||
|
||||
```
|
||||
apt-get install -y python-scipy
|
||||
```
|
||||
|
||||
We'll use virtual environments to install our packages,
|
||||
but configured to also use system modules so that the
|
||||
mentioned scipy and numpy are used.
|
||||
|
||||
# Create a virtual environment for python
|
||||
$ virtualenv --system-site-packages dev
|
||||
|
||||
# Activate the virtualenv
|
||||
$ source dev/bin/activate
|
||||
|
||||
# Install all the requirements
|
||||
# expect this to take a while, as it will trigger a few compilations
|
||||
(dev) $ pip install -I ./crankshaft
|
||||
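To confirm that the virtualenv really resolves the system-provided packages, we can ask Python where each module is loaded from. This is only a minimal sketch: `report_modules` is a hypothetical helper, not part of crankshaft, and the module list is just the set of packages mentioned above.

```python
import importlib


def report_modules(names):
    """Return {name: (version, path)}, or None for modules that cannot be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Not every module exposes __version__ or __file__, so fall back gracefully
        report[name] = (getattr(module, '__version__', 'unknown'),
                        getattr(module, '__file__', 'built-in'))
    return report


if __name__ == '__main__':
    # Run inside the activated virtualenv; paths under the system
    # site-packages indicate that --system-site-packages is in effect.
    for name, info in sorted(report_modules(['numpy', 'scipy', 'pysal']).items()):
        print('%s: %s' % (name, info))
```

A `None` entry means the module is not importable from the environment at all, which is worth checking before running the longer test suites below.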
|
||||
#### Test the libraries with that virtual env
|
||||
|
||||
##### Test numpy library dependency:
|
||||
|
||||
import numpy
|
||||
numpy.test('full')
|
||||
|
||||
##### Run scipy tests
|
||||
|
||||
import scipy
|
||||
scipy.test('full')
|
||||
|
||||
##### Testing pysal
|
||||
|
||||
See [http://pysal.readthedocs.org/en/latest/developers/testing.html]
|
||||
|
||||
This will require putting this into `dev/lib/python2.7/site-packages/setup.cfg`:
|
||||
|
||||
```
|
||||
[nosetests]
|
||||
ignore-files=collection
|
||||
exclude-dir=pysal/contrib
|
||||
|
||||
[wheel]
|
||||
universal=1
|
||||
```
|
||||
|
||||
And copying some files before executing the tests:
|
||||
(we'll use a temporary directory from where the tests will be executed because
|
||||
some tests expect some files in the current directory). Next must be executed
|
||||
from
|
||||
|
||||
```
|
||||
cp dev/lib/python2.7/site-packages/pysal/examples/geodanet/* dev/local/lib/python2.7/site-packages/pysal/examples
|
||||
mkdir -p test_tmp && cd test_tmp && cp ../dev/lib/python2.7/site-packages/pysal/examples/geodanet/* ./
|
||||
```
|
||||
|
||||
Then, execute the tests with:
|
||||
|
||||
import pysal
|
||||
import nose
|
||||
nose.runmodule('pysal')
|
||||
3
src/py/crankshaft/crankshaft/__init__.py
Normal file
@@ -0,0 +1,3 @@
|
||||
import random_seeds
|
||||
import clustering
|
||||
import similarity
|
||||
1
src/py/crankshaft/crankshaft/clustering/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
from moran import *
|
||||
260
src/py/crankshaft/crankshaft/clustering/moran.py
Normal file
@@ -0,0 +1,260 @@
|
||||
"""
|
||||
Moran's I geostatistics (global clustering & outliers presence)
|
||||
"""
|
||||
|
||||
# TODO: Fill in local neighbors which have null/NoneType values with the
|
||||
# average of their neighborhood
|
||||
|
||||
import pysal as ps
|
||||
import plpy
|
||||
|
||||
# crankshaft module
|
||||
import crankshaft.pysal_utils as pu
|
||||
|
||||
# High level interface ---------------------------------------
|
||||
|
||||
def moran(subquery, attr_name,
|
||||
permutations, geom_col, id_col, w_type, num_ngbrs):
|
||||
"""
|
||||
Moran's I (global)
|
||||
Implementation building neighbors with a PostGIS database and Moran's I
|
||||
core clusters with PySAL.
|
||||
Andy Eschbacher
|
||||
"""
|
||||
qvals = {"id_col": id_col,
|
||||
"attr1": attr_name,
|
||||
"geom_col": geom_col,
|
||||
"subquery": subquery,
|
||||
"num_ngbrs": num_ngbrs}
|
||||
|
||||
query = pu.construct_neighbor_query(w_type, qvals)
|
||||
|
||||
plpy.notice('** Query: %s' % query)
|
||||
|
||||
try:
|
||||
result = plpy.execute(query)
|
||||
# if there are no neighbors, exit
|
||||
if len(result) == 0:
|
||||
return pu.empty_zipped_array(2)
|
||||
plpy.notice('** Query returned with %d rows' % len(result))
|
||||
except plpy.SPIError as err:
|
||||
# log the details first: plpy.error raises, so nothing after it runs
|
||||
plpy.notice('** Query failed: "%s"' % query)
|
||||
plpy.notice('** Error: %s' % err)
|
||||
plpy.error('Error: areas of interest query failed, check input parameters')
|
||||
return pu.empty_zipped_array(2)
|
||||
|
||||
## collect attributes
|
||||
attr_vals = pu.get_attributes(result)
|
||||
|
||||
## calculate weights
|
||||
weight = pu.get_weight(result, w_type, num_ngbrs)
|
||||
|
||||
## calculate moran global
|
||||
moran_global = ps.esda.moran.Moran(attr_vals, weight,
|
||||
permutations=permutations)
|
||||
|
||||
return zip([moran_global.I], [moran_global.EI])
|
||||
|
||||
def moran_local(subquery, attr,
|
||||
permutations, geom_col, id_col, w_type, num_ngbrs):
|
||||
"""
|
||||
Moran's I (local) implementation for PL/Python
|
||||
Andy Eschbacher
|
||||
"""
|
||||
|
||||
# geometries with attributes that are null are ignored
|
||||
# so the remaining neighbors may not be the nearest ones
|
||||
|
||||
qvals = {"id_col": id_col,
|
||||
"attr1": attr,
|
||||
"geom_col": geom_col,
|
||||
"subquery": subquery,
|
||||
"num_ngbrs": num_ngbrs}
|
||||
|
||||
query = pu.construct_neighbor_query(w_type, qvals)
|
||||
|
||||
try:
|
||||
result = plpy.execute(query)
|
||||
# if there are no neighbors, exit
|
||||
if len(result) == 0:
|
||||
return pu.empty_zipped_array(5)
|
||||
except plpy.SPIError:
|
||||
# log the query first: plpy.error raises, so nothing after it runs
|
||||
plpy.notice('** Query failed: "%s"' % query)
|
||||
plpy.error('Error: areas of interest query failed, check input parameters')
|
||||
return pu.empty_zipped_array(5)
|
||||
|
||||
attr_vals = pu.get_attributes(result)
|
||||
weight = pu.get_weight(result, w_type, num_ngbrs)
|
||||
|
||||
# calculate LISA values
|
||||
lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
|
||||
permutations=permutations)
|
||||
|
||||
# find quadrants for each geometry
|
||||
quads = quad_position(lisa.q)
|
||||
|
||||
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
|
||||
|
||||
def moran_rate(subquery, numerator, denominator,
|
||||
permutations, geom_col, id_col, w_type, num_ngbrs):
|
||||
"""
|
||||
Moran's I Rate (global)
|
||||
Andy Eschbacher
|
||||
"""
|
||||
qvals = {"id_col": id_col,
|
||||
"attr1": numerator,
|
||||
"attr2": denominator,
|
||||
"geom_col": geom_col,
|
||||
"subquery": subquery,
|
||||
"num_ngbrs": num_ngbrs}
|
||||
|
||||
query = pu.construct_neighbor_query(w_type, qvals)
|
||||
|
||||
plpy.notice('** Query: %s' % query)
|
||||
|
||||
try:
|
||||
result = plpy.execute(query)
|
||||
# if there are no neighbors, exit
|
||||
if len(result) == 0:
|
||||
return pu.empty_zipped_array(2)
|
||||
plpy.notice('** Query returned with %d rows' % len(result))
|
||||
except plpy.SPIError as err:
|
||||
# log the details first: plpy.error raises, so nothing after it runs
|
||||
plpy.notice('** Query failed: "%s"' % query)
|
||||
plpy.notice('** Error: %s' % err)
|
||||
plpy.error('Error: areas of interest query failed, check input parameters')
|
||||
return pu.empty_zipped_array(2)
|
||||
|
||||
## collect attributes
|
||||
numer = pu.get_attributes(result, 1)
|
||||
denom = pu.get_attributes(result, 2)
|
||||
|
||||
weight = pu.get_weight(result, w_type, num_ngbrs)
|
||||
|
||||
## calculate moran global rate
|
||||
lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
|
||||
permutations=permutations)
|
||||
|
||||
return zip([lisa_rate.I], [lisa_rate.EI])
|
||||
|
||||
def moran_local_rate(subquery, numerator, denominator,
|
||||
permutations, geom_col, id_col, w_type, num_ngbrs):
|
||||
"""
|
||||
Moran's I Local Rate
|
||||
Andy Eschbacher
|
||||
"""
|
||||
# geometries with values that are null are ignored
|
||||
# so the remaining neighbors may not be the nearest ones
|
||||
|
||||
query = pu.construct_neighbor_query(w_type,
|
||||
{"id_col": id_col,
|
||||
"attr1": numerator,
|
||||
"attr2": denominator,
|
||||
"geom_col": geom_col,
|
||||
"subquery": subquery,
|
||||
"num_ngbrs": num_ngbrs})
|
||||
|
||||
try:
|
||||
result = plpy.execute(query)
|
||||
# if there are no neighbors, exit
|
||||
if len(result) == 0:
|
||||
return pu.empty_zipped_array(5)
|
||||
except plpy.SPIError as err:
|
||||
# log the details first: plpy.error raises, so nothing after it runs
|
||||
plpy.notice('** Query failed: "%s"' % query)
|
||||
plpy.notice('** Error: %s' % err)
|
||||
plpy.error('Error: areas of interest query failed, check input parameters')
|
||||
return pu.empty_zipped_array(5)
|
||||
|
||||
## collect attributes
|
||||
numer = pu.get_attributes(result, 1)
|
||||
denom = pu.get_attributes(result, 2)
|
||||
|
||||
weight = pu.get_weight(result, w_type, num_ngbrs)
|
||||
|
||||
# calculate LISA values
|
||||
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
|
||||
permutations=permutations)
|
||||
|
||||
# find units of significance
|
||||
quads = quad_position(lisa.q)
|
||||
|
||||
return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
|
||||
|
||||
def moran_local_bv(subquery, attr1, attr2,
|
||||
permutations, geom_col, id_col, w_type, num_ngbrs):
|
||||
"""
|
||||
Moran's I (local) Bivariate (untested)
|
||||
"""
|
||||
plpy.notice('** Constructing query')
|
||||
|
||||
qvals = {"num_ngbrs": num_ngbrs,
|
||||
"attr1": attr1,
|
||||
"attr2": attr2,
|
||||
"subquery": subquery,
|
||||
"geom_col": geom_col,
|
||||
"id_col": id_col}
|
||||
|
||||
query = pu.construct_neighbor_query(w_type, qvals)
|
||||
|
||||
try:
|
||||
result = plpy.execute(query)
|
||||
# if there are no neighbors, exit
|
||||
if len(result) == 0:
|
||||
return pu.empty_zipped_array(4)
|
||||
except plpy.SPIError:
|
||||
# log the query first: plpy.error raises, so nothing after it runs
|
||||
plpy.notice('** Query failed: "%s"' % query)
|
||||
plpy.error("Error: areas of interest query failed, " \
|
||||
"check input parameters")
|
||||
return pu.empty_zipped_array(4)
|
||||
|
||||
## collect attributes
|
||||
attr1_vals = pu.get_attributes(result, 1)
|
||||
attr2_vals = pu.get_attributes(result, 2)
|
||||
|
||||
# create weights
|
||||
weight = pu.get_weight(result, w_type, num_ngbrs)
|
||||
|
||||
# calculate LISA values
|
||||
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
|
||||
permutations=permutations)
|
||||
|
||||
plpy.notice("len of Is: %d" % len(lisa.Is))
|
||||
|
||||
# find clustering of significance
|
||||
lisa_sig = quad_position(lisa.q)
|
||||
|
||||
plpy.notice('** Finished calculations')
|
||||
|
||||
return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
|
||||
|
||||
# Low level functions ----------------------------------------
|
||||
|
||||
def map_quads(coord):
|
||||
"""
|
||||
Map a quadrant number to Moran's I designation
|
||||
HH=1, LH=2, LL=3, HL=4
|
||||
Input:
|
||||
@param coord (int): quadrant of a specific measurement
|
||||
Output:
|
||||
classification (one of 'HH', 'LH', 'LL', or 'HL')
|
||||
"""
|
||||
if coord == 1:
|
||||
return 'HH'
|
||||
elif coord == 2:
|
||||
return 'LH'
|
||||
elif coord == 3:
|
||||
return 'LL'
|
||||
elif coord == 4:
|
||||
return 'HL'
|
||||
else:
|
||||
return None
|
||||
|
||||
def quad_position(quads):
|
||||
"""
|
||||
Produce Moran's I classifications from an array of quadrant numbers
|
||||
Input:
|
||||
@param quads ndarray: an array of quads classified by
|
||||
1-4 (PySAL default)
|
||||
Output:
|
||||
@param list: an array of quads classified by 'HH', 'LL', etc.
|
||||
"""
|
||||
return [map_quads(q) for q in quads]
|
||||
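The two low-level helpers above amount to a table lookup. As an alternative sketch (not what this changeset does), the same mapping can be written with a dictionary; `QUAD_LABELS` is a name introduced here for illustration only:

```python
# PySAL quadrant numbering, as documented in map_quads: HH=1, LH=2, LL=3, HL=4
QUAD_LABELS = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}


def map_quads(coord):
    """Return the quadrant label, or None for anything outside 1-4."""
    try:
        return QUAD_LABELS.get(coord)
    except TypeError:
        # unhashable input (e.g. a list) cannot be a valid quadrant
        return None


def quad_position(quads):
    """Map an iterable of quadrant numbers to their labels."""
    return [map_quads(q) for q in quads]
```

The behaviour for the values exercised in `test_map_quads` (1 to 4, 33 and `'andy'`) should be the same as with the `if`/`elif` chain.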
1
src/py/crankshaft/crankshaft/pysal_utils/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
from pysal_utils import *
|
||||
152
src/py/crankshaft/crankshaft/pysal_utils/pysal_utils.py
Normal file
@@ -0,0 +1,152 @@
|
||||
"""
|
||||
Utilities module for generic PySAL functionality, mainly centered on translating queries into numpy arrays or PySAL weights objects
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
import pysal as ps
|
||||
|
||||
def construct_neighbor_query(w_type, query_vals):
|
||||
"""Return query (a string) used for finding neighbors
|
||||
@param w_type text: type of neighbors to calculate ('knn' or 'queen')
|
||||
@param query_vals dict: values used to construct the query
|
||||
"""
|
||||
|
||||
if w_type == 'knn':
|
||||
return knn(query_vals)
|
||||
else:
|
||||
return queen(query_vals)
|
||||
|
||||
## Build weight object
|
||||
def get_weight(query_res, w_type='knn', num_ngbrs=5):
|
||||
"""
|
||||
Construct PySAL weight from return value of query
|
||||
@param query_res: query results with attributes and neighbors
|
||||
"""
|
||||
if w_type == 'knn':
|
||||
row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
|
||||
weights = {x['id']: row_normed_weights for x in query_res}
|
||||
else:
|
||||
weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
|
||||
if len(x['neighbors']) > 0
|
||||
else [] for x in query_res}
|
||||
|
||||
neighbors = {x['id']: x['neighbors'] for x in query_res}
|
||||
|
||||
return ps.W(neighbors, weights)
|
||||
|
||||
def query_attr_select(params):
|
||||
"""
|
||||
Create portion of SELECT statement for attributes involved in query.
|
||||
@param params: dict of information used in query (column names,
|
||||
table name, etc.)
|
||||
"""
|
||||
|
||||
attrs = [k for k in params
|
||||
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')]
|
||||
|
||||
template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
|
||||
|
||||
attr_string = ""
|
||||
|
||||
for idx, val in enumerate(sorted(attrs)):
|
||||
attr_string += template % {"col": val, "alias_num": idx + 1}
|
||||
|
||||
return attr_string
|
||||
|
||||
def query_attr_where(params):
|
||||
"""
|
||||
Create portion of WHERE clauses for weeding out NULL-valued geometries
|
||||
"""
|
||||
attrs = sorted([k for k in params
|
||||
if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')])
|
||||
|
||||
attr_string = []
|
||||
|
||||
for attr in attrs:
|
||||
attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
|
||||
|
||||
if len(attrs) == 2:
|
||||
attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
|
||||
|
||||
out = " AND ".join(attr_string)
|
||||
|
||||
return out
|
||||
|
||||
def knn(params):
|
||||
"""SQL query for k-nearest neighbors.
|
||||
@param params: dict of values to fill template
|
||||
"""
|
||||
|
||||
attr_select = query_attr_select(params)
|
||||
attr_where = query_attr_where(params)
|
||||
|
||||
replacements = {"attr_select": attr_select,
|
||||
"attr_where_i": attr_where.replace("idx_replace", "i"),
|
||||
"attr_where_j": attr_where.replace("idx_replace", "j")}
|
||||
|
||||
query = "SELECT " \
|
||||
"i.\"{id_col}\" As id, " \
|
||||
"%(attr_select)s" \
|
||||
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
|
||||
"FROM ({subquery}) As j " \
|
||||
"WHERE " \
|
||||
"i.\"{id_col}\" <> j.\"{id_col}\" AND " \
|
||||
"%(attr_where_j)s " \
|
||||
"ORDER BY " \
|
||||
"j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
|
||||
"LIMIT {num_ngbrs})" \
|
||||
") As neighbors " \
|
||||
"FROM ({subquery}) As i " \
|
||||
"WHERE " \
|
||||
"%(attr_where_i)s " \
|
||||
"ORDER BY i.\"{id_col}\" ASC;" % replacements
|
||||
|
||||
return query.format(**params)
|
||||
|
||||
## SQL query for finding queens neighbors (all contiguous polygons)
|
||||
def queen(params):
|
||||
"""SQL query for queen neighbors.
|
||||
@param params dict: information to fill query
|
||||
"""
|
||||
attr_select = query_attr_select(params)
|
||||
attr_where = query_attr_where(params)
|
||||
|
||||
replacements = {"attr_select": attr_select,
|
||||
"attr_where_i": attr_where.replace("idx_replace", "i"),
|
||||
"attr_where_j": attr_where.replace("idx_replace", "j")}
|
||||
|
||||
query = "SELECT " \
|
||||
"i.\"{id_col}\" As id, " \
|
||||
"%(attr_select)s" \
|
||||
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
|
||||
"FROM ({subquery}) As j " \
|
||||
"WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
|
||||
"ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
|
||||
"%(attr_where_j)s)" \
|
||||
") As neighbors " \
|
||||
"FROM ({subquery}) As i " \
|
||||
"WHERE " \
|
||||
"%(attr_where_i)s " \
|
||||
"ORDER BY i.\"{id_col}\" ASC;" % replacements
|
||||
|
||||
return query.format(**params)
|
||||
|
||||
## to add more weight methods open a ticket or pull request
|
||||
|
||||
def get_attributes(query_res, attr_num=1):
|
||||
"""
|
||||
@param query_res: query results with attributes and neighbors
|
||||
@param attr_num: attribute number (1, 2, ...)
|
||||
"""
|
||||
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
|
||||
|
||||
def empty_zipped_array(num_nones):
|
||||
"""
|
||||
prepare return values for cases of empty weights objects (no neighbors)
|
||||
Input:
|
||||
@param num_nones int: number of columns (e.g., 4)
|
||||
Output:
|
||||
[(None, None, None, None)]
|
||||
"""
|
||||
|
||||
return [tuple([None] * num_nones)]
|
||||
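The query builders in this module fill their templates in two passes: `%` substitution first injects the attribute fragments, which themselves still contain `{attrN}` placeholders, and `str.format` then fills in the column names. The following standalone sketch illustrates that flow; `attr_select` is a simplified stand-in for `query_attr_select`, and the query text is shortened for illustration rather than copied from `knn` or `queen`.

```python
def attr_select(params):
    """Build the attribute part of the SELECT list, leaving {attrN} placeholders."""
    skip = ('id_col', 'geom_col', 'subquery', 'num_ngbrs')
    attrs = sorted(k for k in params if k not in skip)
    template = 'i."{%(col)s}"::numeric As attr%(alias_num)s, '
    return ''.join(template % {'col': key, 'alias_num': idx + 1}
                   for idx, key in enumerate(attrs))


params = {'id_col': 'cartodb_id', 'attr1': 'andy', 'attr2': 'jay_z',
          'subquery': 'SELECT * FROM a_list', 'geom_col': 'the_geom',
          'num_ngbrs': 321}

# Pass 1: % substitution inserts the attribute fragment
stage_one = ('SELECT i."{id_col}" As id, '
             '%(attr_select)s'
             'ARRAY[]::int[] As neighbors '
             'FROM ({subquery}) As i') % {'attr_select': attr_select(params)}

# Pass 2: str.format replaces every {key} with the caller-supplied value
stage_two = stage_one.format(**params)
```

Because the aliases are assigned after sorting the dictionary keys, the keys need to be named `attr1`, `attr2`, ... for the alias numbers to line up with the caller's intent. Note also that values are interpolated as-is, with no escaping.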
10
src/py/crankshaft/crankshaft/random_seeds.py
Normal file
@@ -0,0 +1,10 @@
|
||||
import random
|
||||
import numpy
|
||||
|
||||
def set_random_seeds(value):
|
||||
"""
|
||||
Set the seeds of the RNGs (Random Number Generators)
|
||||
used internally.
|
||||
"""
|
||||
random.seed(value)
|
||||
numpy.random.seed(value)
|
||||
1
src/py/crankshaft/crankshaft/similarity/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
from similarity import *
|
||||
91
src/py/crankshaft/crankshaft/similarity/similarity.py
Normal file
@@ -0,0 +1,91 @@
|
||||
from sklearn.neighbors import NearestNeighbors
|
||||
import scipy.stats as stats
|
||||
import numpy as np
|
||||
import plpy
|
||||
import time
|
||||
import cPickle
|
||||
|
||||
|
||||
def query_to_dictionary(result):
|
||||
return [ dict(zip(r.keys(), r.values())) for r in result ]
|
||||
|
||||
def drop_all_nan_columns(data):
|
||||
return data[ :, ~np.isnan(data).all(axis=0)]
|
||||
|
||||
def fill_missing_na(data,val=None):
|
||||
inds = np.where(np.isnan(data))
|
||||
if val==None:
|
||||
col_mean = stats.nanmean(data,axis=0)
|
||||
data[inds]=np.take(col_mean,inds[1])
|
||||
else:
|
||||
data[inds]=np.take(val, inds[1])
|
||||
return data
|
||||
|
||||
def similarity_rank(target_cartodb_id, query):
|
||||
start_time = time.time()
|
||||
#plpy.notice('converting to dictionary ', start_time)
|
||||
#data = query_to_dictionary(plpy.execute(query))
|
||||
plpy.notice('coverted , running query ', time.time() - start_time)
|
||||
|
||||
data = plpy.execute(query_only_values(query))
|
||||
plpy.notice('run query , getting cartodb_idsi', time.time() - start_time)
|
||||
cartodb_ids = plpy.execute(query_cartodb_id(query))[0]['a']
|
||||
target_id = cartodb_ids.index(target_cartodb_id)
|
||||
plpy.notice('run query , extracting ', time.time() - start_time)
|
||||
features, target = extract_features_target(data,target_id)
|
||||
plpy.notice('extracted , cleaning ', time.time() - start_time)
|
||||
features = fill_missing_na(drop_all_nan_columns(features))
|
||||
plpy.notice('cleaned , normalizing', start_time - time.time())
|
||||
|
||||
normed_features, normed_target = normalize_features(features,target)
|
||||
plpy.notice('normalized , training ', time.time() - start_time )
|
||||
tree = train(normed_features)
|
||||
plpy.notice('normalized , pickling ', time.time() - start_time )
|
||||
#plpy.notice('tree_dump ', len(cPickle.dumps(tree, protocol=cPickle.HIGHEST_PROTOCOL)))
|
||||
plpy.notice('pickles, querying ', time.time() - start_time)
|
||||
dist, ind = tree.kneighbors(normed_target)
|
||||
plpy.notice('queried , rectifying', time.time() - start_time)
|
||||
return zip(cartodb_ids, dist[0])
|
||||
|
||||
def query_cartodb_id(query):
|
||||
return 'select array_agg(cartodb_id) a from ({0}) b'.format(query)
|
||||
|
||||
def query_only_values(query):
|
||||
first_row = plpy.execute('select * from ({query}) a limit 1'.format(query=query))
|
||||
just_values = ','.join([ key for key in first_row[0].keys() if key not in ['the_geom', 'the_geom_webmercator','cartodb_id']])
|
||||
return 'select Array[{0}] a from ({1}) b '.format(just_values, query)
|
||||
|
||||
|
||||
def most_similar(matches,query):
|
||||
data = plpy.execute(query)
|
||||
features, _ = extract_features_target(data)
|
||||
results = []
|
||||
for i in features:
|
||||
target = features
|
||||
dist,ind = tree.query(target, k=matches)
|
||||
cartodb_ids = [ dist[ind]['cartodb_id'] for index in ind ]
|
||||
results.append(cartodb_ids)
|
||||
return cartodb_ids, results
|
||||
|
||||
|
||||
def train(features):
|
||||
tree = NearestNeighbors( n_neighbors=len(features), algorithm='auto').fit(features)
|
||||
return tree
|
||||
|
||||
def normalize_features(features, target):
|
||||
maxes = features.max(axis=0)
|
||||
mins = features.min(axis=0)
|
||||
return (features - mins)/(maxes-mins), (target-mins)/(maxes-mins)
|
||||
|
||||
def extract_row(row):
|
||||
keys = row.keys()
|
||||
values = row.values()
|
||||
del values[ keys.index('cartodb_id')]
|
||||
return values
|
||||
|
||||
def extract_features_target(data, target_index=None):
|
||||
target = None
|
||||
features = [row['a'] for row in data]
|
||||
target = features[target_index] if target_index is not None else None
|
||||
return np.array(features, dtype=float), np.array(target, dtype=float)
|
||||
|
||||
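The ranking performed by `similarity_rank` boils down to three steps: min-max normalize every column, measure the distance from each row to the target row, and pair each distance with the id of the row it belongs to. The sketch below shows those steps in plain Python. It uses a brute-force Euclidean search as a stand-in for scikit-learn's `NearestNeighbors`, and the function names are hypothetical, so treat it as an illustration under those assumptions rather than the module's implementation.

```python
import math


def min_max_normalize(rows):
    """Scale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # a zero span would divide by zero, so fall back to 1.0
    spans = [float(max(col) - lo) or 1.0 for col, lo in zip(columns, mins)]
    return [[(v - lo) / span for v, lo, span in zip(row, mins, spans)]
            for row in rows]


def rank_by_similarity(ids, rows, target_id):
    """Return (id, distance) pairs, nearest to the target row first."""
    normed = min_max_normalize(rows)
    target = normed[ids.index(target_id)]
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(row, target)))
             for row in normed]
    # sort row indices by distance so each id stays paired with its own distance
    order = sorted(range(len(ids)), key=lambda i: dists[i])
    return [(ids[i], dists[i]) for i in order]
```

The zero-span guard is a design choice of this sketch: `normalize_features` above divides by `maxes - mins` directly, so a constant column would produce a division by zero there.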
48
src/py/crankshaft/setup.py
Normal file
@@ -0,0 +1,48 @@
|
||||
|
||||
"""
|
||||
CartoDB Spatial Analysis Python Library
|
||||
See:
|
||||
https://github.com/CartoDB/crankshaft
|
||||
"""
|
||||
|
||||
from setuptools import setup, find_packages
|
||||
|
||||
setup(
|
||||
name='crankshaft',
|
||||
|
||||
version='0.0.0',
|
||||
|
||||
description='CartoDB Spatial Analysis Python Library',
|
||||
|
||||
url='https://github.com/CartoDB/crankshaft',
|
||||
|
||||
author='Data Services Team - CartoDB',
|
||||
author_email='dataservices@cartodb.com',
|
||||
|
||||
license='MIT',
|
||||
|
||||
classifiers=[
|
||||
'Development Status :: 3 - Alpha',
|
||||
'Intended Audience :: Mapping community',
|
||||
'Topic :: Maps :: Mapping Tools',
|
||||
'License :: OSI Approved :: MIT License',
|
||||
'Programming Language :: Python :: 2.7',
|
||||
],
|
||||
|
||||
keywords='maps mapping tools spatial analysis geostatistics',
|
||||
|
||||
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
|
||||
|
||||
extras_require={
|
||||
'dev': ['unittest'],
|
||||
'test': ['unittest', 'nose', 'mock'],
|
||||
},
|
||||
|
||||
# The choice of component versions is dictated by what's
|
||||
# provisioned in the production servers.
|
||||
install_requires=['pysal==1.9.1', 'scikit-learn==0.17.1'],
|
||||
|
||||
requires=['pysal', 'numpy','sklearn'],
|
||||
|
||||
test_suite='test'
|
||||
)
|
||||
52
src/py/crankshaft/test/fixtures/moran.json
vendored
Normal file
@@ -0,0 +1,52 @@
|
||||
[[0.9319096128346788, "HH"],
|
||||
[-1.135787401862846, "HL"],
|
||||
[0.11732030672508517, "LL"],
|
||||
[0.6152779669180425, "LL"],
|
||||
[-0.14657336660125297, "LH"],
|
||||
[0.6967858120189607, "LL"],
|
||||
[0.07949310115714454, "HH"],
|
||||
[0.4703198759258987, "HH"],
|
||||
[0.4421125200498064, "HH"],
|
||||
[0.5724288737143592, "LL"],
|
||||
[0.8970743435692062, "LL"],
|
||||
[0.18327334401918674, "LL"],
|
||||
[-0.01466729201304962, "HL"],
|
||||
[0.3481559372544409, "LL"],
|
||||
[0.06547094736902978, "LL"],
|
||||
[0.15482141569329988, "HH"],
|
||||
[0.4373841193538136, "HH"],
|
||||
[0.15971286468915544, "LL"],
|
||||
[1.0543588860308968, "HH"],
|
||||
[1.7372866900020818, "HH"],
|
||||
[1.091998586053999, "LL"],
|
||||
[0.1171572584252222, "HH"],
|
||||
[0.08438455015300014, "LL"],
|
||||
[0.06547094736902978, "LL"],
|
||||
[0.15482141569329985, "HH"],
|
||||
[1.1627044812890683, "HH"],
|
||||
[0.06547094736902978, "LL"],
|
||||
[0.795275137550483, "HH"],
|
||||
[0.18562939195219, "LL"],
|
||||
[0.3010757406693439, "LL"],
|
||||
[2.8205795942839376, "HH"],
|
||||
[0.11259190602909264, "LL"],
|
||||
[-0.07116352791516614, "HL"],
|
||||
[-0.09945240794119009, "LH"],
|
||||
[0.18562939195219, "LL"],
|
||||
[0.1832733440191868, "LL"],
|
||||
[-0.39054253768447705, "HL"],
|
||||
[-0.1672071289487642, "HL"],
|
||||
[0.3337669247916343, "HH"],
|
||||
[0.2584386102554792, "HH"],
|
||||
[-0.19733845476322634, "HL"],
|
||||
[-0.9379282899805409, "LH"],
|
||||
[-0.028770969951095866, "LH"],
|
||||
[0.051367269430983485, "LL"],
|
||||
[-0.2172548045913472, "LH"],
|
||||
[0.05136726943098351, "LL"],
|
||||
[0.04191046803899837, "LL"],
|
||||
[0.7482357030403517, "HH"],
|
||||
[-0.014585767863118111, "LH"],
|
||||
[0.5410013139159929, "HH"],
|
||||
[1.0223932668429925, "LL"],
|
||||
[1.4179402898927476, "LL"]]
|
||||
54
src/py/crankshaft/test/fixtures/neighbors.json
vendored
Normal file
@@ -0,0 +1,54 @@
|
||||
[
|
||||
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
|
||||
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
|
||||
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
|
||||
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
|
||||
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
|
||||
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
|
||||
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
|
||||
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
|
||||
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
|
||||
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
|
||||
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
|
||||
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
|
||||
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
|
||||
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
|
||||
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
|
||||
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
|
||||
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
|
||||
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
|
||||
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
|
||||
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
|
||||
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
|
||||
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
|
||||
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
|
||||
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
|
||||
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
|
||||
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
|
||||
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
|
||||
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
|
||||
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
|
||||
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
|
||||
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
|
||||
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
|
||||
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
|
||||
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
|
||||
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
|
||||
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
|
||||
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
|
||||
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
|
||||
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
|
||||
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
|
||||
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
|
||||
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
|
||||
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
|
||||
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
|
||||
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
|
||||
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
|
||||
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
|
||||
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
|
||||
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
|
||||
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
|
||||
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
|
||||
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
|
||||
]
|
||||
13
src/py/crankshaft/test/helper.py
Normal file
@@ -0,0 +1,13 @@
|
||||
import unittest
|
||||
|
||||
from mock_plpy import MockPlPy
|
||||
plpy = MockPlPy()
|
||||
|
||||
import sys
|
||||
sys.modules['plpy'] = plpy
|
||||
|
||||
import os
|
||||
|
||||
def fixture_file(name):
|
||||
dir = os.path.dirname(os.path.realpath(__file__))
|
||||
return os.path.join(dir, 'fixtures', name)
|
||||
34
src/py/crankshaft/test/mock_plpy.py
Normal file
@@ -0,0 +1,34 @@
|
||||
import re
|
||||
|
||||
class MockPlPy:
|
||||
def __init__(self):
|
||||
self._reset()
|
||||
|
||||
def _reset(self):
|
||||
self.infos = []
|
||||
self.notices = []
|
||||
self.debugs = []
|
||||
self.logs = []
|
||||
self.warnings = []
|
||||
self.errors = []
|
||||
self.fatals = []
|
||||
self.executes = []
|
||||
self.results = []
|
||||
self.prepares = []
|
||||
self.results = []
|
||||
|
||||
def _define_result(self, query, result):
|
||||
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
|
||||
self.results.append([pattern, result])
|
||||
|
||||
def notice(self, msg):
|
||||
self.notices.append(msg)
|
||||
|
||||
def info(self, msg):
|
||||
self.infos.append(msg)
|
||||
|
||||
def execute(self, query): # TODO: additional arguments
|
||||
for result in self.results:
|
||||
if result[0].match(query):
|
||||
return result[1]
|
||||
return []
|
||||
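For reference, this is roughly how the mock is meant to be driven from a test: register a canned result under a regular expression, then call `execute` with a query that the pattern matches. The class below is a trimmed copy of the mock above so that the snippet runs on its own; the sample row is made up for illustration.

```python
import re


class MockPlPy(object):
    """Trimmed copy of the mock: canned results keyed by a compiled regex."""

    def __init__(self):
        self.results = []
        self.notices = []

    def _define_result(self, query, result):
        pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
        self.results.append([pattern, result])

    def notice(self, msg):
        self.notices.append(msg)

    def execute(self, query):
        # first registered pattern that matches wins
        for pattern, result in self.results:
            if pattern.match(query):
                return result
        return []


plpy = MockPlPy()
plpy._define_result('select', [{'id': 1, 'attr1': 0.5, 'neighbors': [2, 3]}])

rows = plpy.execute('SELECT i."cartodb_id" As id FROM t As i')
missing = plpy.execute('  SELECT 1')
```

Since `re.match` only matches at the beginning of the string, a query with leading whitespace (the second call) does not hit the `'select'` pattern and falls through to the empty default.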
83
src/py/crankshaft/test/test_clustering_moran.py
Normal file
@@ -0,0 +1,83 @@
|
||||
import unittest
|
||||
import numpy as np
|
||||
|
||||
|
||||
# from mock_plpy import MockPlPy
|
||||
# plpy = MockPlPy()
|
||||
#
|
||||
# import sys
|
||||
# sys.modules['plpy'] = plpy
|
||||
from helper import plpy, fixture_file
|
||||
|
||||
import crankshaft.clustering as cc
|
||||
import crankshaft.pysal_utils as pu
|
||||
from crankshaft import random_seeds
|
||||
import json
|
||||
|
||||
class MoranTest(unittest.TestCase):
|
||||
"""Testing class for Moran's I functions"""
|
||||
|
||||
def setUp(self):
|
||||
plpy._reset()
|
||||
self.params = {"id_col": "cartodb_id",
|
||||
"attr1": "andy",
|
||||
"attr2": "jay_z",
|
||||
"subquery": "SELECT * FROM a_list",
|
||||
"geom_col": "the_geom",
|
||||
"num_ngbrs": 321}
|
||||
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
|
||||
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
|
||||
|
||||
def test_map_quads(self):
|
||||
"""Test map_quads"""
|
||||
self.assertEqual(cc.map_quads(1), 'HH')
|
||||
self.assertEqual(cc.map_quads(2), 'LH')
|
||||
self.assertEqual(cc.map_quads(3), 'LL')
|
||||
self.assertEqual(cc.map_quads(4), 'HL')
|
||||
self.assertEqual(cc.map_quads(33), None)
|
||||
self.assertEqual(cc.map_quads('andy'), None)
|
||||
|
||||
def test_quad_position(self):
|
||||
"""Test quad_position"""
|
||||
|
||||
quads = np.array([1, 2, 3, 4], np.int)
|
||||
|
||||
ans = np.array(['HH', 'LH', 'LL', 'HL'])
|
||||
test_ans = cc.quad_position(quads)
|
||||
|
||||
self.assertTrue((test_ans == ans).all())
|
||||
|
||||
def test_moran_local(self):
|
||||
"""Test Moran's I local"""
|
||||
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
|
||||
plpy._define_result('select', data)
|
||||
random_seeds.set_random_seeds(1234)
|
||||
result = cc.moran_local('subquery', 'value', 99, 'the_geom', 'cartodb_id', 'knn', 5)
|
||||
result = [(row[0], row[1]) for row in result]
|
||||
expected = self.moran_data
|
||||
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
|
||||
self.assertAlmostEqual(res_val, exp_val)
|
||||
self.assertEqual(res_quad, exp_quad)
|
||||
|
||||
def test_moran_local_rate(self):
|
||||
"""Test Moran's I local rate"""
|
||||
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
|
||||
plpy._define_result('select', data)
|
||||
random_seeds.set_random_seeds(1234)
|
||||
result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 99, 'the_geom', 'cartodb_id', 'knn', 5)
|
||||
self.assertIsNotNone(result)
|
||||
result = [(row[0], row[1]) for row in result]
|
||||
expected = self.moran_data
|
||||
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
|
||||
self.assertAlmostEqual(res_val, exp_val)
|
||||
|
||||
def test_moran(self):
|
||||
"""Test Moran's I global"""
|
||||
data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
|
||||
plpy._define_result('select', data)
|
||||
random_seeds.set_random_seeds(1235)
|
||||
result = cc.moran('table', 'value', 99, 'the_geom', 'cartodb_id', 'knn', 5)
|
||||
self.assertIsNotNone(result)
|
||||
result_moran = result[0][0]
|
||||
expected_moran = np.array([row[0] for row in self.moran_data]).mean()
|
||||
self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)
|
||||
107
src/py/crankshaft/test/test_pysal_utils.py
Normal file
@@ -0,0 +1,107 @@
|
||||
import unittest
|
||||
|
||||
import crankshaft.pysal_utils as pu
|
||||
from crankshaft import random_seeds
|
||||
|
||||
|
||||
class PysalUtilsTest(unittest.TestCase):
|
||||
"""Testing class for utility functions related to PySAL integrations"""
|
||||
|
||||
def setUp(self):
|
||||
self.params = {"id_col": "cartodb_id",
|
||||
"attr1": "andy",
|
||||
"attr2": "jay_z",
|
||||
"subquery": "SELECT * FROM a_list",
|
||||
"geom_col": "the_geom",
|
||||
"num_ngbrs": 321}
|
||||
|
||||
def test_query_attr_select(self):
|
||||
"""Test query_attr_select"""
|
||||
|
||||
ans = "i.\"{attr1}\"::numeric As attr1, " \
|
||||
"i.\"{attr2}\"::numeric As attr2, "
|
||||
|
||||
self.assertEqual(pu.query_attr_select(self.params), ans)
|
||||
|
||||
def test_query_attr_where(self):
|
||||
"""Test query_attr_where"""
|
||||
|
||||
ans = "idx_replace.\"{attr1}\" IS NOT NULL AND " \
|
||||
"idx_replace.\"{attr2}\" IS NOT NULL AND " \
|
||||
"idx_replace.\"{attr2}\" <> 0"
|
||||
|
||||
self.assertEqual(pu.query_attr_where(self.params), ans)
|
||||
|
||||
def test_knn(self):
|
||||
"""Test knn neighbors constructor"""
|
||||
|
||||
ans = "SELECT i.\"cartodb_id\" As id, " \
|
||||
"i.\"andy\"::numeric As attr1, " \
|
||||
"i.\"jay_z\"::numeric As attr2, " \
|
||||
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
|
||||
"FROM (SELECT * FROM a_list) As j " \
|
||||
"WHERE " \
|
||||
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
|
||||
"j.\"andy\" IS NOT NULL AND " \
|
||||
"j.\"jay_z\" IS NOT NULL AND " \
|
||||
"j.\"jay_z\" <> 0 " \
|
||||
"ORDER BY " \
|
||||
"j.\"the_geom\" <-> i.\"the_geom\" ASC " \
|
||||
"LIMIT 321)) As neighbors " \
|
||||
"FROM (SELECT * FROM a_list) As i " \
|
||||
"WHERE i.\"andy\" IS NOT NULL AND " \
|
||||
"i.\"jay_z\" IS NOT NULL AND " \
|
||||
"i.\"jay_z\" <> 0 " \
|
||||
"ORDER BY i.\"cartodb_id\" ASC;"
|
||||
|
||||
self.assertEqual(pu.knn(self.params), ans)
|
||||
|
||||
def test_queen(self):
|
||||
"""Test queen neighbors constructor"""
|
||||
|
||||
ans = "SELECT i.\"cartodb_id\" As id, " \
|
||||
"i.\"andy\"::numeric As attr1, " \
|
||||
"i.\"jay_z\"::numeric As attr2, " \
|
||||
"(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
|
||||
"FROM (SELECT * FROM a_list) As j " \
|
||||
"WHERE " \
|
||||
"i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
|
||||
"ST_Touches(i.\"the_geom\", " \
|
||||
"j.\"the_geom\") AND " \
|
||||
"j.\"andy\" IS NOT NULL AND " \
|
||||
"j.\"jay_z\" IS NOT NULL AND " \
|
||||
"j.\"jay_z\" <> 0)" \
|
||||
") As neighbors " \
|
||||
"FROM (SELECT * FROM a_list) As i " \
|
||||
"WHERE i.\"andy\" IS NOT NULL AND " \
|
||||
"i.\"jay_z\" IS NOT NULL AND " \
|
||||
"i.\"jay_z\" <> 0 " \
|
||||
"ORDER BY i.\"cartodb_id\" ASC;"
|
||||
|
||||
self.assertEqual(pu.queen(self.params), ans)
|
||||
|
||||
def test_construct_neighbor_query(self):
|
||||
"""Test construct_neighbor_query"""
|
||||
|
||||
# Compare to raw knn query
|
||||
self.assertEqual(pu.construct_neighbor_query('knn', self.params),
|
||||
pu.knn(self.params))
|
||||
|
||||
def test_get_attributes(self):
|
||||
"""Test get_attributes"""
|
||||
|
||||
# TODO: add real assertions for get_attributes
|
||||
|
||||
self.assertEqual(True, True)
|
||||
|
||||
def test_get_weight(self):
|
||||
"""Test get_weight"""
|
||||
|
||||
self.assertEqual(True, True)
|
||||
|
||||
def test_empty_zipped_array(self):
|
||||
"""Test empty_zipped_array"""
|
||||
ans2 = [(None, None)]
|
||||
ans4 = [(None, None, None, None)]
|
||||
self.assertEqual(pu.empty_zipped_array(2), ans2)
|
||||
self.assertEqual(pu.empty_zipped_array(4), ans4)
|
||||