Compare commits

...

25 Commits

Author SHA1 Message Date
Javier Goizueta
5a7d3178dd Release 0.0.2
This version is the first with the new versioning approach
which uses separate per-version Python virtual environments.
2016-03-16 19:22:21 +01:00
Javier Goizueta
4903af6cdc Add existing release 0.0.1
The existing 0.0.1 files are placed into their location in release/
2016-03-16 18:41:49 +01:00
Javier Goizueta
692014d694 Merge pull request #11 from CartoDB/new-versioning-package-varenv
New versioning process (with multiple virtual environments)
2016-03-16 18:21:52 +01:00
Javier Goizueta
47e0253652 Fixes to the documentation 2016-03-16 18:18:59 +01:00
Javier Goizueta
9f03a9b075 Reorganize the documentation into separate files
Keep a "Quickstart Guide" in the README, add separate
detailed sections for development (CONTRIBUTING) and
release/deployment (RELEASE).
2016-03-16 17:42:28 +01:00
Javier Goizueta
b5281d0681 Documentation clarifications and corrections. 2016-03-16 17:19:21 +01:00
Javier Goizueta
689ec8a925 Change version function from IMMUTABLE to STABLE
These functions' results will change when the extension
is updated.
2016-03-16 17:09:50 +01:00
Javier Goizueta
a7e42e93cc Rename cdb_crankshaft_internal_version as internal function 2016-03-16 16:41:54 +01:00
Javier Goizueta
bad09ffd7b Remove abandoned alternatives from the documentation 2016-03-16 16:30:03 +01:00
Javier Goizueta
4706442a1d Add documentation about useful make targets 2016-03-16 15:56:19 +01:00
Javier Goizueta
935c7f9963 Add missing Makefile comment 2016-03-16 15:54:39 +01:00
Javier Goizueta
ef3bcaeee8 Restore commented-out make target 2016-03-16 15:52:47 +01:00
Javier Goizueta
4ffb2c9664 Review and fix the documentation 2016-03-16 15:45:13 +01:00
Javier Goizueta
dea6e2f1a7 Refactor the Makefile
Separate concerns properly for each subdirectory's Makefile
2016-03-16 15:40:40 +01:00
Javier Goizueta
d13f167d47 Add RELEASE_VERSION option to make deploy
Now make deploy installs the current version by default,
but can be made to install any specific prior version using
an environment variable RELEASE_VERSION
2016-03-16 14:38:18 +01:00
Javier Goizueta
a518034e65 Fix: .pyc files need to be ignored not only inside src/py 2016-03-16 11:13:26 +01:00
Javier Goizueta
24e4037995 Fix version number of released extension script 2016-03-16 11:11:16 +01:00
Javier Goizueta
82a738fe40 Fix make clean tasks 2016-03-16 10:18:07 +01:00
Javier Goizueta
e801c9cb60 Release tasks using release-specific virtual environments
Refine the development process and define the procedure for
releasing new versions.
2016-03-15 18:48:46 +01:00
Javier Goizueta
0206cc6c44 Update documentation 2016-03-10 19:13:46 +01:00
Rafa de la Torre
b754ffe42a Add info about python dependencies 2016-03-10 18:06:21 +01:00
Javier Goizueta
0056f411b5 Set the path to virtualenvs in the Makefile
Also, version the virtualenv
2016-03-09 19:04:21 +01:00
Javier Goizueta
1810f02242 Use SciPy from system package python-scipy 2016-03-09 15:03:17 +01:00
Javier Goizueta
8e972128eb Modify sql code to use the python virtualenv 2016-03-09 15:00:50 +01:00
Javier Goizueta
cdd2d9e722 Directory reorganization and sketch of new versioning procedure 2016-03-08 19:35:02 +01:00
79 changed files with 2190 additions and 200 deletions

View File

@@ -1 +1,2 @@
envs/
*.pyc

View File

@@ -1,84 +1,91 @@
# Contributing guide
# Development process
## How to add new functions
Please read the Working Process/Quickstart Guide in README.md first.
Try to put as little logic in the SQL extension as possible and
just use it as a wrapper to the Python module functionality.
For any modification of crankshaft, such as adding new features,
refactoring or bug-fixing, a topic branch must be created out of the
`develop` branch and be used for the development process.
Once a function is defined it should never change its signature in subsequent
versions. To change a function's signature a new function with a different
name must be created.
Modifications are done inside `src/pg/sql` and `src/py/crankshaft`.
### Version numbers
Take into account:
The version of both the SQL extension and the Python package shall
follow the [Semantic Versioning 2.0](http://semver.org/) guidelines:
* Tests must be added for any new functionality
(inside `src/pg/test`, `src/py/crankshaft/test`) as well as to
detect any bugs that are being fixed.
* Add or modify the corresponding documentation files in the `doc` folder.
Since we expect to have highly technical functions here, an extensive
background explanation would be of great help to users of this extension.
* Convention: snake case (i.e. `snake_case` and not `CamelCase`)
shall be used for all function names.
Prefix function names intended for public use with `cdb_`
and private functions (to be used only internally inside
the extension) with `_cdb_`.
* When backwards incompatibility is introduced the major number is incremented
* When functionality is added (in a backwards-compatible manner) the minor number
is incremented
* When only fixes are introduced (backwards-compatible) the patch number is
incremented
Once the code is ready to be tested, update the local development installation
with `sudo make install`.
This will update the 'dev' version of the extension in `src/pg/` and
make it available to PostgreSQL.
It will also install the python package (crankshaft) in a virtual
environment `env/dev`.
### Python Package
The version number of the Python package, defined in
`src/py/crankshaft/setup.py`, will be overridden when
the package is released, and will always match the extension version
number; for development it shall be kept as '0.0.0'.
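For illustration only, a minimal Python sketch of that version override (the release task in `src/py/Makefile` does the equivalent with `sed`):
```python
import re

def set_package_version(setup_py_path, version):
    """Rewrite the version='X.Y.Z' field of a setup.py in place."""
    with open(setup_py_path) as f:
        source = f.read()
    source = re.sub(r"version='\d+\.\d+\.\d+'", "version='%s'" % version, source)
    with open(setup_py_path, 'w') as f:
        f.write(source)

# e.g.: set_package_version('release/python/0.0.2/crankshaft/setup.py', '0.0.2')
```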
...
Run the tests with `make test`.
### SQL Extension
* Generate a **new subfolder version** for `sql` and `test` folders to define
the new functions and tests
- Use symlinks to avoid file duplication between versions for files that
don't change
- Add new files or modify copies of the old files to add new functions or
modify existing functions (remember to rename a function if the signature
changes)
- Add or modify the corresponding documentation files in the `doc` folder.
Since we expect to have highly technical functions here, an extensive
background explanation would be of great help to users of this extension.
- Create tests for the new functions/behaviour
* Generate the **upgrade and downgrade files** for the extension
* Update the control file and the Makefile to generate the complete SQL
file for the new created version. After running `make` a new
file `crankshaft--X.Y.Z.sql` will be created for the current version.
Additional files for migrating to/from the previous version A.B.C should be
created:
- `crankshaft--X.Y.Z--A.B.C.sql`
- `crankshaft--A.B.C--X.Y.Z.sql`
All these new files must be added to git and pushed.
* Update the public docs! ;-)
## Conventions
# SQL
Use snake case (i.e. `snake_case` and not `CamelCase`) for all
functions. Prefix functions intended for public use with `cdb_`
and private functions (to be used only internally inside
the extension) with `_cdb_`.
# Python
...
## Testing
Running just the Python tests:
```
(cd python && make test)
```
To use the python extension for custom tests, activate the virtual
environment with:
```
source envs/dev/bin/activate
```
Installing the Extension and running just the PostgreSQL tests:
Update extension in a working database with:
* `ALTER EXTENSION crankshaft UPDATE TO 'current';`
`ALTER EXTENSION crankshaft UPDATE TO 'dev';`
Note: we always keep the current development version installed as 'dev';
we update through the 'current' alias to allow changing the extension
contents but not the version identifier. This will fail if the
changes involve incompatible function changes such as a different
return type; in that case the offending function (or the whole extension)
should be dropped manually before the update.
If the extension has not previously been installed in a database,
it can be installed directly with:
* `CREATE EXTENSION crankshaft WITH VERSION 'dev';`
Note: the development extension uses the development python virtual
environment automatically.
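The mechanism behind this (see `_cdb_crankshaft_activate_py` in `src/pg/sql/02_py.sql`) is the standard virtualenv `activate_this.py` hook; a trimmed sketch of what runs inside PL/Python, with a placeholder base path:
```python
import os

# Pick the virtualenv matching the installed extension version ('dev' in
# development), unless CRANKSHAFT_VENV overrides it.
base_path = '/path/to/crankshaft/envs'  # substituted at build time
venv_path = os.environ.get('CRANKSHAFT_VENV', os.path.join(base_path, 'dev'))
activate_path = venv_path + '/bin/activate_this.py'

# Executing activate_this.py rewires sys.path so that packages installed
# in the virtualenv become importable from this interpreter.
exec(open(activate_path).read(), dict(__file__=activate_path))
```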
Before proceeding to the release process, peer review of the code is
a must.
Once the feature or bugfix is completed and all the tests are passing
a Pull-Request shall be created on the topic branch, reviewed by a peer
and then merged back into the `develop` branch when all CI tests pass.
When the changes in the `develop` branch are to be released in a new
version of the extension, a PR must be created on the `develop` branch.
The release manager will take over the PR at this point and proceed
with the release process for a new revision of the extension.
## Relevant development tasks available in the Makefile
```
(cd pg && sudo make install && PGUSER=postgres make installcheck)
```
* `make help` shows a short description of the available targets
Installing and testing everything:
* `sudo make install` will generate the extension scripts for the development
version ('dev'/'current') and install the python package into the
development virtual environment `envs/dev`.
Intended for use by developers.
```
sudo make install && PGUSER=postgres make testinstalled
```
* `make test` will run the tests for the installed development extension.
Intended for use by developers.

View File

@@ -1,43 +0,0 @@
# Workflow
... (branching/merging flow)
# Deployment
...
Deployment to db servers: the next command will install both the Python
package and the extension.
```
sudo make install
```
Installing only the Python package:
```
sudo pip install python/crankshaft --upgrade
```
Caveat: note that `pip install ./crankshaft` will install
from local files, but `pip install crankshaft` will not.
CI: Install and run the tests on the installed extension and package:
```
(sudo make install && PGUSER=postgres make testinstalled)
```
Installing the extension in user databases:
Once installed in a server, the extension can be added
to a database with the next SQL command:
```
CREATE EXTENSION crankshaft;
```
To upgrade the extension to a specific version X.Y.Z:
```
ALTER EXTENSION crankshaft UPDATE TO 'X.Y.Z';
```

View File

@@ -1,13 +1,70 @@
EXT_DIR = pg
PYP_DIR = python
include ./Makefile.global
EXT_DIR = src/pg
PYP_DIR = src/py
.PHONY: install
.PHONY: run_tests
.PHONY: release
.PHONY: deploy
install:
# Generate and install development versions of the extension
# and python package.
# The extension is named 'dev' with a 'current' alias for easy upgrading.
# The Python package is installed in a virtual environment envs/dev/
# Requires sudo.
install: ## Generate and install development version of the extension; requires sudo.
$(MAKE) -C $(PYP_DIR) install
$(MAKE) -C $(EXT_DIR) install
testinstalled:
$(MAKE) -C $(PYP_DIR) testinstalled
$(MAKE) -C $(EXT_DIR) installcheck
# Run the tests for the installed development extension and
# python package
test: ## Run the tests for the development version of the extension
$(MAKE) -C $(PYP_DIR) test
$(MAKE) -C $(EXT_DIR) test
# Generate a new release into release/
release: ## Generate a new release of the extension. Only for release manager
$(MAKE) -C $(EXT_DIR) release
$(MAKE) -C $(PYP_DIR) release
# Install the current release.
# The Python package is installed in a virtual environment envs/X.Y.Z/
# Requires sudo.
# Use the RELEASE_VERSION environment variable to deploy a specific version:
# sudo make deploy RELEASE_VERSION=1.0.0
deploy: ## Deploy a released extension. Only for release manager. Requires sudo.
$(MAKE) -C $(EXT_DIR) deploy
$(MAKE) -C $(PYP_DIR) deploy
# Cleanup development extension script files
clean-dev: ## clean up development extension script files
rm -f src/pg/$(EXTENSION)--*.sql
# Cleanup all releases
clean-releases: ## clean up all releases
rm -rf release/python/*
rm -f release/$(EXTENSION)--*.sql
rm -f release/$(EXTENSION).control
# Cleanup current/specific version
clean-release: ## clean up current release
rm -rf release/python/$(RELEASE_VERSION)
rm -f release/$(RELEASE_VERSION)--*.sql
# Cleanup all virtual environments
clean-environments: ## clean up all virtual environments
rm -rf envs/*
clean-all: clean-dev clean-release clean-environments
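# List each target annotated with a '##' comment along with its description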
help:
@IFS=$$'\n' ; \
help_lines=(`fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//'`); \
for help_line in $${help_lines[@]}; do \
IFS=$$'#' ; \
help_split=($$help_line) ; \
help_command=`echo $${help_split[0]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \
help_info=`echo $${help_split[2]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \
printf "%-30s %s\n" $$help_command $$help_info ; \
done

Makefile.global Normal file
View File

@@ -0,0 +1,6 @@
SELF_DIR := $(dir $(lastword $(MAKEFILE_LIST)))
EXTENSION = crankshaft
PACKAGE = crankshaft
EXTVERSION = $(shell grep default_version $(SELF_DIR)/src/pg/$(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
RELEASE_VERSION ?= $(EXTVERSION)
SED = sed
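For clarity, the `EXTVERSION` line above just pulls `default_version` out of the extension control file; an equivalent sketch in Python:
```python
import re

def extension_version(control_path='src/pg/crankshaft.control'):
    """Return the default_version declared in the control file."""
    with open(control_path) as f:
        for line in f:
            match = re.match(r"default_version\s*=\s*'([^']*)'", line)
            if match:
                return match.group(1)

print(extension_version())  # e.g. 0.0.2
```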

NEWS.md Normal file
View File

@@ -0,0 +1,7 @@
0.0.2 (2016-03-16)
------------------
* New versioning approach using per-version Python virtual environments
0.0.1 (2016-02-22)
------------------
* Preliminary release

View File

@@ -4,9 +4,68 @@ CartoDB Spatial Analysis extension for PostgreSQL.
## Code organization
* *pg* contains the PostgreSQL extension source code
* *python* Python module
* *doc* documentation
* *src* source code
  - *src/pg* contains the PostgreSQL extension source code
  - *src/py* Python module source code
* *release* released versions
* *envs* base directory for Python virtual environments
## Requirements
* pip
* pip, virtualenv, PostgreSQL
* python-scipy system package (see src/py/README.md)
# Working Process -- Quickstart Guide
We distinguish two roles regarding the development cycle of crankshaft:
* *developers* will implement new functionality and bugfixes into
the codebase and will request new releases of the extension.
* A *release manager* will attend these requests and will handle
the release process. The release process is sequential:
no concurrent releases will ever be in the works.
We use the default `develop` branch as the basis for development.
The `master` branch is used to merge and tag releases to be
deployed in production.
Developers shall create a new topic branch from `develop` for any new feature
or bugfix and commit their changes to it and eventually merge back into
the `develop` branch. When a new release is required a Pull Request
will be opened against the `develop` branch.
The `develop` pull requests will be handled by the release manager,
who will merge into `master`, where new releases are prepared and tagged.
The `master` branch is the sole responsibility of the release manager
and developers must not commit or merge into it.
## Development Guidelines
For a detailed description of the development process please see
the CONTRIBUTING.md guide.
Any modification to the source code (`src/pg/sql` for the SQL extension,
`src/py/crankshaft` for the Python package) shall always be done
in a topic branch created from the `develop` branch.
Tests, documentation and peer code reviewing are required for all
modifications.
The tests (both for SQL and Python) are executed by running,
from the top directory:
```
sudo make install
make test
```
To request a new release, which will be handled by the
release manager, a Pull Request must be created against the `develop`
branch.
## Release
The release and deployment process is described in the
RELEASE.md guide and it is the responsibility of the designated
release manager.

RELEASE.md Normal file
View File

@@ -0,0 +1,93 @@
# Release & Deployment Process
Please read the Working Process/Quickstart Guide in README.md
and the Development guidelines in CONTRIBUTING.md.
The release process of a new version of the extension
shall be performed by the designated *Release Manager*.
Note that we expect to gradually automate more of this process.
Once the PR to be released has been checked, it shall be
merged back into the `master` branch to prepare the new release.
The version number in `src/pg/crankshaft.control` must first be updated,
following the [Semantic Versioning 2.0](http://semver.org/) guidelines.
The `NEWS.md` file must be updated as well.
We now will explain the process for the case of backwards-compatible
releases (updating the minor or patch version numbers).
TODO: document the complex case of major releases.
The next command must be executed to produce the main installation
script for the new release, `release/crankshaft--X.Y.Z.sql`, and
also to copy the python package to `release/python/X.Y.Z/crankshaft`.
```
make release
```
Then, the release manager shall produce upgrade and downgrade scripts
to migrate to/from the previous release. In the case of minor/patch
releases this simply consists of extracting the functions that have
changed and placing them in the proper
`release/crankshaft--X.Y.Z--A.B.C.sql` file.
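There is no tooling for this step yet (note the TODOs below); as a purely hypothetical aid, a Python sketch that lists the function definitions differing between two generated release scripts:
```python
def function_chunks(sql_path):
    """Split a release script into its CREATE OR REPLACE FUNCTION chunks."""
    chunks = open(sql_path).read().split('CREATE OR REPLACE FUNCTION')
    return set(chunk.strip() for chunk in chunks[1:])

def changed_functions(old_sql, new_sql):
    """Chunks present in the new release but not identical in the old one."""
    return function_chunks(new_sql) - function_chunks(old_sql)

for chunk in changed_functions('release/crankshaft--0.0.1.sql',
                               'release/crankshaft--0.0.2.sql'):
    print(chunk.splitlines()[0])  # the first line names the function
```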
The new release can be deployed for staging/smoke tests with this command:
```
sudo make deploy
```
This will copy the current 'X.Y.Z' released version of the extension to
PostgreSQL. The corresponding Python extension will be installed in a
virtual environment in `envs/X.Y.Z`.
It can be activated with:
```
source envs/X.Y.Z/bin/activate
```
But note that this is needed only for using the package directly;
the 'X.Y.Z' version of the extension will automatically use the
python package from this virtual environment.
The `sudo make deploy` operation can also be used for installing
the new version after it has been released.
To install a specific version 'X.Y.Z' different from the current one
(which must be present in `release/`) you can:
```
sudo make deploy RELEASE_VERSION=X.Y.Z
```
TODO: testing procedure for the new release.
TODO: procedure for staging deployment.
TODO: procedure for merging to master, tagging and deploying
in production.
## Relevant release & deployment tasks available in the Makefile
* `make help` shows a short description of the available targets
* `make release` will generate a new release (version number defined in
`src/pg/crankshaft.control`) into `release/`.
Intended for use by the release manager.
* `sudo make deploy` will install the current release X.Y.Z from the
`release/` files into PostgreSQL and a Python virtual environment
`envs/X.Y.Z`.
Intended for use by the release manager and deployment jobs.
* `sudo make deploy RELEASE_VERSION=X.Y.Z` will install specified version
previously generated in `release/`
into PostgreSQL and a Python virtual environment `envs/X.Y.Z`.
Intended for use by the release manager and deployment jobs.

View File

@@ -1,9 +0,0 @@
* [x] Support versioning
* [x] Test use of `plpy` from python Package
* [x] Add `pysal` etc. dependencies
* [x] Define documentation practices (general, per extension/package?)
* [x] Add initial function set (WIP)
* Unify style of function comments
* [x] Add integration tests
* Make target to open a new version development (create symlinks, etc.)
* [x] Should add cartodb ext. as a dependency?

pg/.gitignore vendored
View File

@@ -1,3 +0,0 @@
regression.diffs
regression.out
results/

View File

@@ -1,33 +0,0 @@
# Makefile to generate the extension out of separate sql source files.
# Once a version is released, it is not meant to be changed. E.g: once version 0.0.1 is out, it SHALL NOT be changed.
EXTENSION = crankshaft
EXTVERSION = $(shell grep default_version $(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
# The new version to be generated from templates
NEW_EXTENSION_ARTIFACT = $(EXTENSION)--$(EXTVERSION).sql
# DATA is a special variable used by postgres build infrastructure
# These are the files to be installed in the server shared dir,
# for installation from scratch, upgrades and downgrades.
# @see http://www.postgresql.org/docs/current/static/extend-pgxs.html
DATA = $(NEW_EXTENSION_ARTIFACT)
SOURCES_DATA_DIR = sql/$(EXTVERSION)
SOURCES_DATA = $(wildcard sql/$(EXTVERSION)/*.sql)
# The extension installation artifacts are stored in the base subdirectory
$(NEW_EXTENSION_ARTIFACT): $(SOURCES_DATA)
rm -f $@
cat $(SOURCES_DATA_DIR)/*.sql >> $@
REGRESS = $(notdir $(basename $(wildcard test/$(EXTVERSION)/sql/*test.sql)))
TEST_DIR = test/$(EXTVERSION)
REGRESS_OPTS = --inputdir='$(TEST_DIR)' --outputdir='$(TEST_DIR)'
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
# This seems to be needed at least for PG 9.3.11
all: $(DATA)

View File

@@ -1,7 +0,0 @@
# Running the tests:
```
sudo make install
PGUSER=postgres make installcheck
```

View File

@@ -1,6 +0,0 @@
-- Install dependencies
CREATE EXTENSION plpythonu;
CREATE EXTENSION postgis;
CREATE EXTENSION cartodb;
-- Install the extension
CREATE EXTENSION crankshaft;

View File

@@ -1,11 +0,0 @@
# Install the package (needs root privileges)
install:
pip install ./crankshaft --upgrade
# Test from source code
test:
(cd crankshaft && nosetests test/)
# Test currently installed package
testinstalled:
nosetests crankshaft/test/

View File

@@ -1,9 +0,0 @@
# Crankshaft Python Package
...
### Run the tests
```bash
cd crankshaft
nosetests test/
```

release/.gitignore vendored Normal file
View File

View File

@@ -0,0 +1,74 @@
CREATE OR REPLACE FUNCTION cdb_crankshaft.cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.2'::text;
$$ language 'sql' STABLE STRICT;
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_virtualenvs_path()
RETURNS text
AS $$
BEGIN
RETURN '/home/ubuntu/crankshaft/envs';
END;
$$ language plpgsql IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_activate_py()
RETURNS VOID
AS $$
import os
# plpy.notice('%',str(os.environ))
# activate virtualenv
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
default_venv_path = os.path.join(base_path, crankshaft_version)
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
activate_path = venv_path + '/bin/activate_this.py'
exec(open(activate_path).read(), dict(__file__=activate_path))
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
cdb_crankshaft._cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I
CREATE OR REPLACE FUNCTION
cdb_crankshaft.cdb_moran_local (
t TEXT,
attr TEXT,
significance float DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
cdb_crankshaft.cdb_moran_local_rate(t TEXT,
numerator TEXT,
denominator TEXT,
significance FLOAT DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;

View File

@@ -0,0 +1,44 @@
CREATE OR REPLACE FUNCTION
cdb_crankshaft._cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
cdb_crankshaft.cdb_moran_local (
t TEXT,
attr TEXT,
significance float DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
cdb_crankshaft.cdb_moran_local_rate(t TEXT,
numerator TEXT,
denominator TEXT,
significance FLOAT DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
DROP FUNCTION IF EXISTS cdb_crankshaft.cdb_crankshaft_version();
DROP FUNCTION IF EXISTS cdb_crankshaft._cdb_crankshaft_internal_version();
DROP FUNCTION IF EXISTS cdb_crankshaft._cdb_crankshaft_activate_py();

View File

@@ -0,0 +1,186 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.2'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instance
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
RETURNS text
AS $$
BEGIN
-- RETURN '/opt/virtualenvs/crankshaft';
RETURN '/home/ubuntu/crankshaft/envs';
END;
$$ language plpgsql IMMUTABLE STRICT;
-- Use the crankshaft python module
CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
RETURNS VOID
AS $$
import os
# plpy.notice('%',str(os.environ))
# activate virtualenv
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
default_venv_path = os.path.join(base_path, crankshaft_version)
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
activate_path = venv_path + '/bin/activate_this.py'
exec(open(activate_path).read(), dict(__file__=activate_path))
$$ LANGUAGE plpythonu;
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I
CREATE OR REPLACE FUNCTION
cdb_moran_local (
t TEXT,
attr TEXT,
significance float DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate
CREATE OR REPLACE FUNCTION
cdb_moran_local_rate(t TEXT,
numerator TEXT,
denominator TEXT,
significance FLOAT DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Areal weighting is a very simple form of areal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to define the schema the target table
-- belongs to, which is necessary if it's not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Areal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed within the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @param max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the function accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

View File

@@ -1,5 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
default_version = '0.0.1'
default_version = '0.0.2'
requires = 'plpythonu, postgis, cartodb'
superuser = true
schema = cdb_crankshaft

release/python/.gitignore vendored Normal file
View File

View File

@@ -10,7 +10,7 @@ from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.1',
version='0.0.01',
description='CartoDB Spatial Analysis Python Library',

View File

@@ -0,0 +1,2 @@
import random_seeds
import clustering

View File

@@ -0,0 +1 @@
from moran import *

View File

@@ -0,0 +1,321 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of their neighborhood
import numpy as np
import pysal as ps
import plpy
# High level interface ---------------------------------------
def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# TODO: ensure that the significance output can be smaller than 1e-3 (0.001)
# TODO: make a wishlist of output features (zscores, pvalues, raw local lisa, what else?)
plpy.notice('** Constructing query')
# geometries with attributes that are null are ignored
# resulting in a collection of neighbors that are not as near
qvals = {"id_col": id_col,
"attr1": attr,
"geom_col": geom_column,
"table": t,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
y = get_attributes(r, 1)
w = get_weight(r, w_type)
# calculate LISA values
lisa = ps.Moran_Local(y, w)
# find units of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
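# The tuples above map one-to-one onto the SQL wrapper's
# RETURNS TABLE (moran, quads, significance, ids) columns.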
def moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I Local Rate
Andy Eschbacher
"""
plpy.notice('** Constructing query')
# geometries with attributes that are null are ignored
# resulting in a collection of neighbors that are not as near
qvals = {"id_col": id_col,
"numerator": numerator,
"denominator": denominator,
"geom_col": geom_column,
"table": t,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Error: %s' % plpy.SPIError)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
plpy.notice('r.nrows() = %d' % r.nrows())
## collect attributes
numer = get_attributes(r, 1)
denom = get_attributes(r, 2)
w = get_weight(r, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, w, permutations=permutations)
# find units of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
## TODO: Decide on which return values here
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order, lisa.y)
def moran_local_bv(t, attr1, attr2, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
plpy.notice('** Constructing query')
qvals = {"num_ngbrs": num_ngbrs,
"attr1": attr1,
"attr2": attr2,
"table": t,
"geom_col": geom_column,
"id_col": id_col}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Error: %s' % plpy.SPIError)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
## collect attributes
attr1_vals = get_attributes(r, 1)
attr2_vals = get_attributes(r, 2)
# create weights
w = get_weight(r, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, w)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
:param coord (int): quadrant of a specific measurement
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes involved in the query.
:param params: dict of information used in query (column names,
table name, etc.)
"""
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')]
template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
attr_string = ""
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": val, "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Create portion of WHERE clauses for weeding out NULL-valued geometries
"""
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')])
attr_string = []
for attr in attrs:
attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
if len(attrs) == 2:
attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
:param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"WHERE %(attr_where_j)s " \
"ORDER BY j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs} OFFSET 1 ) " \
") As neighbors " \
"FROM \"{table}\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
:param params: dict of information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"WHERE ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM \"{table}\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_query(w_type, query_vals):
"""Return requested query.
:param w_type: type of neighbors to calculate (knn or queen)
:param query_vals: values used to construct the query
"""
if w_type == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
def get_attributes(query_res, attr_num):
"""
:param query_res: query results with attributes and neighbors
:param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
## Build weight object
def get_weight(query_res, w_type='queen', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
:param query_res: query results with attributes and neighbors
"""
if w_type == 'knn':
row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
weights = {x['id']: row_normed_weights for x in query_res}
elif w_type == 'queen':
weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
if len(x['neighbors']) > 0
else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
return ps.W(neighbors, weights)
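# Illustrative example: with w_type='knn' and num_ngbrs=2, the rows
# {'id': 1, 'neighbors': [2, 3]} and {'id': 2, 'neighbors': [1, 3]}
# yield weights {1: [0.5, 0.5], 2: [0.5, 0.5]} in the returned ps.W.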
def quad_position(quads):
"""
Map an array of quadrant numbers to Moran's I designations
"""
lisa_sig = np.array([map_quads(q) for q in quads])
return lisa_sig
def lisa_sig_vals(pvals, quads, threshold):
"""
Produce Moran's I classification from p-values and quadrants,
marking values above the significance threshold as 'Not significant'
"""
sig = (pvals <= threshold)
lisa_sig = np.empty(len(sig), np.chararray)
for idx, val in enumerate(sig):
if val:
lisa_sig[idx] = map_quads(quads[idx])
else:
lisa_sig[idx] = 'Not significant'
return lisa_sig

View File

@@ -0,0 +1,10 @@
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)

View File

@@ -0,0 +1,48 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.2',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping community',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
install_requires=['pysal==1.9.1'],
requires=['pysal', 'numpy' ],
test_suite='test'
)

View File

@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "Not significant"],
[0.6152779669180425, "Not significant"],
[-0.14657336660125297, "Not significant"],
[0.6967858120189607, "Not significant"],
[0.07949310115714454, "Not significant"],
[0.4703198759258987, "Not significant"],
[0.4421125200498064, "Not significant"],
[0.5724288737143592, "Not significant"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "Not significant"],
[-0.01466729201304962, "Not significant"],
[0.3481559372544409, "Not significant"],
[0.06547094736902978, "Not significant"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "Not significant"],
[0.15971286468915544, "Not significant"],
[1.0543588860308968, "Not significant"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "Not significant"],
[0.08438455015300014, "Not significant"],
[0.06547094736902978, "Not significant"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "Not significant"],
[0.795275137550483, "Not significant"],
[0.18562939195219, "LL"],
[0.3010757406693439, "Not significant"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "Not significant"],
[-0.07116352791516614, "Not significant"],
[-0.09945240794119009, "Not significant"],
[0.18562939195219, "LL"],
[0.1832733440191868, "Not significant"],
[-0.39054253768447705, "Not significant"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "Not significant"],
[0.2584386102554792, "Not significant"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "Not significant"],
[0.051367269430983485, "Not significant"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "Not significant"],
[0.04191046803899837, "Not significant"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "Not significant"],
[0.5410013139159929, "Not significant"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]

View File

@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

View File

@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)

View File

@@ -0,0 +1,34 @@
import re
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
self.results = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []
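# Example usage (illustrative), mirroring the test suite:
#   plpy = MockPlPy()
#   plpy._define_result('select', [{'id': 1, 'attr1': 0.5}])
#   plpy.execute('SELECT ...')  # -> [{'id': 1, 'attr1': 0.5}]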

View File

@@ -0,0 +1,144 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions."""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"table": "a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads."""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_query_attr_select(self):
"""Test query_attr_select."""
ans = "i.\"{attr1}\"::numeric As attr1, " \
"i.\"{attr2}\"::numeric As attr2, "
self.assertEqual(cc.query_attr_select(self.params), ans)
def test_query_attr_where(self):
"""Test query_attr_where."""
ans = "idx_replace.\"{attr1}\" IS NOT NULL AND "\
"idx_replace.\"{attr2}\" IS NOT NULL AND "\
"idx_replace.\"{attr2}\" <> 0"
self.assertEqual(cc.query_attr_where(self.params), ans)
def test_knn(self):
"""Test knn function."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM \"a_list\" As j WHERE j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 OFFSET 1 ) ) " \
"As neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT " \
"NULL AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER " \
"BY i.\"cartodb_id\" ASC;"
self.assertEqual(cc.knn(self.params), ans)
def test_queen(self):
"""Test queen neighbors function."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE ST_Touches(" \
"i.\"the_geom\", j.\"the_geom\") AND j.\"andy\" IS NOT NULL " \
"AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0)) As " \
"neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT NULL " \
"AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER BY " \
"i.\"cartodb_id\" ASC;"
self.assertEqual(cc.queen(self.params), ans)
def test_get_query(self):
"""Test get_query."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE j.\"andy\" IS " \
"NOT NULL AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 " \
"OFFSET 1 ) ) As neighbors FROM \"a_list\" As i WHERE " \
"i.\"andy\" IS NOT NULL AND i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(cc.get_query('knn', self.params), ans)
def test_get_attributes(self):
"""Test get_attributes."""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight."""
self.assertEqual(True, True)
def test_quad_position(self):
"""Test lisa_sig_vals."""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('table', 'value', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('table', 'numerator', 'denominator', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)

src/pg/.gitignore vendored Normal file
View File

@@ -0,0 +1,6 @@
regression.diffs
regression.out
results/
crankshaft--dev.sql
crankshaft--dev--current.sql
crankshaft--current--dev.sql

src/pg/Makefile Normal file
View File

@@ -0,0 +1,60 @@
include ../../Makefile.global
# Development tasks:
#
# * install generates the control & script files into src/pg/
# and installs them into the PostgreSQL extensions directory;
# requires sudo. In addition to the current development version
# named 'dev', an alias 'current' is generated for ease of
# update (upgrade to 'current', then to 'dev').
# The python module is installed in a virtualenv in envs/dev/
# * test runs the tests for the currently generated Development
# extension.
DATA = $(EXTENSION)--dev.sql \
$(EXTENSION)--current--dev.sql \
$(EXTENSION)--dev--current.sql
SOURCES_DATA_DIR = sql
SOURCES_DATA = $(wildcard $(SOURCES_DATA_DIR)/*.sql)
VIRTUALENV_PATH = $(realpath ../../envs)
ESC_VIRVIRTUALENV_PATH = $(subst /,\/,$(VIRTUALENV_PATH))
REPLACEMENTS = -e 's/@@VERSION@@/$(EXTVERSION)/g' \
-e 's/@@VIRTUALENV_PATH@@/$(ESC_VIRVIRTUALENV_PATH)/g'
$(DATA): $(SOURCES_DATA)
$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > $@
TEST_DIR = test
REGRESS = $(notdir $(basename $(wildcard $(TEST_DIR)/sql/*test.sql)))
REGRESS_OPTS = --inputdir='$(TEST_DIR)' --outputdir='$(TEST_DIR)'
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
# This seems to be needed at least for PG 9.3.11
all: $(DATA)
test: export PGUSER=postgres
test: installcheck
# Release tasks
../../release/$(EXTENSION).control: $(EXTENSION).control
cp $< $@
# Prepare new release from the currently installed development version,
# for the current version X.Y.Z (defined in the control file)
# producing the extension script and control files in release/
# and the python package in release/python/X.Y.Z/crankshaft/
release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > ../../release/$(EXTENSION)--$(EXTVERSION).sql
# Install the current release into the PostgreSQL extensions directory
# and the Python package in a virtual environment envs/X.Y.Z
deploy:
$(INSTALL_DATA) ../../release/$(EXTENSION).control '$(DESTDIR)$(datadir)/extension/'
$(INSTALL_DATA) ../../release/*.sql '$(DESTDIR)$(datadir)/extension/'

View File

@@ -0,0 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
default_version = '0.0.2'
requires = 'plpythonu, postgis, cartodb'
superuser = true
schema = cdb_crankshaft

src/pg/sql/01_version.sql Normal file
View File

@@ -0,0 +1,12 @@
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '@@VERSION@@'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instance
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;

src/pg/sql/02_py.sql Normal file
View File

@@ -0,0 +1,23 @@
CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
RETURNS text
AS $$
BEGIN
-- RETURN '/opt/virtualenvs/crankshaft';
RETURN '@@VIRTUALENV_PATH@@';
END;
$$ language plpgsql IMMUTABLE STRICT;
-- Use the crankshaft python module
CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
RETURNS VOID
AS $$
import os
# plpy.notice('%',str(os.environ))
# activate virtualenv
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
default_venv_path = os.path.join(base_path, crankshaft_version)
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
activate_path = venv_path + '/bin/activate_this.py'
exec(open(activate_path).read(), dict(__file__=activate_path))
$$ LANGUAGE plpythonu;

View File

@@ -4,6 +4,7 @@
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;


@@ -11,6 +11,7 @@ CREATE OR REPLACE FUNCTION
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
@@ -29,6 +30,7 @@ CREATE OR REPLACE FUNCTION
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)


@@ -3,4 +3,4 @@ CREATE EXTENSION plpythonu;
CREATE EXTENSION postgis;
CREATE EXTENSION cartodb;
-- Install the extension
CREATE EXTENSION crankshaft;
CREATE EXTENSION crankshaft VERSION 'dev';


@@ -4,4 +4,4 @@ CREATE EXTENSION postgis;
CREATE EXTENSION cartodb;
-- Install the extension
CREATE EXTENSION crankshaft;
CREATE EXTENSION crankshaft VERSION 'dev';

src/py/Makefile Normal file

@@ -0,0 +1,22 @@
include ../../Makefile.global
# Install the package locally for development
install:
virtualenv --system-site-packages ../../envs/dev
# source ../../envs/dev/bin/activate
../../envs/dev/bin/pip install -I ./crankshaft
../../envs/dev/bin/pip install -I nose
# Test development install
test:
../../envs/dev/bin/nosetests crankshaft/test/
release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
mkdir -p ../../release/python/$(EXTVERSION)
cp -r ./$(PACKAGE) ../../release/python/$(EXTVERSION)/
$(SED) -i -r 's/version='"'"'[0-9]+\.[0-9]+\.[0-9]+'"'"'/version='"'"'$(EXTVERSION)'"'"'/g' ../../release/python/$(EXTVERSION)/$(PACKAGE)/setup.py
deploy:
virtualenv --system-site-packages $(VIRTUALENV_PATH)/$(RELEASE_VERSION)
$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I -U ../../release/python/$(RELEASE_VERSION)/$(PACKAGE)
$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I nose

src/py/README.md Normal file

@@ -0,0 +1,88 @@
# Crankshaft Python Package
...
### Run the tests
```bash
cd crankshaft
nosetests test/
```
## Notes about python dependencies
* This extension is targeted at production databases, so assumptions about the environment must be more restrictive than in experimental settings.
* We're using `pip` and `virtualenv` to generate a suitable isolated environment for python code that has all the dependencies
* Every dependency should be:
- Added to the `setup.py` file
- Installed through it
- Tested, when they have a test suite.
- Fixed in the `requirements.txt`
* At present we use Python version 2.7.3
---
To avoid troublesome compilations/linkings we will use
the available system package `python-scipy`.
This package and its dependencies provide numpy 1.6.1
and scipy 0.9.0. To be able to use these versions we cannot use
PySAL 1.10 or later, so we'll stick to 1.9.1.
```
apt-get install -y python-scipy
```
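
As a quick sanity check, the versions provided by the system package can be verified from Python; a short sketch, assuming the versions quoted above:

```python
# Sanity check: confirm the system-provided versions described above.
import numpy
import scipy

print numpy.__version__  # expected: 1.6.1
print scipy.__version__  # expected: 0.9.0
```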
We'll use virtual environments to install our packages,
but configured to also use system modules so that the
above-mentioned scipy and numpy versions are used.
```bash
# Create a virtual environment for python
$ virtualenv --system-site-packages dev
# Activate the virtualenv
$ source dev/bin/activate
# Install all the requirements
# expect this to take a while, as it will trigger a few compilations
(dev) $ pip install -I ./crankshaft
```
#### Test the libraries with that virtual env
##### Test numpy library dependency:
```python
import numpy
numpy.test('full')
```
##### Run scipy tests
```python
import scipy
scipy.test('full')
```
##### Testing pysal
See http://pysal.readthedocs.org/en/latest/developers/testing.html
This will require putting this into `dev/lib/python2.7/site-packages/setup.cfg`:
```
[nosetests]
ignore-files=collection
exclude-dir=pysal/contrib
[wheel]
universal=1
```
And copying some files before executing the tests
(we'll use a temporary directory to run the tests from, because
some tests expect certain files in the current directory). The following
must be executed from the directory that contains the `dev` virtualenv:
```
cp dev/lib/python2.7/site-packages/pysal/examples/geodanet/* dev/local/lib/python2.7/site-packages/pysal/examples
mkdir -p test_tmp && cd test_tmp && cp ../dev/lib/python2.7/site-packages/pysal/examples/geodanet/* ./
```
Then, execute the tests with:
```python
import pysal
import nose
nose.runmodule('pysal')
```


@@ -0,0 +1,2 @@
import random_seeds
import clustering


@@ -0,0 +1 @@
from moran import *


@@ -0,0 +1,321 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of their neighborhood
import numpy as np
import pysal as ps
import plpy
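# For reference (not part of the original module): PySAL's Moran_Local
# computes, for each unit i, the local Moran statistic
#
#     I_i = (z_i / m_2) * sum_j(w_ij * z_j)
#
# where z_i = x_i - mean(x) and m_2 = sum_i(z_i ** 2) / n. The quadrant
# codes (HH, LH, LL, HL) classify each unit's value against the spatial
# lag of its neighbors; see map_quads below.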
# High level interface ---------------------------------------
def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# TODO: ensure that the significance output can be smaller than 1e-3 (0.001)
# TODO: make a wishlist of output features (zscores, pvalues, raw local lisa, what else?)
plpy.notice('** Constructing query')
# geometries with attributes that are null are ignored,
# so the resulting neighbors may not be the nearest ones
qvals = {"id_col": id_col,
"attr1": attr,
"geom_col": geom_column,
"table": t,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
y = get_attributes(r, 1)
w = get_weight(r, w_type)
# calculate LISA values
lisa = ps.Moran_Local(y, w)
# find units of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
def moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I Local Rate
Andy Eschbacher
"""
plpy.notice('** Constructing query')
# geometries with attributes that are null are ignored,
# so the resulting neighbors may not be the nearest ones
qvals = {"id_col": id_col,
"numerator": numerator,
"denominator": denominator,
"geom_col": geom_column,
"table": t,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Error: %s' % plpy.SPIError)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
plpy.notice('r.nrows() = %d' % r.nrows())
## collect attributes
numer = get_attributes(r, 1)
denom = get_attributes(r, 2)
w = get_weight(r, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, w, permutations=permutations)
# find units of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
## TODO: decide which values to return here
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order, lisa.y)
def moran_local_bv(t, attr1, attr2, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
plpy.notice('** Constructing query')
qvals = {"num_ngbrs": num_ngbrs,
"attr1": attr1,
"attr2": attr2,
"table": t,
"geom_col": geom_column,
"id_col": id_col}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Error: %s' % plpy.SPIError)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
## collect attributes
attr1_vals = get_attributes(r, 1)
attr2_vals = get_attributes(r, 2)
# create weights
w = get_weight(r, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, w)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
:param coord (int): quadrant of a specific measurement
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes involved in the query.
:param params: dict of information used in query (column names,
table name, etc.)
"""
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')]
template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
attr_string = ""
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": val, "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Create portion of WHERE clauses for weeding out rows with NULL-valued attributes (and zero-valued denominators)
"""
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')])
attr_string = []
for attr in attrs:
attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
if len(attrs) == 2:
attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
:param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"WHERE %(attr_where_j)s " \
"ORDER BY j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs} OFFSET 1 ) " \
") As neighbors " \
"FROM \"{table}\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queen neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
:param params: dict of information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"WHERE ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM \"{table}\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_query(w_type, query_vals):
"""Return requested query.
:param w_type: type of neighbors to calculate (knn or queen)
:param query_vals: values used to construct the query
"""
if w_type == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
def get_attributes(query_res, attr_num):
"""
:param query_res: query results with attributes and neighbors
:param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
## Build weight object
def get_weight(query_res, w_type='queen', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
:param query_res: query results with attributes and neighbors
"""
if w_type == 'knn':
row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
weights = {x['id']: row_normed_weights for x in query_res}
elif w_type == 'queen':
weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
if len(x['neighbors']) > 0
else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
return ps.W(neighbors, weights)
def quad_position(quads):
"""
Map an array of quadrant numbers to Moran's I designations (HH, LH, LL, HL)
"""
lisa_sig = np.array([map_quads(q) for q in quads])
return lisa_sig
def lisa_sig_vals(pvals, quads, threshold):
"""
Produce Moran's I classifications, labeling units whose p-values exceed the significance threshold as 'Not significant'
"""
sig = (pvals <= threshold)
lisa_sig = np.empty(len(sig), np.chararray)
for idx, val in enumerate(sig):
if val:
lisa_sig[idx] = map_quads(quads[idx])
else:
lisa_sig[idx] = 'Not significant'
return lisa_sig
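
To make the weight-building step concrete, here is a small self-contained sketch of what `get_weight` produces for KNN input; the ids and neighbor lists below are made up for illustration:

```python
# Toy illustration of get_weight's KNN branch: each of the num_ngbrs
# neighbors receives the same row-normalized weight. Ids are made up.
import pysal as ps

query_res = [
    {'id': 1, 'neighbors': [2, 3]},
    {'id': 2, 'neighbors': [1, 3]},
    {'id': 3, 'neighbors': [1, 2]},
]

num_ngbrs = 2
row_normed_weights = [1.0 / num_ngbrs] * num_ngbrs
weights = {x['id']: row_normed_weights for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}

w = ps.W(neighbors, weights)
print w.n           # 3
print w.weights[1]  # [0.5, 0.5]
```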


@@ -0,0 +1,10 @@
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)
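
Seeding both generators makes any computation that draws from either one reproducible across runs; a quick sketch:

```python
# Reproducibility sketch: identical seeds yield identical draws from
# both the stdlib and numpy generators.
import random
import numpy

random.seed(1234)
numpy.random.seed(1234)
first = (random.random(), numpy.random.rand())

random.seed(1234)
numpy.random.seed(1234)
second = (random.random(), numpy.random.rand())

assert first == second
```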


@@ -0,0 +1,48 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.0',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping community',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
install_requires=['pysal==1.9.1'],
requires=['pysal', 'numpy'],
test_suite='test'
)


@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "Not significant"],
[0.6152779669180425, "Not significant"],
[-0.14657336660125297, "Not significant"],
[0.6967858120189607, "Not significant"],
[0.07949310115714454, "Not significant"],
[0.4703198759258987, "Not significant"],
[0.4421125200498064, "Not significant"],
[0.5724288737143592, "Not significant"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "Not significant"],
[-0.01466729201304962, "Not significant"],
[0.3481559372544409, "Not significant"],
[0.06547094736902978, "Not significant"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "Not significant"],
[0.15971286468915544, "Not significant"],
[1.0543588860308968, "Not significant"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "Not significant"],
[0.08438455015300014, "Not significant"],
[0.06547094736902978, "Not significant"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "Not significant"],
[0.795275137550483, "Not significant"],
[0.18562939195219, "LL"],
[0.3010757406693439, "Not significant"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "Not significant"],
[-0.07116352791516614, "Not significant"],
[-0.09945240794119009, "Not significant"],
[0.18562939195219, "LL"],
[0.1832733440191868, "Not significant"],
[-0.39054253768447705, "Not significant"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "Not significant"],
[0.2584386102554792, "Not significant"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "Not significant"],
[0.051367269430983485, "Not significant"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "Not significant"],
[0.04191046803899837, "Not significant"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "Not significant"],
[0.5410013139159929, "Not significant"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]


@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]


@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)


@@ -0,0 +1,34 @@
import re
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []


@@ -0,0 +1,144 @@
import unittest
import numpy as np
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions."""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"table": "a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads."""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_query_attr_select(self):
"""Test query_attr_select."""
ans = "i.\"{attr1}\"::numeric As attr1, " \
"i.\"{attr2}\"::numeric As attr2, "
self.assertEqual(cc.query_attr_select(self.params), ans)
def test_query_attr_where(self):
"""Test query_attr_where."""
ans = "idx_replace.\"{attr1}\" IS NOT NULL AND "\
"idx_replace.\"{attr2}\" IS NOT NULL AND "\
"idx_replace.\"{attr2}\" <> 0"
self.assertEqual(cc.query_attr_where(self.params), ans)
def test_knn(self):
"""Test knn function."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM \"a_list\" As j WHERE j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 OFFSET 1 ) ) " \
"As neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT " \
"NULL AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER " \
"BY i.\"cartodb_id\" ASC;"
self.assertEqual(cc.knn(self.params), ans)
def test_queen(self):
"""Test queen neighbors function."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE ST_Touches(" \
"i.\"the_geom\", j.\"the_geom\") AND j.\"andy\" IS NOT NULL " \
"AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0)) As " \
"neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT NULL " \
"AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER BY " \
"i.\"cartodb_id\" ASC;"
self.assertEqual(cc.queen(self.params), ans)
def test_get_query(self):
"""Test get_query."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE j.\"andy\" IS " \
"NOT NULL AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 " \
"OFFSET 1 ) ) As neighbors FROM \"a_list\" As i WHERE " \
"i.\"andy\" IS NOT NULL AND i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(cc.get_query('knn', self.params), ans)
def test_get_attributes(self):
"""Test get_attributes."""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight."""
self.assertEqual(True, True)
def test_quad_position(self):
"""Test lisa_sig_vals."""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('table', 'value', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('table', 'numerator', 'denominator', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)