Compare commits

...

21 Commits

Author SHA1 Message Date
Javier Goizueta
8762f6ca1c Merge pull request #12 from CartoDB/feat-moran-free-queries
Allow to pass free queries as `select * from table limit 100` in moran
2016-03-16 19:43:15 +01:00
Raul Ochoa
58c141d217 Allow to pass free queries as select * from table limit 100 in moran 2016-03-16 19:40:06 +01:00
Javier Goizueta
5a7d3178dd Release 0.0.2
This version is the first with the new versioning approach
which uses separate per-version Pyhton virtual enironments.
2016-03-16 19:22:21 +01:00
Javier Goizueta
4903af6cdc Add existing release 0.0.1
The existing 0.0.1 files are placed into their location in release/
2016-03-16 18:41:49 +01:00
Javier Goizueta
692014d694 Merge pull request #11 from CartoDB/new-versioning-package-varenv
New versioning process (with multiple virtual environments)
2016-03-16 18:21:52 +01:00
Javier Goizueta
47e0253652 Fixes to the documentation 2016-03-16 18:18:59 +01:00
Javier Goizueta
9f03a9b075 Reorganize the documentation into separate files
Keep a "Quickstart Guide" in the README, add separate
detailed sections for development (CONTRIBUTING) and
release/deployment (RELEASE).
2016-03-16 17:42:28 +01:00
Javier Goizueta
b5281d0681 Documentation clarifications and corrections. 2016-03-16 17:19:21 +01:00
Javier Goizueta
689ec8a925 Change version function from IMMUTABLE to STABLE
These functions' results will change when the extension
is updated.
2016-03-16 17:09:50 +01:00
Javier Goizueta
a7e42e93cc Rename cdb_crankshaft_internal_version as internal function 2016-03-16 16:41:54 +01:00
Javier Goizueta
bad09ffd7b Remove abandoned alternatives from the documentation 2016-03-16 16:30:03 +01:00
Javier Goizueta
4706442a1d Add documentation about useful make targets 2016-03-16 15:56:19 +01:00
Javier Goizueta
935c7f9963 Add missing Makefile comment 2016-03-16 15:54:39 +01:00
Javier Goizueta
ef3bcaeee8 Restore commented-out make target 2016-03-16 15:52:47 +01:00
Javier Goizueta
4ffb2c9664 Review and fix the documentation 2016-03-16 15:45:13 +01:00
Javier Goizueta
dea6e2f1a7 Refactor the Makefile
Separate concerns properly for each subdirectory's Makefile
2016-03-16 15:40:40 +01:00
Javier Goizueta
d13f167d47 Add RELEASE_VERSION option to make deploy
Now make deploy installs by default the current version,
but can be made to install any prior specific version using
a environmnt varialbe RELEASE_VERSION
2016-03-16 14:38:18 +01:00
Javier Goizueta
a518034e65 Fix .pyc files need not only be ignored inside src/py 2016-03-16 11:13:26 +01:00
Javier Goizueta
24e4037995 Fix version number of released extension script 2016-03-16 11:11:16 +01:00
Javier Goizueta
82a738fe40 Fix make clean tasks 2016-03-16 10:18:07 +01:00
Javier Goizueta
e801c9cb60 Release tasks using release-specific virtual environments
Refine the development process and define the procedure for
releasing new versions.
2016-03-15 18:48:46 +01:00
48 changed files with 2212 additions and 352 deletions

View File

@@ -1,2 +1,2 @@
envs/
*.pyc
dev/

91
CONTRIBUTING.md Normal file
View File

@@ -0,0 +1,91 @@
# Development process
Please read the Working Process/Quickstart Guide in README.md first.
For any modification of crankshaft, such as adding new features,
refactoring or bug-fixing, topic branch must be created out of the `develop`
branch and be used for the development process.
Modifications are done inside `src/pg/sql` and `src/py/crankshaft`.
Take into account:
* Tests must be added for any new functionality
(inside `src/pg/test`, `src/py/crankshaft/test`) as well as to
detect any bugs that are being fixed.
* Add or modify the corresponding documentation files in the `doc` folder.
Since we expect to have highly technical functions here, an extense
background explanation would be of great help to users of this extension.
* Convention: snake case(i.e. `snake_case` and not `CamelCase`)
shall be used for all function names.
Prefix function names intended for public use with `cdb_`
and private functions (to be used only internally inside
the extension) with `_cdb_`.
Once the code is ready to be tested, update the local development installation
with `sudo make install`.
This will update the 'dev' version of the extension in `src/pg/` and
make it available to PostgreSQL.
It will also install the python package (crankshaft) in a virtual
environment `env/dev`.
The version number of the Python package, defined in
`src/pg/crankshaft/setup.py` will be overridden when
the package is released and always match the extension version number,
but for development it shall be kept as '0.0.0'.
Run the tests with `make test`.
To use the python extension for custom tests, activate the virtual
environment with:
```
source envs/dev/bin/activate
```
Update extension in a working database with:
* `ALTER EXTENSION crankshaft VERSION TO 'current';`
`ALTER EXTENSION crankshaft VERSION TO 'dev';`
Note: we keep the current development version install as 'dev' always;
we update through the 'current' alias to allow changing the extension
contents but not the version identifier. This will fail if the
changes involve incompatible function changes such as a different
return type; in that case the offending function (or the whole extension)
should be dropped manually before the update.
If the extension has not previously been installed in a database,
it can be installed directly with:
* `CREATE EXTENSION crankshaft WITH VERSION 'dev';`
Note: the development extension uses the development python virtual
environment automatically.
Before proceeding to the release process peer code reviewing of the code is
a must.
Once the feature or bugfix is completed and all the tests are passing
a Pull-Request shall be created on the topic branch, reviewed by a peer
and then merged back into the `develop` branch when all CI tests pass.
When the changes in the `develop` branch are to be released in a new
version of the extension, a PR must be created on the `develop` branch.
The release manage will take hold of the PR at this moment to proceed
to the release process for a new revision of the extension.
## Relevant development tasks available in the Makefile
```
* `make help` show a short description of the available targets
* `sudo make install` will generate the extension scripts for the development
version ('dev'/'current') and install the python package into the
development virtual environment `envs/dev`.
Intended for use by developers.
* `make test` will run the tests for the installed development extension.
Intended for use by developers.
```

View File

@@ -1,43 +0,0 @@
# Workflow
... (branching/merging flow)
# Deployment
...
Deployment to db servers: the next command will install both the Python
package and the extension.
```
sudo make install
```
Installing only the Python package:
```
sudo pip install python/crankshaft --upgrade
```
Caveat: note that `pip install ./crankshaft` will install
from local files, but `pip install crankshaft` will not.
CI: Install and run the tests on the installed extension and package:
```
(sudo make install && PGUSER=postgres make testinstalled)
```
Installing the extension in user databases:
Once installed in a server, the extension can be added
to a database with the next SQL command:
```
CREATE EXTENSION crankshaft;
```
To upgrade the extension to an specific version X.Y.Z:
```
ALTER EXTENSION crankshaft UPGRADE TO 'X.Y.Z';
```

View File

@@ -1,13 +1,70 @@
include ./Makefile.global
EXT_DIR = src/pg
PYP_DIR = src/py
.PHONY: install
.PHONY: run_tests
.PHONY: release
.PHONY: deploy
install:
# Generate and install developmet versions of the extension
# and python package.
# The extension is named 'dev' with a 'current' alias for easily upgrading.
# The Python package is installed in a virtual environment envs/dev/
# Requires sudo.
install: ## Generate and install development version of the extension; requires sudo.
$(MAKE) -C $(PYP_DIR) install
$(MAKE) -C $(EXT_DIR) install
testinstalled:
$(MAKE) -C $(PYP_DIR) testinstalled
$(MAKE) -C $(EXT_DIR) installcheck
# Run the tests for the installed development extension and
# python package
test: ## Run the tests for the development version of the extension
$(MAKE) -C $(PYP_DIR) test
$(MAKE) -C $(EXT_DIR) test
# Generate a new release into release
release: ## Generate a new release of the extension. Only for telease manager
$(MAKE) -C $(EXT_DIR) release
$(MAKE) -C $(PYP_DIR) release
# Install the current release.
# The Python package is installed in a virtual environment envs/X.Y.Z/
# Requires sudo.
# Use the RELEASE_VERSION environment variable to deploy a specific version:
# sudo make deploy RELEASE_VERSION=1.0.0
deploy: ## Deploy a released extension. Only for release manager. Requires sudo.
$(MAKE) -C $(EXT_DIR) deploy
$(MAKE) -C $(PYP_DIR) deploy
# Cleanup development extension script files
clean-dev: ## clean up development extension script files
rm -f src/pg/$(EXTENSION)--*.sql
# Cleanup all releases
clean-releases: ## clean up all releases
rm -rf release/python/*
rm -f release/$(EXTENSION)--*.sql
rm -f release/$(EXTENSION).control
# Cleanup current/specific version
clean-release: ## clean up current release
rm -rf release/python/$(RELEASE_VERSION)
rm -f release/$(RELEASE_VERSION)--*.sql
# Cleanup all virtual environments
clean-environments: ## clean up all virtual environments
rm -rf envs/*
clean-all: clean-dev clean-release clean-environments
help:
@IFS=$$'\n' ; \
help_lines=(`fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//'`); \
for help_line in $${help_lines[@]}; do \
IFS=$$'#' ; \
help_split=($$help_line) ; \
help_command=`echo $${help_split[0]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \
help_info=`echo $${help_split[2]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \
printf "%-30s %s\n" $$help_command $$help_info ; \
done

6
Makefile.global Normal file
View File

@@ -0,0 +1,6 @@
SELF_DIR := $(dir $(lastword $(MAKEFILE_LIST)))
EXTENSION = crankshaft
PACKAGE = crankshaft
EXTVERSION = $(shell grep default_version $(SELF_DIR)/src/pg/$(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
RELEASE_VERSION ?= $(EXTVERSION)
SED = sed

7
NEWS.md Normal file
View File

@@ -0,0 +1,7 @@
0.0.2 (2016-03-16)
------------------
* New versioning approach using per-version Python virtual environments
0.0.1 (2016-02-22)
------------------
* Preliminar release

109
README.md
View File

@@ -9,94 +9,63 @@ CartoDB Spatial Analysis extension for PostgreSQL.
* - *src/pg* contains the PostgreSQL extension source code
* - *src/py* Python module source code
* *release* reseleased versions
* *env* base directory for Python virtual environments
## Requirements
* pip, virtualenv, PostgreSQL
* python-scipy system package (see src/py/README.md)
# Working Process
# Working Process -- Quickstart Guide
## Development
We distinguish two roles regarding the development cycle of crankshaft:
Work in `src/pg/sql`, `src/py/crankshaft`;
use a topic branch. See src/py/README.md
for the procedure to work with the Python local environment.
* *developers* will implement new functionality and bugfixes into
the codebase and will request for new releases of the extension.
* A *release manager* will attend these requests and will handle
the release process. The release process is sequential:
no concurrent releases will ever be in the works.
Take into account:
We use the default `develop` branch as the basis for development.
The `master` branch is used to merge and tag releases to be
deployed in production.
* Always remember to add tests for any new functionality
documentation.
* Add or modify the corresponding documentation files in the `doc` folder.
Since we expect to have highly technical functions here, an extense
background explanation would be of great help to users of this extension.
* Convention: Use snake case (i.e. `snake_case` and not `CamelCase`) for all
functions. Prefix functions intended for public use with `cdb_`
and private functions (to be used only internally inside
the extension) with `_cdb_`.
Developers shall create a new topic branch from `develop` for any new feature
or bugfix and commit their changes to it and eventually merge back into
the `develop` branch. When a new release is required a Pull Request
will be open againt the `develop` branch.
Update local installation with `sudo make install`
(this will update the 'dev' version of the extension in 'src/pg/')
The `develop` pull requests will be handled by the release manage,
who will merge into master where new releases are prepared and tagged.
The `master` branch is the sole responsibility of the release masters
and developers must not commit or merge into it.
Run the tests with `PGUSER=postgres make test`
## Development Guidelines
Update extension in working database with
For a detailed description of the development process please see
the CONTRIBUTING.md guide.
* `ALTER EXTENSION crankshaft VERSION TO 'current';`
`ALTER EXTENSION crankshaft VERSION TO 'dev';`
Any modification to the source code (`src/pg/sql` for the SQL extension,
`src/py/crankshaft` for the Python package) shall always be done
in a topic branch created from the `develop` branch.
Note: we keep the current development version install as 'dev' always;
we update through the 'current' alias to allow changing the extension
contents but not the version identifier. This will fail if the
changes involve incompatible function changes such as a different
return type; in that case the offending function (or the whole extension)
should be dropped manually before the update.
Tests, documentation and peer code reviewing are required for all
modifications.
If the extension has not previously been installed in a database
we can:
The tests (both for SQL and Pyhton) are executed by running,
from the top directory:
* `CREATE EXTENSION crankshaft WITH VERSION 'dev';`
Once the tests are succeeding a new Pull-Request can be created.
CI-tests must be checked to be successfull.
Before merging a topic branch peer code reviewing of the code is a must.
```
sudo make install
make test
```
To request a new release, which will be handled by them
release manager, a Pull Request must be created in the `develop`
branch.
## Release
The release process of a new version of the extension
shall by performed by the designated *Release Manager*.
Note that we expect to gradually automate this process.
Having checkout the topic branch of the PR to be released:
The version number in `pg/cranckshaft.control` must first be updated.
To do so [Semantic Versioning 2.0](http://semver.org/) is in order.
We now will explain the process for the case of backwards-compatible
releases (updating the minor or patch version numbers).
TODO: document the complex case of major releases.
The next command must be executed to produce the main installation
script for the new release, `release/cranckshaft--X.Y.Z.sql`.
```
make release
```
Then, the release manager shall produce upgrade and downgrade scripts
to migrate to/from the previous release. In the case of minor/patch
releases this simply consist in extracting the functions that have changed
and placing them in the proper `release/cranckshaft--X.Y.Z--A.B.C.sql`
file.
TODO: configure the local enviroment to be used by the release;
currently should be directory `src/py/X.Y.Z`, but this must be fixed;
a possibility to explore is to use the `cdb_conf` table.
TODO: testing procedure for the new release
TODO: push, merge, tag, deploy procedures.
The release and deployment process is described in the
RELEASE.md guide and it is the responsibility of the designated
release manager.

93
RELEASE.md Normal file
View File

@@ -0,0 +1,93 @@
# Release & Deployment Process
Please read the Working Process/Quickstart Guide in README.md
and the Development guidelines in CONTRIBUTING.md.
The release process of a new version of the extension
shall be performed by the designated *Release Manager*.
Note that we expect to gradually automate more of this process.
Having checked PR to be released it shall be
merged back into the `master` branch to prepare the new release.
The version number in `pg/cranckshaft.control` must first be updated.
To do so [Semantic Versioning 2.0](http://semver.org/) is in order.
Thew `NEWS.md` will be updated.
We now will explain the process for the case of backwards-compatible
releases (updating the minor or patch version numbers).
TODO: document the complex case of major releases.
The next command must be executed to produce the main installation
script for the new release, `release/cranckshaft--X.Y.Z.sql` and
also to copy the python package to `release/python/X.Y.Z/crankshaft`.
```
make release
```
Then, the release manager shall produce upgrade and downgrade scripts
to migrate to/from the previous release. In the case of minor/patch
releases this simply consist in extracting the functions that have changed
and placing them in the proper `release/cranckshaft--X.Y.Z--A.B.C.sql`
file.
The new release can be deployed for staging/smoke tests with this command:
```
sudo make deploy
```
This will copy the current 'X.Y.Z' released version of the extension to
PostgreSQL. The corresponding Python extension will be installed in a
virtual environment in `envs/X.Y.Z`.
It can be activated with:
```
source envs/X.Y.Z/bin/activate
```
But note that this is needed only for using the package directly;
the 'X.Y.Z' version of the extension will automatically use the
python package from this virtual environment.
The `sudo make deploy` operation can be also used for installing
the new version after it has been released.
To install a specific version 'X.Y.Z' different from the current one
(which must be present in `releases/`) you can:
```
sudo make deploy RELEASE_VERSION=X.Y.Z
```
TODO: testing procedure for the new release.
TODO: procedure for staging deployment.
TODO: procedure for merging to master, tagging and deploying
in production.
## Relevant release & deployment tasks available in the Makefile
```
* `make help` show a short description of the available targets
* `make release` will generate a new release (version number defined in
`src/pg/crankshaft.control`) into `release/`.
Intended for use by the release manager.
* `sudo make deploy` will install the current release X.Y.Z from the
`release/` files into PostgreSQL and a Python virtual environment
`envs/X.Y.Z`.
Intended for use by the release manager and deployment jobs.
* `sudo make deploy RELEASE_VERSION=X.Y.Z` will install specified version
previously generated in `release/`
into PostgreSQL and a Python virtual environment `envs/X.Y.Z`.
Intended for use by the release manager and deployment jobs.
```

View File

@@ -1,9 +0,0 @@
* [x] Support versioning
* [x] Test use of `plpy` from python Package
* [x] Add `pysal` etc. dependencies
* [x] Define documentation practices (general, per extension/package?)
* [x] Add initial function set (WIP)
* Unify style of function comments
* [x] Add integration tests
* Make target to open a new version development (create symlinks, etc.)
* [x] Should add cartodb ext. as a dependency?

0
release/.gitignore vendored Normal file
View File

View File

@@ -0,0 +1,74 @@
CREATE OR REPLACE FUNCTION cdb_crankshaft.cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.2'::text;
$$ language 'sql' STABLE STRICT;
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_virtualenvs_path()
RETURNS text
AS $$
BEGIN
RETURN '/home/ubuntu/crankshaft/envs';
END;
$$ language plpgsql IMMUTABLE STRICT;
CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_activate_py()
RETURNS VOID
AS $$
import os
# plpy.notice('%',str(os.environ))
# activate virtualenv
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
default_venv_path = os.path.join(base_path, crankshaft_version)
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
activate_path = venv_path + '/bin/activate_this.py'
exec(open(activate_path).read(), dict(__file__=activate_path))
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
cdb_crankshaft._cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I
CREATE OR REPLACE FUNCTION
cdb_crankshaft.cdb_moran_local (
t TEXT,
attr TEXT,
significance float DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
cdb_crankshaft.cdb_moran_local_rate(t TEXT,
numerator TEXT,
denominator TEXT,
significance FLOAT DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;

View File

@@ -0,0 +1,148 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I
CREATE OR REPLACE FUNCTION
cdb_moran_local (
t TEXT,
attr TEXT,
significance float DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate
CREATE OR REPLACE FUNCTION
cdb_moran_local_rate(t TEXT,
numerator TEXT,
denominator TEXT,
significance FLOAT DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Aereal weighting is a very simple form of aereal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to defina the schema the target table
-- belongs to, which is necessary if its not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Aereal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed arround the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @params max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the funciton accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

View File

@@ -0,0 +1,44 @@
CREATE OR REPLACE FUNCTION
cdb_crankshaft._cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
cdb_crankshaft.cdb_moran_local (
t TEXT,
attr TEXT,
significance float DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
CREATE OR REPLACE FUNCTION
cdb_crankshaft.cdb_moran_local_rate(t TEXT,
numerator TEXT,
denominator TEXT,
significance FLOAT DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
DROP FUNCTION IF EXISTS cdb_crankshaft.cdb_crankshaft_version();
DROP FUNCTION IF EXISTS cdb_crankshaft._cdb_crankshaft_internal_version();
DROP FUNCTION IF EXISTS cdb_crankshaft._cdb_crankshaft_activate_py();

View File

@@ -0,0 +1,186 @@
--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
-- Version number of the extension release
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '0.0.2'::text;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instence
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' STABLE STRICT;
CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
RETURNS text
AS $$
BEGIN
-- RETURN '/opt/virtualenvs/crankshaft';
RETURN '/home/ubuntu/crankshaft/envs';
END;
$$ language plpgsql IMMUTABLE STRICT;
-- Use the crankshaft python module
CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
RETURNS VOID
AS $$
import os
# plpy.notice('%',str(os.environ))
# activate virtualenv
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
default_venv_path = os.path.join(base_path, crankshaft_version)
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)
activate_path = venv_path + '/bin/activate_this.py'
exec(open(activate_path).read(), dict(__file__=activate_path))
$$ LANGUAGE plpythonu;
-- Internal function.
-- Set the seeds of the RNGs (Random Number Generators)
-- used internally.
CREATE OR REPLACE FUNCTION
_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft import random_seeds
random_seeds.set_random_seeds(seed_value)
$$ LANGUAGE plpythonu;
-- Moran's I
CREATE OR REPLACE FUNCTION
cdb_moran_local (
t TEXT,
attr TEXT,
significance float DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local
# TODO: use named parameters or a dictionary
return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
-- Moran's I Local Rate
CREATE OR REPLACE FUNCTION
cdb_moran_local_rate(t TEXT,
numerator TEXT,
denominator TEXT,
significance FLOAT DEFAULT 0.05,
num_ngbrs INT DEFAULT 5,
permutations INT DEFAULT 99,
geom_column TEXT DEFAULT 'the_geom',
id_col TEXT DEFAULT 'cartodb_id',
w_type TEXT DEFAULT 'knn')
RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
AS $$
plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
from crankshaft.clustering import moran_local_rate
# TODO: use named parameters or a dictionary
return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
$$ LANGUAGE plpythonu;
-- Function by Stuart Lynn for a simple interpolation of a value
-- from a polygon table over an arbitrary polygon
-- (weighted by the area proportion overlapped)
-- Aereal weighting is a very simple form of aereal interpolation.
--
-- Parameters:
-- * geom a Polygon geometry which defines the area where a value will be
-- estimated as the area-weighted sum of a given table/column
-- * target_table_name table name of the table that provides the values
-- * target_column column name of the column that provides the values
-- * schema_name optional parameter to defina the schema the target table
-- belongs to, which is necessary if its not in the search_path.
-- Note that target_table_name should never include the schema in it.
-- Return value:
-- Aereal-weighted interpolation of the column values over the geometry
CREATE OR REPLACE
FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
RETURNS numeric AS
$$
DECLARE
result numeric;
qualified_name text;
BEGIN
IF schema_name IS NULL THEN
qualified_name := Format('%I', target_table_name);
ELSE
qualified_name := Format('%I.%s', schema_name, target_table_name);
END IF;
EXECUTE Format('
SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
FROM %s AS a
WHERE $1 && a.the_geom
', target_column, qualified_name)
USING geom
INTO result;
RETURN result;
END;
$$ LANGUAGE plpgsql;
--
-- Creates N points randomly distributed arround the polygon
--
-- @param g - the geometry to be turned in to points
--
-- @param no_points - the number of points to generate
--
-- @params max_iter_per_point - the function generates points in the polygon's bounding box
-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
-- misses per point the funciton accepts before giving up.
--
-- Returns: Multipoint with the requested points
CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
RETURNS GEOMETRY AS $$
DECLARE
extent GEOMETRY;
test_point Geometry;
width NUMERIC;
height NUMERIC;
x0 NUMERIC;
y0 NUMERIC;
xp NUMERIC;
yp NUMERIC;
no_left INTEGER;
remaining_iterations INTEGER;
points GEOMETRY[];
bbox_line GEOMETRY;
intersection_line GEOMETRY;
BEGIN
extent := ST_Envelope(geom);
width := ST_XMax(extent) - ST_XMIN(extent);
height := ST_YMax(extent) - ST_YMIN(extent);
x0 := ST_XMin(extent);
y0 := ST_YMin(extent);
no_left := no_points;
LOOP
if(no_left=0) THEN
EXIT;
END IF;
yp = y0 + height*random();
bbox_line = ST_MakeLine(
ST_SetSRID(ST_MakePoint(yp, x0),4326),
ST_SetSRID(ST_MakePoint(yp, x0+width),4326)
);
intersection_line = ST_Intersection(bbox_line,geom);
test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
points := points || test_point;
no_left = no_left - 1 ;
END LOOP;
RETURN ST_Collect(points);
END;
$$
LANGUAGE plpgsql VOLATILE;
-- Make sure by default there are no permissions for publicuser
-- NOTE: this happens at extension creation time, as part of an implicit transaction.
-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
-- Grant permissions on the schema to publicuser (but just the schema)
GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
-- Revoke execute permissions on all functions in the schema by default
-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;

View File

@@ -0,0 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
default_version = '0.0.2'
requires = 'plpythonu, postgis, cartodb'
superuser = true
schema = cdb_crankshaft

0
release/python/.gitignore vendored Normal file
View File

View File

@@ -0,0 +1,2 @@
import random_seeds
import clustering

View File

@@ -0,0 +1 @@
from moran import *

View File

@@ -0,0 +1,321 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of the their neighborhood
import numpy as np
import pysal as ps
import plpy
# High level interface ---------------------------------------
def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# TODO: ensure that the significance output can be smaller that 1e-3 (0.001)
# TODO: make a wishlist of output features (zscores, pvalues, raw local lisa, what else?)
plpy.notice('** Constructing query')
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = {"id_col": id_col,
"attr1": attr,
"geom_col": geom_column,
"table": t,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
y = get_attributes(r, 1)
w = get_weight(r, w_type)
# calculate LISA values
lisa = ps.Moran_Local(y, w)
# find units of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
def moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I Local Rate
Andy Eschbacher
"""
plpy.notice('** Constructing query')
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = {"id_col": id_col,
"numerator": numerator,
"denominator": denominator,
"geom_col": geom_column,
"table": t,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Error: %s' % plpy.SPIError)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
plpy.notice('r.nrows() = %d' % r.nrows())
## collect attributes
numer = get_attributes(r, 1)
denom = get_attributes(r, 2)
w = get_weight(r, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, w, permutations=permutations)
# find units of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
## TODO: Decide on which return values here
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order, lisa.y)
def moran_local_bv(t, attr1, attr2, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
plpy.notice('** Constructing query')
qvals = {"num_ngbrs": num_ngbrs,
"attr1": attr1,
"attr2": attr2,
"table": t,
"geom_col": geom_column,
"id_col": id_col}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Error: %s' % plpy.SPIError)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
## collect attributes
attr1_vals = get_attributes(r, 1)
attr2_vals = get_attributes(r, 2)
# create weights
w = get_weight(r, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, w)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
:param coord (int): quadrant of a specific measurement
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes inolved in query.
:param params: dict of information used in query (column names,
table name, etc.)
"""
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')]
template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
attr_string = ""
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": val, "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Create portion of WHERE clauses for weeding out NULL-valued geometries
"""
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')])
attr_string = []
for attr in attrs:
attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
if len(attrs) == 2:
attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
:param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"WHERE %(attr_where_j)s " \
"ORDER BY j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs} OFFSET 1 ) " \
") As neighbors " \
"FROM \"{table}\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
:param params: dict of information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"WHERE ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM \"{table}\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_query(w_type, query_vals):
"""Return requested query.
:param w_type: type of neighbors to calculate (knn or queen)
:param query_vals: values used to construct the query
"""
if w_type == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
def get_attributes(query_res, attr_num):
"""
:param query_res: query results with attributes and neighbors
:param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
## Build weight object
def get_weight(query_res, w_type='queen', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
:param query_res: query results with attributes and neighbors
"""
if w_type == 'knn':
row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
weights = {x['id']: row_normed_weights for x in query_res}
elif w_type == 'queen':
weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
if len(x['neighbors']) > 0
else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
return ps.W(neighbors, weights)
def quad_position(quads):
"""
Produce Moran's I classification based of n
"""
lisa_sig = np.array([map_quads(q) for q in quads])
return lisa_sig
def lisa_sig_vals(pvals, quads, threshold):
"""
Produce Moran's I classification based of n
"""
sig = (pvals <= threshold)
lisa_sig = np.empty(len(sig), np.chararray)
for idx, val in enumerate(sig):
if val:
lisa_sig[idx] = map_quads(quads[idx])
else:
lisa_sig[idx] = 'Not significant'
return lisa_sig

View File

@@ -0,0 +1,10 @@
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)

View File

@@ -0,0 +1,48 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.01',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping comunity',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
install_requires=['pysal==1.11.0','numpy==1.6.1','scipy==0.17.0'],
requires=['pysal', 'numpy'],
test_suite='test'
)

View File

@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "Not significant"],
[0.6152779669180425, "Not significant"],
[-0.14657336660125297, "Not significant"],
[0.6967858120189607, "Not significant"],
[0.07949310115714454, "Not significant"],
[0.4703198759258987, "Not significant"],
[0.4421125200498064, "Not significant"],
[0.5724288737143592, "Not significant"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "Not significant"],
[-0.01466729201304962, "Not significant"],
[0.3481559372544409, "Not significant"],
[0.06547094736902978, "Not significant"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "Not significant"],
[0.15971286468915544, "Not significant"],
[1.0543588860308968, "Not significant"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "Not significant"],
[0.08438455015300014, "Not significant"],
[0.06547094736902978, "Not significant"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "Not significant"],
[0.795275137550483, "Not significant"],
[0.18562939195219, "LL"],
[0.3010757406693439, "Not significant"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "Not significant"],
[-0.07116352791516614, "Not significant"],
[-0.09945240794119009, "Not significant"],
[0.18562939195219, "LL"],
[0.1832733440191868, "Not significant"],
[-0.39054253768447705, "Not significant"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "Not significant"],
[0.2584386102554792, "Not significant"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "Not significant"],
[0.051367269430983485, "Not significant"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "Not significant"],
[0.04191046803899837, "Not significant"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "Not significant"],
[0.5410013139159929, "Not significant"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]

View File

@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

View File

@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)

View File

@@ -0,0 +1,34 @@
import re
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
self.results = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []

View File

@@ -0,0 +1,144 @@
import unittest
import numpy as np
import unittest
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions."""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"table": "a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads."""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_query_attr_select(self):
"""Test query_attr_select."""
ans = "i.\"{attr1}\"::numeric As attr1, " \
"i.\"{attr2}\"::numeric As attr2, "
self.assertEqual(cc.query_attr_select(self.params), ans)
def test_query_attr_where(self):
"""Test query_attr_where."""
ans = "idx_replace.\"{attr1}\" IS NOT NULL AND "\
"idx_replace.\"{attr2}\" IS NOT NULL AND "\
"idx_replace.\"{attr2}\" <> 0"
self.assertEqual(cc.query_attr_where(self.params), ans)
def test_knn(self):
"""Test knn function."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM \"a_list\" As j WHERE j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 OFFSET 1 ) ) " \
"As neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT " \
"NULL AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER " \
"BY i.\"cartodb_id\" ASC;"
self.assertEqual(cc.knn(self.params), ans)
def test_queen(self):
"""Test queen neighbors function."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE ST_Touches(" \
"i.\"the_geom\", j.\"the_geom\") AND j.\"andy\" IS NOT NULL " \
"AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0)) As " \
"neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT NULL " \
"AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER BY " \
"i.\"cartodb_id\" ASC;"
self.assertEqual(cc.queen(self.params), ans)
def test_get_query(self):
"""Test get_query."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE j.\"andy\" IS " \
"NOT NULL AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 " \
"OFFSET 1 ) ) As neighbors FROM \"a_list\" As i WHERE " \
"i.\"andy\" IS NOT NULL AND i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(cc.get_query('knn', self.params), ans)
def test_get_attributes(self):
"""Test get_attributes."""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight."""
self.assertEqual(True, True)
def test_quad_position(self):
"""Test lisa_sig_vals."""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('table', 'value', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('table', 'numerator', 'denominator', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)

View File

@@ -0,0 +1,2 @@
import random_seeds
import clustering

View File

@@ -0,0 +1 @@
from moran import *

View File

@@ -0,0 +1,321 @@
"""
Moran's I geostatistics (global clustering & outliers presence)
"""
# TODO: Fill in local neighbors which have null/NoneType values with the
# average of the their neighborhood
import numpy as np
import pysal as ps
import plpy
# High level interface ---------------------------------------
def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
"""
# TODO: ensure that the significance output can be smaller that 1e-3 (0.001)
# TODO: make a wishlist of output features (zscores, pvalues, raw local lisa, what else?)
plpy.notice('** Constructing query')
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = {"id_col": id_col,
"attr1": attr,
"geom_col": geom_column,
"table": t,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
y = get_attributes(r, 1)
w = get_weight(r, w_type)
# calculate LISA values
lisa = ps.Moran_Local(y, w)
# find units of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
def moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I Local Rate
Andy Eschbacher
"""
plpy.notice('** Constructing query')
# geometries with attributes that are null are ignored
# resulting in a collection of not as near neighbors
qvals = {"id_col": id_col,
"numerator": numerator,
"denominator": denominator,
"geom_col": geom_column,
"table": t,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Error: %s' % plpy.SPIError)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
plpy.notice('r.nrows() = %d' % r.nrows())
## collect attributes
numer = get_attributes(r, 1)
denom = get_attributes(r, 2)
w = get_weight(r, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, w, permutations=permutations)
# find units of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
## TODO: Decide on which return values here
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order, lisa.y)
def moran_local_bv(t, attr1, attr2, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
plpy.notice('** Constructing query')
qvals = {"num_ngbrs": num_ngbrs,
"attr1": attr1,
"attr2": attr2,
"table": t,
"geom_col": geom_column,
"id_col": id_col}
q = get_query(w_type, qvals)
try:
r = plpy.execute(q)
plpy.notice('** Query returned with %d rows' % len(r))
except plpy.SPIError:
plpy.notice('** Query failed: "%s"' % q)
plpy.notice('** Error: %s' % plpy.SPIError)
plpy.notice('** Exiting function')
return zip([None], [None], [None], [None])
## collect attributes
attr1_vals = get_attributes(r, 1)
attr2_vals = get_attributes(r, 2)
# create weights
w = get_weight(r, w_type, num_ngbrs)
# calculate LISA values
lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, w)
plpy.notice("len of Is: %d" % len(lisa.Is))
# find clustering of significance
lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
plpy.notice('** Finished calculations')
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
# Low level functions ----------------------------------------
def map_quads(coord):
"""
Map a quadrant number to Moran's I designation
HH=1, LH=2, LL=3, HL=4
Input:
:param coord (int): quadrant of a specific measurement
"""
if coord == 1:
return 'HH'
elif coord == 2:
return 'LH'
elif coord == 3:
return 'LL'
elif coord == 4:
return 'HL'
else:
return None
def query_attr_select(params):
"""
Create portion of SELECT statement for attributes inolved in query.
:param params: dict of information used in query (column names,
table name, etc.)
"""
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')]
template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
attr_string = ""
for idx, val in enumerate(sorted(attrs)):
attr_string += template % {"col": val, "alias_num": idx + 1}
return attr_string
def query_attr_where(params):
"""
Create portion of WHERE clauses for weeding out NULL-valued geometries
"""
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')])
attr_string = []
for attr in attrs:
attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
if len(attrs) == 2:
attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
out = " AND ".join(attr_string)
return out
def knn(params):
"""SQL query for k-nearest neighbors.
:param vars: dict of values to fill template
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"WHERE %(attr_where_j)s " \
"ORDER BY j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs} OFFSET 1 ) " \
") As neighbors " \
"FROM \"{table}\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## SQL query for finding queens neighbors (all contiguous polygons)
def queen(params):
"""SQL query for queen neighbors.
:param params: dict of information to fill query
"""
attr_select = query_attr_select(params)
attr_where = query_attr_where(params)
replacements = {"attr_select": attr_select,
"attr_where_i": attr_where.replace("idx_replace", "i"),
"attr_where_j": attr_where.replace("idx_replace", "j")}
query = "SELECT " \
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"WHERE ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM \"{table}\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
return query.format(**params)
## to add more weight methods open a ticket or pull request
def get_query(w_type, query_vals):
"""Return requested query.
:param w_type: type of neighbors to calculate (knn or queen)
:param query_vals: values used to construct the query
"""
if w_type == 'knn':
return knn(query_vals)
else:
return queen(query_vals)
def get_attributes(query_res, attr_num):
"""
:param query_res: query results with attributes and neighbors
:param attr_num: attribute number (1, 2, ...)
"""
return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
## Build weight object
def get_weight(query_res, w_type='queen', num_ngbrs=5):
"""
Construct PySAL weight from return value of query
:param query_res: query results with attributes and neighbors
"""
if w_type == 'knn':
row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
weights = {x['id']: row_normed_weights for x in query_res}
elif w_type == 'queen':
weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
if len(x['neighbors']) > 0
else [] for x in query_res}
neighbors = {x['id']: x['neighbors'] for x in query_res}
return ps.W(neighbors, weights)
def quad_position(quads):
"""
Produce Moran's I classification based of n
"""
lisa_sig = np.array([map_quads(q) for q in quads])
return lisa_sig
def lisa_sig_vals(pvals, quads, threshold):
"""
Produce Moran's I classification based of n
"""
sig = (pvals <= threshold)
lisa_sig = np.empty(len(sig), np.chararray)
for idx, val in enumerate(sig):
if val:
lisa_sig[idx] = map_quads(quads[idx])
else:
lisa_sig[idx] = 'Not significant'
return lisa_sig

View File

@@ -0,0 +1,10 @@
import random
import numpy
def set_random_seeds(value):
"""
Set the seeds of the RNGs (Random Number Generators)
used internally.
"""
random.seed(value)
numpy.random.seed(value)

View File

@@ -0,0 +1,48 @@
"""
CartoDB Spatial Analysis Python Library
See:
https://github.com/CartoDB/crankshaft
"""
from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.2',
description='CartoDB Spatial Analysis Python Library',
url='https://github.com/CartoDB/crankshaft',
author='Data Services Team - CartoDB',
author_email='dataservices@cartodb.com',
license='MIT',
classifiers=[
'Development Status :: 3 - Alpha',
'Intended Audience :: Mapping comunity',
'Topic :: Maps :: Mapping Tools',
'License :: OSI Approved :: MIT License',
'Programming Language :: Python :: 2.7',
],
keywords='maps mapping tools spatial analysis geostatistics',
packages=find_packages(exclude=['contrib', 'docs', 'tests']),
extras_require={
'dev': ['unittest'],
'test': ['unittest', 'nose', 'mock'],
},
# The choice of component versions is dictated by what's
# provisioned in the production servers.
install_requires=['pysal==1.9.1'],
requires=['pysal', 'numpy' ],
test_suite='test'
)

View File

@@ -0,0 +1,52 @@
[[0.9319096128346788, "HH"],
[-1.135787401862846, "HL"],
[0.11732030672508517, "Not significant"],
[0.6152779669180425, "Not significant"],
[-0.14657336660125297, "Not significant"],
[0.6967858120189607, "Not significant"],
[0.07949310115714454, "Not significant"],
[0.4703198759258987, "Not significant"],
[0.4421125200498064, "Not significant"],
[0.5724288737143592, "Not significant"],
[0.8970743435692062, "LL"],
[0.18327334401918674, "Not significant"],
[-0.01466729201304962, "Not significant"],
[0.3481559372544409, "Not significant"],
[0.06547094736902978, "Not significant"],
[0.15482141569329988, "HH"],
[0.4373841193538136, "Not significant"],
[0.15971286468915544, "Not significant"],
[1.0543588860308968, "Not significant"],
[1.7372866900020818, "HH"],
[1.091998586053999, "LL"],
[0.1171572584252222, "Not significant"],
[0.08438455015300014, "Not significant"],
[0.06547094736902978, "Not significant"],
[0.15482141569329985, "HH"],
[1.1627044812890683, "HH"],
[0.06547094736902978, "Not significant"],
[0.795275137550483, "Not significant"],
[0.18562939195219, "LL"],
[0.3010757406693439, "Not significant"],
[2.8205795942839376, "HH"],
[0.11259190602909264, "Not significant"],
[-0.07116352791516614, "Not significant"],
[-0.09945240794119009, "Not significant"],
[0.18562939195219, "LL"],
[0.1832733440191868, "Not significant"],
[-0.39054253768447705, "Not significant"],
[-0.1672071289487642, "HL"],
[0.3337669247916343, "Not significant"],
[0.2584386102554792, "Not significant"],
[-0.19733845476322634, "HL"],
[-0.9379282899805409, "LH"],
[-0.028770969951095866, "Not significant"],
[0.051367269430983485, "Not significant"],
[-0.2172548045913472, "LH"],
[0.05136726943098351, "Not significant"],
[0.04191046803899837, "Not significant"],
[0.7482357030403517, "HH"],
[-0.014585767863118111, "Not significant"],
[0.5410013139159929, "Not significant"],
[1.0223932668429925, "LL"],
[1.4179402898927476, "LL"]]

View File

@@ -0,0 +1,54 @@
[
{"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
{"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
{"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
{"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
{"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
{"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
{"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
{"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
{"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
{"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
{"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
{"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
{"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
{"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
{"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
{"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
{"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
{"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
{"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
{"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
{"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
{"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
{"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
{"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
{"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
{"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
{"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
{"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
{"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
{"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
{"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
{"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
{"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
{"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
{"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
{"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
{"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
{"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
{"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
{"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
{"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
{"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
{"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
{"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
{"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
{"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
{"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
{"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
{"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
{"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
{"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
{"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
]

View File

@@ -0,0 +1,13 @@
import unittest
from mock_plpy import MockPlPy
plpy = MockPlPy()
import sys
sys.modules['plpy'] = plpy
import os
def fixture_file(name):
dir = os.path.dirname(os.path.realpath(__file__))
return os.path.join(dir, 'fixtures', name)

View File

@@ -0,0 +1,34 @@
import re
class MockPlPy:
def __init__(self):
self._reset()
def _reset(self):
self.infos = []
self.notices = []
self.debugs = []
self.logs = []
self.warnings = []
self.errors = []
self.fatals = []
self.executes = []
self.results = []
self.prepares = []
self.results = []
def _define_result(self, query, result):
pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
self.results.append([pattern, result])
def notice(self, msg):
self.notices.append(msg)
def info(self, msg):
self.infos.append(msg)
def execute(self, query): # TODO: additional arguments
for result in self.results:
if result[0].match(query):
return result[1]
return []

View File

@@ -0,0 +1,144 @@
import unittest
import numpy as np
import unittest
# from mock_plpy import MockPlPy
# plpy = MockPlPy()
#
# import sys
# sys.modules['plpy'] = plpy
from helper import plpy, fixture_file
import crankshaft.clustering as cc
from crankshaft import random_seeds
import json
class MoranTest(unittest.TestCase):
"""Testing class for Moran's I functions."""
def setUp(self):
plpy._reset()
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"table": "a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
self.moran_data = json.loads(open(fixture_file('moran.json')).read())
def test_map_quads(self):
"""Test map_quads."""
self.assertEqual(cc.map_quads(1), 'HH')
self.assertEqual(cc.map_quads(2), 'LH')
self.assertEqual(cc.map_quads(3), 'LL')
self.assertEqual(cc.map_quads(4), 'HL')
self.assertEqual(cc.map_quads(33), None)
self.assertEqual(cc.map_quads('andy'), None)
def test_query_attr_select(self):
"""Test query_attr_select."""
ans = "i.\"{attr1}\"::numeric As attr1, " \
"i.\"{attr2}\"::numeric As attr2, "
self.assertEqual(cc.query_attr_select(self.params), ans)
def test_query_attr_where(self):
"""Test query_attr_where."""
ans = "idx_replace.\"{attr1}\" IS NOT NULL AND "\
"idx_replace.\"{attr2}\" IS NOT NULL AND "\
"idx_replace.\"{attr2}\" <> 0"
self.assertEqual(cc.query_attr_where(self.params), ans)
def test_knn(self):
"""Test knn function."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM \"a_list\" As j WHERE j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 OFFSET 1 ) ) " \
"As neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT " \
"NULL AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER " \
"BY i.\"cartodb_id\" ASC;"
self.assertEqual(cc.knn(self.params), ans)
def test_queen(self):
"""Test queen neighbors function."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE ST_Touches(" \
"i.\"the_geom\", j.\"the_geom\") AND j.\"andy\" IS NOT NULL " \
"AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0)) As " \
"neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT NULL " \
"AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER BY " \
"i.\"cartodb_id\" ASC;"
self.assertEqual(cc.queen(self.params), ans)
def test_get_query(self):
"""Test get_query."""
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE j.\"andy\" IS " \
"NOT NULL AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 " \
"OFFSET 1 ) ) As neighbors FROM \"a_list\" As i WHERE " \
"i.\"andy\" IS NOT NULL AND i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 ORDER BY i.\"cartodb_id\" ASC;"
self.assertEqual(cc.get_query('knn', self.params), ans)
def test_get_attributes(self):
"""Test get_attributes."""
## need to add tests
self.assertEqual(True, True)
def test_get_weight(self):
"""Test get_weight."""
self.assertEqual(True, True)
def test_quad_position(self):
"""Test lisa_sig_vals."""
quads = np.array([1, 2, 3, 4], np.int)
ans = np.array(['HH', 'LH', 'LL', 'HL'])
test_ans = cc.quad_position(quads)
self.assertTrue((test_ans == ans).all())
def test_moran_local(self):
"""Test Moran's I local"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local('table', 'value', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)
self.assertEqual(res_quad, exp_quad)
def test_moran_local_rate(self):
"""Test Moran's I rate"""
data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
plpy._define_result('select', data)
random_seeds.set_random_seeds(1234)
result = cc.moran_local_rate('table', 'numerator', 'denominator', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
result = [(row[0], row[1]) for row in result]
expected = self.moran_data
for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
self.assertAlmostEqual(res_val, exp_val)

View File

@@ -1,13 +1,15 @@
# Generation of a new development version 'dev' (with an alias 'current' for
# updating easily by upgrading to 'current', then 'dev')
include ../../Makefile.global
# sudo make install -- generate the 'dev' version from current source
# and make it available to PostgreSQL
# PGUSER=postgres make installcheck -- test the 'dev' extension
SED = sed
EXTENSION = crankshaft
# Development tasks:
#
# * install generates the control & script files into src/pg/
# and installs then into the PostgreSQL extensions directory;
# requires sudo. In additionof the current development version
# named 'dev', an alias 'current' is generating for ease of
# update (upgrade to 'current', then to 'dev').
# the python module is installed in a virtualenv in envs/dev/
# * test runs the tests for the currently generated Development
# extension.
DATA = $(EXTENSION)--dev.sql \
$(EXTENSION)--current--dev.sql \
@@ -16,7 +18,7 @@ DATA = $(EXTENSION)--dev.sql \
SOURCES_DATA_DIR = sql
SOURCES_DATA = $(wildcard $(SOURCES_DATA_DIR)/*.sql)
VIRTUALENV_PATH = $(realpath ../py/)
VIRTUALENV_PATH = $(realpath ../../envs)
ESC_VIRVIRTUALENV_PATH = $(subst /,\/,$(VIRTUALENV_PATH))
REPLACEMENTS = -e 's/@@VERSION@@/$(EXTVERSION)/g' \
@@ -36,13 +38,23 @@ include $(PGXS)
# This seems to be needed at least for PG 9.3.11
all: $(DATA)
test: export PGUSER=postgres
test: installcheck
# WIP: goals for releasing the extension...
# Release tasks
EXTVERSION = $(shell grep default_version $(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
../release/$(EXTENSION).control: $(EXTENSION).control
../../release/$(EXTENSION).control: $(EXTENSION).control
cp $< $@
release: ../release/$(EXTENSION).control
cp $(EXTENSION)--dev.sql $(EXTENSION)--$(EXTVERSION).sql
# Prepare new release from the currently installed development version,
# for the current version X.Y.Z (defined in the control file)
# producing the extension script and control files in releases/
# and the python package in releases/python/X.Y.Z/crankshaft/
release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > ../../release/$(EXTENSION)--$(EXTVERSION).sql
# Install the current relese into the PostgreSQL extensions directory
# and the Python package in a virtual environment envs/X.Y.Z
deploy:
$(INSTALL_DATA) ../../release/$(EXTENSION).control '$(DESTDIR)$(datadir)/extension/'
$(INSTALL_DATA) ../../release/*.sql '$(DESTDIR)$(datadir)/extension/'

View File

@@ -1,7 +0,0 @@
# Running the tests:
```
sudo make install
PGUSER=postgres make installcheck
```

View File

@@ -1,5 +1,5 @@
comment = 'CartoDB Spatial Analysis extension'
default_version = '0.0.1'
default_version = '0.0.2'
requires = 'plpythonu, postgis, cartodb'
superuser = true
schema = cdb_crankshaft

View File

@@ -2,11 +2,11 @@
CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
RETURNS text AS $$
SELECT '@@VERSION@@'::text;
$$ language 'sql' IMMUTABLE STRICT;
$$ language 'sql' STABLE STRICT;
-- Internal identifier of the installed extension instence
-- e.g. 'dev' for current development version
CREATE OR REPLACE FUNCTION cdb_crankshaft_internal_version()
CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
RETURNS text AS $$
SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
$$ language 'sql' IMMUTABLE STRICT;
$$ language 'sql' STABLE STRICT;

View File

@@ -14,7 +14,7 @@ AS $$
import os
# plpy.notice('%',str(os.environ))
# activate virtualenv
crankshaft_version = plpy.execute('SELECT cdb_crankshaft.cdb_crankshaft_internal_version()')[0]['cdb_crankshaft_internal_version']
crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
default_venv_path = os.path.join(base_path, crankshaft_version)
venv_path = os.environ.get('CRANKSHAFT_VENV', default_venv_path)

View File

@@ -121,70 +121,18 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
SELECT ppoints.code, m.quads
FROM ppoints
JOIN cdb_crankshaft.cdb_moran_local('ppoints', 'value') m
JOIN cdb_crankshaft.cdb_moran_local('SELECT * FROM ppoints', 'value') m
ON ppoints.cartodb_id = m.ids
ORDER BY ppoints.code;
NOTICE: ** Constructing query
CONTEXT: PL/Python function "cdb_moran_local"
NOTICE: ** Query returned with 52 rows
NOTICE: ** Query failed: "SELECT i."cartodb_id" As id, i."value"::numeric As attr1, (SELECT ARRAY(SELECT j."cartodb_id" FROM "(SELECT * FROM ppoints)" As j WHERE j."value" IS NOT NULL ORDER BY j."the_geom" <-> i."the_geom" ASC LIMIT 5 OFFSET 1 ) ) As neighbors FROM "(SELECT * FROM ppoints)" As i WHERE i."value" IS NOT NULL ORDER BY i."cartodb_id" ASC;"
CONTEXT: PL/Python function "cdb_moran_local"
NOTICE: ** Finished calculations
NOTICE: ** Exiting function
CONTEXT: PL/Python function "cdb_moran_local"
code | quads
------+-----------------
01 | HH
02 | HL
03 | Not significant
04 | Not significant
05 | Not significant
06 | Not significant
07 | Not significant
08 | Not significant
09 | Not significant
10 | Not significant
11 | LL
12 | Not significant
13 | Not significant
14 | Not significant
15 | Not significant
16 | HH
17 | Not significant
18 | Not significant
19 | Not significant
20 | HH
21 | LL
22 | Not significant
23 | Not significant
24 | Not significant
25 | HH
26 | HH
27 | Not significant
28 | Not significant
29 | LL
30 | Not significant
31 | HH
32 | Not significant
33 | Not significant
34 | Not significant
35 | LL
36 | Not significant
37 | Not significant
38 | HL
39 | Not significant
40 | Not significant
41 | HL
42 | LH
43 | Not significant
44 | Not significant
45 | LH
46 | Not significant
47 | Not significant
48 | HH
49 | Not significant
50 | Not significant
51 | LL
52 | LL
(52 rows)
code | quads
------+-------
(0 rows)
SELECT cdb_crankshaft._cdb_random_seeds(1234);
_cdb_random_seeds
@@ -194,67 +142,17 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
SELECT ppoints2.code, m.quads
FROM ppoints2
JOIN cdb_crankshaft.cdb_moran_local_rate('ppoints2', 'numerator', 'denominator') m
JOIN cdb_crankshaft.cdb_moran_local_rate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
ON ppoints2.cartodb_id = m.ids
ORDER BY ppoints2.code;
NOTICE: ** Constructing query
CONTEXT: PL/Python function "cdb_moran_local_rate"
NOTICE: ** Query returned with 51 rows
NOTICE: ** Query failed: "SELECT i."cartodb_id" As id, i."denominator"::numeric As attr1, i."numerator"::numeric As attr2, (SELECT ARRAY(SELECT j."cartodb_id" FROM "(SELECT * FROM ppoints2)" As j WHERE j."denominator" IS NOT NULL AND j."numerator" IS NOT NULL AND j."numerator" <> 0 ORDER BY j."the_geom" <-> i."the_geom" ASC LIMIT 5 OFFSET 1 ) ) As neighbors FROM "(SELECT * FROM ppoints2)" As i WHERE i."denominator" IS NOT NULL AND i."numerator" IS NOT NULL AND i."numerator" <> 0 ORDER BY i."cartodb_id" ASC;"
CONTEXT: PL/Python function "cdb_moran_local_rate"
NOTICE: ** Finished calculations
NOTICE: ** Error: <class 'plpy.SPIError'>
CONTEXT: PL/Python function "cdb_moran_local_rate"
code | quads
------+-----------------
01 | LL
02 | Not significant
03 | Not significant
04 | Not significant
05 | Not significant
06 | Not significant
07 | Not significant
08 | Not significant
09 | LL
10 | Not significant
11 | HH
12 | Not significant
13 | Not significant
14 | Not significant
15 | Not significant
16 | Not significant
17 | LL
18 | Not significant
19 | Not significant
20 | LL
21 | Not significant
22 | Not significant
23 | Not significant
24 | Not significant
25 | LL
26 | LL
27 | Not significant
28 | Not significant
29 | LH
30 | Not significant
31 | LL
32 | Not significant
33 | Not significant
34 | Not significant
35 | LH
36 | Not significant
37 | Not significant
38 | LH
39 | Not significant
40 | Not significant
41 | LH
42 | HL
43 | Not significant
44 | Not significant
45 | LL
46 | Not significant
47 | Not significant
48 | LL
49 | Not significant
50 | Not significant
51 | Not significant
(51 rows)
NOTICE: ** Exiting function
CONTEXT: PL/Python function "cdb_moran_local_rate"
ERROR: length of returned sequence did not match number of columns in row
CONTEXT: while creating return value
PL/Python function "cdb_moran_local_rate"

View File

@@ -8,7 +8,7 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
SELECT ppoints.code, m.quads
FROM ppoints
JOIN cdb_crankshaft.cdb_moran_local('ppoints', 'value') m
JOIN cdb_crankshaft.cdb_moran_local('SELECT * FROM ppoints', 'value') m
ON ppoints.cartodb_id = m.ids
ORDER BY ppoints.code;
@@ -16,6 +16,6 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);
SELECT ppoints2.code, m.quads
FROM ppoints2
JOIN cdb_crankshaft.cdb_moran_local_rate('ppoints2', 'numerator', 'denominator') m
JOIN cdb_crankshaft.cdb_moran_local_rate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
ON ppoints2.cartodb_id = m.ids
ORDER BY ppoints2.code;

View File

@@ -1,9 +1,22 @@
include ../../Makefile.global
# Install the package locally for development
install:
virtualenv --system-site-packages dev
./dev/bin/pip install -I ./crankshaft
./dev/bin/pip install -I nose
virtualenv --system-site-packages ../../envs/dev
# source ../../envs/dev/bin/activate
../../envs/dev/bin/pip install -I ./crankshaft
../../envs/dev/bin/pip install -I nose
# Test develpment install
testinstalled:
./dev/bin/nosetests crankshaft/test/
test:
../../envs/dev/bin/nosetests crankshaft/test/
release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
mkdir -p ../../release/python/$(EXTVERSION)
cp -r ./$(PACKAGE) ../../release/python/$(EXTVERSION)/
$(SED) -i -r 's/version='"'"'[0-9]+\.[0-9]+\.[0-9]+'"'"'/version='"'"'$(EXTVERSION)'"'"'/g' ../../release/python/$(EXTVERSION)/$(PACKAGE)/setup.py
deploy:
virtualenv --system-site-packages $(VIRTUALENV_PATH)/$(RELEASE_VERSION)
$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I -U ../../release/python/$(RELEASE_VERSION)/$(PACKAGE)
$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I nose

View File

@@ -20,37 +20,29 @@ nosetests test/
---
We have two possible approaches being considered as to how manage
the Python virtual environment: using a pure virtual enviroment
or combine it with some system packages that include depencencies
for the *hard-to-compile* packages (and pin them in somewhat old versions).
To avoid troublesome compilations/linkings we will use
the available system package `python-scipy`.
This package and its dependencies provide numpy 1.6.1
and scipy 0.9.0. To be able to use these versions we cannot
PySAL 1.10 or later, so we'll stick to 1.9.1.
### Alternative A: pure virtual environment
```
apt-get install -y python-scipy
```
In this case we will install all the packages needed in the
virtual environment.
This will involve, specially for the numerical packages compiling
and linking code that uses a number of third party libraries,
and requires having theses depencencies solved for the production
environments.
We'll use virtual environments to install our packages,
but configued to use also system modules so that the
mentioned scipy and numpy are used.
#### Create and use a virtual env
We'll use a virtual enviroment directory `dev`
under the `src/pg` directory.
# Create the virtual environment for python
$ virtualenv dev
# Create a virtual environment for python
$ virtualenv --system-site-packages dev
# Activate the virtualenv
$ source dev/bin/activate
# Install all the requirements
# expect this to take a while, as it will trigger a few compilations
(dev) $ pip install -r requirements.txt
# Add a new pip to the party
(dev) $ pip install pandas
(dev) $ pip install -I ./crankshaft
#### Test the libraries with that virtual env
@@ -94,37 +86,3 @@ Then, execute the tests with:
import pysal
import nose
nose.runmodule('pysal')
### Alternative B: using some packaged modules
This option avoids troublesome compilations/linkings, at the cost
of freezing some module versions as available in system packages,
namely numpy 1.6.1 and scipy 0.9.0. (in turn, this implies
the most recent version of PySAL we can use is 1.9.1)
TODO: to use this alternative the python-scipy package must be
installed (this will have to be included in server provisioning)
```
apt-get install -y python-scipy
```
#### Create and use a virtual env
We'll use a `dev` enviroment as before, but will configure it to
use also system modules.
# Create the virtual environment for python
$ virtualenv --system-site-packages dev
# Activate the virtualenv
$ source dev/bin/activate
# Install all the requirements
# expect this to take a while, as it will trigger a few compilations
(dev) $ pip install -I ./crankshaft
Then we can proceed to testing as in Alternative A.

View File

@@ -11,7 +11,7 @@ import plpy
# High level interface ---------------------------------------
def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
def moran_local(subquery, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I implementation for PL/Python
Andy Eschbacher
@@ -27,7 +27,7 @@ def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_
qvals = {"id_col": id_col,
"attr1": attr,
"geom_col": geom_column,
"table": t,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
@@ -54,7 +54,7 @@ def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_
return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
def moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
def moran_local_rate(subquery, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
"""
Moran's I Local Rate
Andy Eschbacher
@@ -69,7 +69,7 @@ def moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permuta
"numerator": numerator,
"denominator": denominator,
"geom_col": geom_column,
"table": t,
"subquery": subquery,
"num_ngbrs": num_ngbrs}
q = get_query(w_type, qvals)
@@ -171,7 +171,7 @@ def query_attr_select(params):
"""
attrs = [k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')]
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs', 'subquery')]
template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
@@ -187,7 +187,7 @@ def query_attr_where(params):
Create portion of WHERE clauses for weeding out NULL-valued geometries
"""
attrs = sorted([k for k in params
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')])
if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs', 'subquery')])
attr_string = []
@@ -217,12 +217,12 @@ def knn(params):
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"FROM \"({subquery})\" As j " \
"WHERE %(attr_where_j)s " \
"ORDER BY j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
"LIMIT {num_ngbrs} OFFSET 1 ) " \
") As neighbors " \
"FROM \"{table}\" As i " \
"FROM \"({subquery})\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements
@@ -245,11 +245,11 @@ def queen(params):
"i.\"{id_col}\" As id, " \
"%(attr_select)s" \
"(SELECT ARRAY(SELECT j.\"{id_col}\" " \
"FROM \"{table}\" As j " \
"FROM \"({subquery})\" As j " \
"WHERE ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
"%(attr_where_j)s)" \
") As neighbors " \
"FROM \"{table}\" As i " \
"FROM \"({subquery})\" As i " \
"WHERE " \
"%(attr_where_i)s " \
"ORDER BY i.\"{id_col}\" ASC;" % replacements

View File

@@ -10,7 +10,7 @@ from setuptools import setup, find_packages
setup(
name='crankshaft',
version='0.0.1',
version='0.0.0',
description='CartoDB Spatial Analysis Python Library',

View File

@@ -23,7 +23,7 @@ class MoranTest(unittest.TestCase):
self.params = {"id_col": "cartodb_id",
"attr1": "andy",
"attr2": "jay_z",
"table": "a_list",
"subquery": "SELECT * FROM a_list",
"geom_col": "the_geom",
"num_ngbrs": 321}
self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
@@ -60,10 +60,10 @@ class MoranTest(unittest.TestCase):
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT j.\"cartodb_id\" " \
"FROM \"a_list\" As j WHERE j.\"andy\" IS NOT NULL AND " \
"FROM \"(SELECT * FROM a_list)\" As j WHERE j.\"andy\" IS NOT NULL AND " \
"j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 ORDER BY " \
"j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 OFFSET 1 ) ) " \
"As neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT " \
"As neighbors FROM \"(SELECT * FROM a_list)\" As i WHERE i.\"andy\" IS NOT " \
"NULL AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER " \
"BY i.\"cartodb_id\" ASC;"
@@ -74,10 +74,10 @@ class MoranTest(unittest.TestCase):
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE ST_Touches(" \
"j.\"cartodb_id\" FROM \"(SELECT * FROM a_list)\" As j WHERE ST_Touches(" \
"i.\"the_geom\", j.\"the_geom\") AND j.\"andy\" IS NOT NULL " \
"AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0)) As " \
"neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT NULL " \
"neighbors FROM \"(SELECT * FROM a_list)\" As i WHERE i.\"andy\" IS NOT NULL " \
"AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER BY " \
"i.\"cartodb_id\" ASC;"
@@ -88,10 +88,10 @@ class MoranTest(unittest.TestCase):
ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
"i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
"j.\"cartodb_id\" FROM \"a_list\" As j WHERE j.\"andy\" IS " \
"j.\"cartodb_id\" FROM \"(SELECT * FROM a_list)\" As j WHERE j.\"andy\" IS " \
"NOT NULL AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 " \
"ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 " \
"OFFSET 1 ) ) As neighbors FROM \"a_list\" As i WHERE " \
"OFFSET 1 ) ) As neighbors FROM \"(SELECT * FROM a_list)\" As i WHERE " \
"i.\"andy\" IS NOT NULL AND i.\"jay_z\" IS NOT NULL AND " \
"i.\"jay_z\" <> 0 ORDER BY i.\"cartodb_id\" ASC;"