performance improvements

adding sklearn to deps
fixing syntax
2016-05-27 19:31:37 +00:00 · 2016-05-27 14:59:24 +00:00 · 2016-05-27 14:58:43 +00:00 · 2016-05-27 14:58:05 +00:00 · 2016-05-27 10:33:00 -04:00 · 2016-05-27 10:29:47 -04:00
88 changed files with 2771 additions and 369 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,3 @@
+envs/
+*.pyc
+.DS_Store
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,84 +1,94 @@
-# Contributing guide
+# Development process

-## How to add new functions
+Please read the Working Process/Quickstart Guide in [README.md](https://github.com/CartoDB/crankshaft/blob/master/README.md) first.

-Try to put as little logic in the SQL extension as possible and
-just use it as a wrapper to the Python module functionality.
+For any modification of crankshaft, such as adding new features,
+refactoring or bug-fixing, a topic branch must be created out of the `develop`
+branch and be used for the development process.

-Once a function is defined it should never change its signature in subsequent
-versions. To change a function's signature a new function with a different
-name must be created.
+Modifications are done inside `src/pg/sql` and `src/py/crankshaft`.

-### Version numbers
+Take into account:

-The version of both the SQL extension and the Python package shall
-follow the [Semantic Versioning 2.0](http://semver.org/) guidelines:
+*  Tests must be added for any new functionality
+   (inside `src/pg/test`, `src/py/crankshaft/test`) as well as to
+   detect any bugs that are being fixed.
+*  Add or modify the corresponding documentation files in the `doc` folder.
+   Since we expect to have highly technical functions here, an extensive
+   background explanation would be of great help to users of this extension.
+*  Convention: snake case (i.e. `snake_case` and not `CamelCase`)
+   shall be used for all function names.
+   Prefix function names intended for public use with `cdb_`
+   and private functions (to be used only internally inside
+   the extension)  with `_cdb_`.

-* When backwards incompatibility is introduced the major number is incremented
-* When functionally is added (in a backwards-compatible manner) the minor number
-  is incremented
-* When only fixes are introduced (backwards-compatible) the patch number is
-  incremented
+Once the code is ready to be tested, update the local development installation
+with `sudo make install`.
+This will update the 'dev' version of the extension in `src/pg/` and
+make it available to PostgreSQL.
+It will also install the python package (crankshaft) in a virtual
+environment `envs/dev`.

-### Python Package
+The version number of the Python package, defined in
+`src/py/crankshaft/setup.py` will be overridden when
+the package is released and always match the extension version number,
+but for development it shall be kept as '0.0.0'.

-...
+Run the tests with `make test`.

-### SQL Extension
-
-* Generate a **new subfolder version** for `sql` and `test` folders to define
-  the new functions and tests
-  - Use symlinks to avoid file duplication between versions that don't update them
-  - Add new files or modify copies of the old files to add new functions or
-    modify existing functions (remember to rename a function if the signature
-    changes)
-  - Add or modify the corresponding documentation files in the `doc` folder.
-    Since we expect to have highly technical functions here, an extense
-    background explanation would be of great help to users of this extension.
-  - Create tests for the new functions/behaviour
-
-* Generate the **upgrade and downgrade files** for the extension
-
-* Update the control file and the Makefile to generate the complete SQL
-  file for the new created version. After running `make` a new
-  file `crankshaft--X.Y.Z.sql` will be created for the current version.
-  Additional files for migrating to/from the previous version A.B.Z should be
-  created:
-  - `crankshaft--X.Y.Z--A.B.C.sql`
-  - `crankshaft--A.B.C--X.Y.Z.sql`
-  All these new files must be added to git and pushed.
-
-* Update the public docs! ;-)
-
-## Conventions
-
-# SQL
-
-Use snake case (i.e. `snake_case` and not `CamelCase`) for all
-functions. Prefix functions intended for public use with `cdb_`
-and private functions (to be used only internally inside
-the extension)  with `_cdb_`.
-
-# Python
-
-...
-
-## Testing
-
-Running just the Python tests:
+To use the python extension for custom tests, activate the virtual
+environment with:

 ```
-(cd python && make test)
+source envs/dev/bin/activate
 ```

-Installing the Extension and running just the PostgreSQL tests:
+Update extension in a working database with:
+
+* `ALTER EXTENSION crankshaft UPDATE TO 'current';`
+  `ALTER EXTENSION crankshaft UPDATE TO 'dev';`
+
+Note: we keep the current development version installed as 'dev' always;
+we update through the 'current' alias to allow changing the extension
+contents but not the version identifier. This will fail if the
+changes involve incompatible function changes such as a different
+return type; in that case the offending function (or the whole extension)
+should be dropped manually before the update.
+
+If the extension has not previously been installed in a database,
+it can be installed directly with:
+
+* `CREATE EXTENSION IF NOT EXISTS plpythonu;`
+  `CREATE EXTENSION IF NOT EXISTS postgis;`
+  `CREATE EXTENSION IF NOT EXISTS cartodb;`
+  `CREATE EXTENSION crankshaft WITH VERSION 'dev';`
+
+Note: the development extension uses the development python virtual
+environment automatically.
+
+Before proceeding to the release process, peer review of the code is
+a must.
+
+Once the feature or bugfix is completed and all the tests are passing
+a Pull-Request shall be created on the topic branch, reviewed by a peer
+and then merged back into the `develop` branch when all CI tests pass.
+
+When the changes in the `develop` branch are to be released in a new
+version of the extension, a PR must be created on the `develop` branch.
+
+The release manager will take hold of the PR at this moment to proceed
+to the release process for a new revision of the extension.
+
+## Relevant development tasks available in the Makefile

 ```
-(cd pg && sudo make install && PGUSER=postgres make installcheck)
-```
+* `make help` show a short description of the available targets

-Installing and testing everything:
+* `sudo make install` will generate the extension scripts for the development
+  version ('dev'/'current') and install the python package into the
+  development virtual environment `envs/dev`.
+  Intended for use by developers.

-```
-sudo make install && PGUSER=postgres make testinstalled
+* `make test` will run the tests for the installed development extension.
+  Intended for use by developers.
 ```
--- a/DEPLOYING.md
+++ b/DEPLOYING.md
@@ -1,43 +0,0 @@
-# Workflow
-
-... (branching/merging flow)
-
-# Deployment
-
-...
-
-Deployment to db servers: the next command will install both the Python
-package and the extension.
-
-```
-sudo make install
-```
-
-Installing only the Python package:
-
-```
-sudo pip install python/crankshaft --upgrade
-```
-
-Caveat: note that `pip install ./crankshaft` will install
-from local files, but `pip install crankshaft` will not.
-
-CI: Install and run the tests on the installed extension and package:
-
-```
-(sudo make install && PGUSER=postgres make testinstalled)
-```
-
-Installing the extension in user databases:
-Once installed in a server, the extension can be added
-to a database with the next SQL command:
-
-```
-CREATE EXTENSION crankshaft;
-```
-
-To upgrade the extension to an specific version X.Y.Z:
-
-```
-ALTER EXTENSION crankshaft UPGRADE TO 'X.Y.Z';
-```
--- a/Makefile
+++ b/Makefile
@@ -1,13 +1,70 @@
-EXT_DIR = pg
-PYP_DIR = python
+include ./Makefile.global
+
+EXT_DIR = src/pg
+PYP_DIR = src/py

 .PHONY: install
 .PHONY: run_tests
+.PHONY: release
+.PHONY: deploy

-install:
+# Generate and install development versions of the extension
+# and python package.
+# The extension is named 'dev' with a 'current' alias for easily upgrading.
+# The Python package is installed in a virtual environment envs/dev/
+# Requires sudo.
+install: ## Generate and install development version of the extension; requires sudo.
 	$(MAKE) -C $(PYP_DIR) install
 	$(MAKE) -C $(EXT_DIR) install

-testinstalled:
-	$(MAKE) -C $(PYP_DIR) testinstalled
-	$(MAKE) -C $(EXT_DIR) installcheck
+# Run the tests for the installed development extension and
+# python package
+test:   ## Run the tests for the development version of the extension
+	$(MAKE) -C $(PYP_DIR) test
+	$(MAKE) -C $(EXT_DIR) test
+
+# Generate a new release into release
+release: ## Generate a new release of the extension. Only for release manager
+	$(MAKE) -C $(EXT_DIR) release
+	$(MAKE) -C $(PYP_DIR) release
+
+# Install the current release.
+# The Python package is installed in a virtual environment envs/X.Y.Z/
+# Requires sudo.
+# Use the RELEASE_VERSION environment variable to deploy a specific version:
+#     sudo make deploy RELEASE_VERSION=1.0.0
+deploy: ## Deploy a released extension. Only for release manager. Requires sudo.
+	$(MAKE) -C $(EXT_DIR) deploy
+	$(MAKE) -C $(PYP_DIR) deploy
+
+# Cleanup development extension script files
+clean-dev: ## clean up development extension script files
+	rm -f src/pg/$(EXTENSION)--*.sql
+
+# Cleanup all releases
+clean-releases: ## clean up all releases
+	rm -rf release/python/*
+	rm -f release/$(EXTENSION)--*.sql
+	rm -f release/$(EXTENSION).control
+
+# Cleanup current/specific version
+clean-release: ## clean up current release
+	rm -rf release/python/$(RELEASE_VERSION)
+	rm -f release/$(RELEASE_VERSION)--*.sql
+
+# Cleanup all virtual environments
+clean-environments: ## clean up all virtual environments
+	rm -rf envs/*
+
+clean-all: clean-dev clean-release clean-environments
+
+help:
+	@IFS=$$'\n' ; \
+	help_lines=(`fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//'`); \
+	for help_line in $${help_lines[@]}; do \
+		IFS=$$'#' ; \
+		help_split=($$help_line) ; \
+		help_command=`echo $${help_split[0]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \
+		help_info=`echo $${help_split[2]} | sed -e 's/^ *//' -e 's/ *$$//'` ; \
+		printf "%-30s %s\n" $$help_command $$help_info ; \
+	done
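The `help` target above collects the `##` comments that follow each target name and prints them as a two-column listing. A minimal Python sketch of the same idea follows; `parse_help` and the sample text are hypothetical illustrations, not part of the repository, and the regular expression is a simplification of what the shell loop does.

```python
import re


def parse_help(makefile_text):
    """Return (target, description) pairs for lines shaped like `target: ... ## description`."""
    entries = []
    for line in makefile_text.splitlines():
        # Recipe lines start with a tab and targets without `##` are skipped.
        match = re.match(r"^([A-Za-z0-9_-]+):.*?##\s*(.*)$", line)
        if match:
            entries.append((match.group(1), match.group(2).strip()))
    return entries


# Sample text modelled on the targets defined in the Makefile above.
sample = """\
install: ## Generate and install development version of the extension; requires sudo.
\t$(MAKE) -C $(PYP_DIR) install
test:   ## Run the tests for the development version of the extension
clean-all: clean-dev clean-release clean-environments
"""

for target, description in parse_help(sample):
    print("%-30s %s" % (target, description))
```

Targets without a `##` comment, such as `clean-all`, do not appear in the listing, which matches the behaviour of filtering on `##`.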
--- a/Makefile.global
+++ b/Makefile.global
@@ -0,0 +1,6 @@
+SELF_DIR         := $(dir $(lastword $(MAKEFILE_LIST)))
+EXTENSION        = crankshaft
+PACKAGE          = crankshaft
+EXTVERSION       = $(shell grep default_version $(SELF_DIR)/src/pg/$(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
+RELEASE_VERSION ?= $(EXTVERSION)
+SED              = sed
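`EXTVERSION` is derived by grepping the `default_version` line of the extension control file and stripping everything but the quoted value. The sketch below shows the equivalent extraction in Python, assuming a control file with a single-quoted `default_version` entry; `read_default_version` and the sample contents are hypothetical and only illustrate the pattern.

```python
import re


def read_default_version(control_text):
    """Extract the quoted value of default_version from control file text, or None."""
    match = re.search(
        r"^\s*default_version\s*=\s*'([^']*)'", control_text, re.MULTILINE
    )
    return match.group(1) if match else None


# Hypothetical control file contents, for illustration only.
sample_control = """\
comment = 'example extension'
default_version = '0.0.2'
"""

print(read_default_version(sample_control))
```

Because `RELEASE_VERSION` is assigned with `?=`, a value passed on the command line or in the environment takes precedence over this extracted default.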
--- a/NEWS.md
+++ b/NEWS.md
@@ -0,0 +1,7 @@
+0.0.2 (2016-03-16)
+------------------
+* New versioning approach using per-version Python virtual environments
+
+0.0.1 (2016-02-22)
+------------------
+* Preliminary release
--- a/README.md
+++ b/README.md
@@ -4,9 +4,68 @@ CartoDB Spatial Analysis extension for PostgreSQL.

 ## Code organization

-* *pg* contains the PostgreSQL extension source code
-* *python* Python module
+* *doc* documentation
+* *src* source code
+  - *src/pg* contains the PostgreSQL extension source code
+  - *src/py* Python module source code
+* *release* released versions
+* *envs* base directory for Python virtual environments

 ## Requirements

-* pip
+* pip, virtualenv, PostgreSQL
+* python-scipy system package (see [src/py/README.md](https://github.com/CartoDB/crankshaft/blob/master/src/py/README.md))
+
+# Working Process -- Quickstart Guide
+
+We distinguish two roles regarding the development cycle of crankshaft:
+
+* *developers* will implement new functionality and bugfixes into
+  the codebase and will request new releases of the extension.
+* A *release manager* will attend these requests and will handle
+  the release process. The release process is sequential:
+  no concurrent releases will ever be in the works.
+
+We use the default `develop` branch as the basis for development.
+The `master` branch is used to merge and tag releases to be
+deployed in production.
+
+Developers shall create a new topic branch from `develop` for any new feature
+or bugfix and commit their changes to it and eventually merge back into
+the `develop` branch. When a new release is required a Pull Request
+will be opened against the `develop` branch.
+
+The `develop` pull requests will be handled by the release manager,
+who will merge into master where new releases are prepared and tagged.
+The `master` branch is the sole responsibility of the release manager
+and developers must not commit or merge into it.
+
+## Development Guidelines
+
+For a detailed description of the development process please see
+the [CONTRIBUTING.md](https://github.com/CartoDB/crankshaft/blob/master/CONTRIBUTING.md) guide.
+
+Any modification to the source code (`src/pg/sql` for the SQL extension,
+`src/py/crankshaft` for the Python package) shall always be done
+in a topic branch created from the `develop` branch.
+
+Tests, documentation and peer code reviewing are required for all
+modifications.
+
+The tests (both for SQL and Python) are executed by running,
+from the top directory:
+
+```
+sudo make install
+make test
+```
+
+To request a new release, which will be handled by the
+release manager, a Pull Request must be created in the `develop`
+branch.
+
+## Release
+
+The release and deployment process is described in the
+[RELEASE.md](https://github.com/CartoDB/crankshaft/blob/master/RELEASE.md) guide and it is the responsibility of the designated
+release manager.
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -0,0 +1,93 @@
+# Release & Deployment Process
+
+Please read the Working Process/Quickstart Guide in README.md
+and the Development guidelines in CONTRIBUTING.md.
+
+The release process of a new version of the extension
+shall be performed by the designated *Release Manager*.
+
+Note that we expect to gradually automate more of this process.
+
+Once the PR to be released has been checked, it shall be
+merged back into the `master` branch to prepare the new release.
+
+The version number in `src/pg/crankshaft.control` must first be updated,
+following [Semantic Versioning 2.0](http://semver.org/).
+
+The `NEWS.md` file must also be updated.
+
+We will now explain the process for the case of backwards-compatible
+releases (updating the minor or patch version numbers).
+
+TODO: document the complex case of major releases.
+
+The next command must be executed to produce the main installation
+script for the new release, `release/crankshaft--X.Y.Z.sql` and
+also to copy the python package to `release/python/X.Y.Z/crankshaft`.
+
+```
+make release
+```
+
+Then, the release manager shall produce upgrade and downgrade scripts
+to migrate to/from the previous release. In the case of minor/patch
+releases this simply consists of extracting the functions that have changed
+and placing them in the proper `release/crankshaft--X.Y.Z--A.B.C.sql`
+file.
+
+The new release can be deployed for staging/smoke tests with this command:
+
+```
+sudo make deploy
+```
+
+This will copy the current 'X.Y.Z' released version of the extension to
+PostgreSQL. The corresponding Python extension will be installed in a
+virtual environment in `envs/X.Y.Z`.
+
+It can be activated with:
+
+```
+source envs/X.Y.Z/bin/activate
+```
+
+But note that this is needed only for using the package directly;
+the 'X.Y.Z' version of the extension will automatically use the
+python package from this virtual environment.
+
+The `sudo make deploy` operation can also be used for installing
+the new version after it has been released.
+
+To install a specific version 'X.Y.Z' different from the current one
+(which must be present in `release/`) you can run:
+
+```
+sudo make deploy RELEASE_VERSION=X.Y.Z
+```
+
+TODO: testing procedure for the new release.
+
+TODO: procedure for staging deployment.
+
+TODO: procedure for merging to master, tagging and deploying
+in production.
+
+## Relevant release & deployment tasks available in the Makefile
+
+```
+* `make help` show a short description of the available targets
+
+* `make release` will generate a new release (version number defined in
+  `src/pg/crankshaft.control`) into `release/`.
+  Intended for use by the release manager.
+
+* `sudo make deploy` will install the current release X.Y.Z from the
+  `release/` files into PostgreSQL and a Python virtual environment
+  `envs/X.Y.Z`.
+  Intended for use by the release manager and deployment jobs.
+
+* `sudo make deploy RELEASE_VERSION=X.Y.Z` will install specified version
+  previously generated in `release/`
+  into PostgreSQL and a Python virtual environment `envs/X.Y.Z`.
+  Intended for use by the release manager and deployment jobs.
+```
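The release files follow the PostgreSQL naming convention for extension scripts: `extension--version.sql` for a fresh install and `extension--old--new.sql` for a migration from `old` to `new`. The sketch below builds the three names a minor or patch release is expected to provide; `migration_script_names` is a hypothetical helper used only to illustrate the convention.

```python
def migration_script_names(extension, new_version, previous_version):
    """Return the install, upgrade and downgrade script names for a release."""
    return {
        # Full installation script for the new version.
        "install": "%s--%s.sql" % (extension, new_version),
        # Migration from the previous version to the new one.
        "upgrade": "%s--%s--%s.sql" % (extension, previous_version, new_version),
        # Migration from the new version back to the previous one.
        "downgrade": "%s--%s--%s.sql" % (extension, new_version, previous_version),
    }


names = migration_script_names("crankshaft", "0.0.2", "0.0.1")
for kind in ("install", "upgrade", "downgrade"):
    print("%-10s release/%s" % (kind, names[kind]))
```

For the 0.0.2 release this yields the `crankshaft--0.0.1--0.0.2.sql` upgrade script that appears later in this change.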
--- a/TODO.md
+++ b/TODO.md
@@ -1,9 +0,0 @@
-* [x] Support versioning
-* [x] Test use of `plpy` from python Package
-* [x] Add `pysal` etc. dependencies
-* [x] Define documentation practices (general, per extension/package?)
-* [x] Add initial function set (WIP)
-* Unify style of function comments
-* [x] Add integration tests
-* Make target to open a new version development (create symlinks, etc.)
-* [x] Should add cartodb ext. as a dependency?
--- a/doc/02_moran.md
+++ b/doc/02_moran.md
@@ -0,0 +1,169 @@
+## Name
+
+CDB_AreasOfInterest -- returns a table with a cluster/outlier classification, the significance of a classification, an autocorrelation statistic (Local Moran's I), and the geometry id for each geometry in the original dataset.
+
+## Synopsis
+
+```sql
+table(numeric moran_val, text quadrant, numeric significance, int ids, numeric column_values) CDB_AreasOfInterest(text query, text column_name)
+
+table(numeric moran_val, text quadrant, numeric significance, int ids, numeric column_values) CDB_AreasOfInterest(text query, text column_name, int permutations, text geom_column, text id_column, text weight_type, int num_ngbrs)
+```
+
+## Description
+
+CDB_AreasOfInterest is a table-returning function that classifies the geometries in a table by an attribute and gives a significance for that classification. This information can be used to find "Areas of Interest" by using the correlation of a geometry's attribute with that of its neighbors. Areas can be clusters, outliers, or neither (depending on which significance value is used).
+
+Inputs:
+
+* `query` (required): an arbitrary query against tables you have access to (e.g., in your account, shared in your organization, or through the Data Observatory). This string must contain the following columns: an id `INT` (e.g., `cartodb_id`), geometry (e.g., `the_geom`), and the numeric attribute which is specified in `column_name`
+* `column_name` (required): column to perform the area of interest analysis tool on. The data must be numeric (e.g., `float`, `int`, etc.)
+* `permutations` (optional): used to calculate the significance of a classification. Defaults to 99, which is sufficient in most situations.
+* `geom_column` (optional): the name of the geometry column. Data must be of type `geometry`.
+* `id_column` (optional): the name of the id column (e.g., `cartodb_id`). Data must be of type `int` or `bigint` and have a unique condition on the data.
+* `weight_type` (optional): the type of weight used for determining what defines a neighborhood. Options are `knn` or `queen`.
+* `num_ngbrs` (optional): the number of neighbors in a neighborhood around a geometry. Only used if `knn` is chosen above.
+
+Outputs:
+
+* `moran_val`: underlying correlation statistic used in analysis
+* `quadrant`: human-readable interpretation of classification
+* `significance`: significance of classification (closer to 0 is more significant)
+* `ids`: id of original geometry (used for joining against original table if desired -- see examples)
+* `column_values`: original column values from `column_name`
+
+Availability: crankshaft v0.0.1 and above
+
+## Examples
+
+```sql
+SELECT
+  t.the_geom_webmercator,
+  t.cartodb_id,
+  aoi.significance,
+  aoi.quadrant As aoi_quadrant
+FROM
+  observatory.acs2013 As t
+JOIN
+  crankshaft.CDB_AreasOfInterest('SELECT * FROM observatory.acs2013', 'gini_index') As aoi
+ON t.cartodb_id = aoi.ids
+```
+
+## API Usage
+
+Example
+
+```text
+http://eschbacher.cartodb.com/api/v2/sql?q=SELECT * FROM crankshaft.CDB_AreasOfInterest('SELECT * FROM observatory.acs2013','gini_index')
+```
+
+Result
+```json
+{
+  time: 0.120,
+  total_rows: 100,
+  rows: [{
+    moran_val: 0.7213,
+    quadrant: 'High area',
+    significance: 0.03,
+    ids: 1,
+    column_values: 0.22
+  },
+  {
+    moran_val: -0.7213,
+    quadrant: 'Low outlier',
+    significance: 0.13,
+    ids: 2,
+    column_values: 0.03
+  },
+  ...
+  ]
+}
+```
+
+## See Also
+
+crankshaft's areas of interest functions:
+
+* [CDB_AreasOfInterest_Global]()
+* [CDB_AreasOfInterest_Rate_Local]()
+* [CDB_AreasOfInterest_Rate_Global]()
+
+
+PostGIS clustering functions:
+
+* [ST_ClusterIntersecting](http://postgis.net/docs/manual-2.2/ST_ClusterIntersecting.html)
+* [ST_ClusterWithin](http://postgis.net/docs/manual-2.2/ST_ClusterWithin.html)
+
+
+-- removing below, working into above
+
+#### What is Moran's I and why is it significant for CartoDB?
+
+Moran's I is a geostatistical calculation which gives a measure of the global
+clustering and presence of outliers within the geographies in a map. Here global
+means over all of the geographies in a dataset. Imagine mapping the incidence
+rates of cancer in neighborhoods of a city. If there were areas covering several
+neighborhoods with abnormally low rates of cancer, those areas are positively
+spatially correlated with one another and would be considered a cluster. If
+there was a single neighborhood with a high rate but with all neighbors on
+average having a low rate, it would be considered a spatial outlier.
+
+While Moran's I gives a global snapshot, there are local indicators for
+clustering called Local Indicators of Spatial Autocorrelation. Clustering is a
+process related to autocorrelation -- i.e., a process that compares a
+geography's attribute to the attribute in neighbor geographies.
+
+For the example of cancer rates in neighborhoods, since these neighborhoods have
+a high value for rate of cancer, and all of their neighbors do as well, they are
+designated as "High High" or simply **HH**. For areas with multiple neighborhoods
+with low rates of cancer, they are designated as "Low Low" or **LL**. HH and LL
+naturally fit into the concept of clustering and are in the correlated
+variables.
+
+"Anticorrelated" geogs are in **LH** and **HL** regions -- that is, regions
+where a geog has a high value and its neighbors, on average, have a low value
+(or vice versa). An example of this is a "gated community" or placement of a
+city housing project in a rich region. These deliberate developments have
+opposite median income as compared to the neighbors around them. They have a
+high (or low) value while their neighbors have a low (or high) value. They exist
+typically as islands, and in rare circumstances can extend as chains dividing
+**LL** or **HH**.
+
+Strong policies such as rent stabilization (probably) tend to prevent the
+clustering of high rent areas as they integrate middle class incomes. Luxury
+apartment buildings, which are a kind of gated community, probably tend to skew
+an area's median income upwards while housing projects have the opposite effect.
+What are the nuggets in the analysis?
+
+Two functions are available to compute Moran I statistics:
+
+* `cdb_moran_local` computes Moran I measures, quad classification and
+  significance values from numerical values associated to geometry entities
+  in an input table. The geometries should be contiguous polygons when
+  the `queen` `w_type` is used.
+* `cdb_moran_local_rate` computes the same statistics using a ratio between
+  numerator and denominator columns of a table.
+
+The parameters for `cdb_moran_local` are:
+
+* `table` name of the table that contains the data values
+* `attr` name of the column
+* `significance` significance threshold for the quads values
+* `num_ngbrs` number of neighbors to consider (default: 5)
+* `permutations` number of random permutations for calculation of
+  pseudo-p values (default: 99)
+* `geom_column` name of the geometry column (default: "the_geom")
+* `id_col` PK column of the table (default: "cartodb_id")
+* `w_type` Weight types: can be "knn" for k-nearest neighbor weights
+  or "queen" for contiguity based weights.
+
+The function returns a table with the following columns:
+
+* `moran` Moran's value
+* `quads` quad classification ('HH', 'LL', 'HL', 'LH' or 'Not significant')
+* `significance` significance value
+* `ids` id of the corresponding record in the input table
+
+Function `cdb_moran_local_rate` only differs in that the `attr` input
+parameter is substituted by `numerator` and `denominator`.
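The HH, LL, HL and LH labels described above come from comparing each geography's value with the average value of its neighbors, both measured relative to the overall mean. The following is a minimal sketch of that classification step only; it is not the implementation used by the extension, it omits the permutation-based significance test, and `classify_quadrants` and the toy data are hypothetical.

```python
def classify_quadrants(values, neighbors):
    """Label each observation HH, LL, HL or LH.

    values: list of numbers, one per geography.
    neighbors: dict mapping an index to the list of its neighbor indices.
    """
    mean = sum(values) / float(len(values))
    deviations = [v - mean for v in values]
    labels = []
    for i, dev in enumerate(deviations):
        nbrs = neighbors[i]
        # Spatial lag: average deviation from the mean among the neighbors.
        lag = sum(deviations[j] for j in nbrs) / float(len(nbrs))
        own = "H" if dev > 0 else "L"
        around = "H" if lag > 0 else "L"
        labels.append(own + around)
    return labels


# Toy data: five areas in a row, each neighboring the adjacent ones.
values = [10, 9, 1, 2, 9]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(classify_quadrants(values, neighbors))
```

In this toy data the last area has a high value next to a low-valued neighbor, so it is labeled HL, the "island" case discussed above.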
--- a/pg/doc/03_overlap_sum.md
+++ b/pg/doc/03_overlap_sum.md
--- a/doc/docs_template.md
+++ b/doc/docs_template.md
@@ -0,0 +1,24 @@
+
+## Name
+
+## Synopsis
+
+## Description
+
+Availability: v...
+
+## Examples
+
+```SQL
+-- example of the function in use
+SELECT cdb_awesome_function(the_geom, 'total_pop')
+FROM table_name
+```
+
+## API Usage
+
+_asdf_
+
+## See Also
+
+_Other function pages_
--- a/pg/.gitignore
+++ b/pg/.gitignore
@@ -1,3 +0,0 @@
-regression.diffs
-regression.out
-results/
--- a/pg/Makefile
+++ b/pg/Makefile
@@ -1,33 +0,0 @@
-# Makefile to generate the extension out of separate sql source files.
-# Once a version is released, it is not meant to be changed. E.g: once version 0.0.1 is out, it SHALL NOT be changed.
-
-EXTENSION    = crankshaft
-EXTVERSION   = $(shell grep default_version $(EXTENSION).control | sed -e "s/default_version[[:space:]]*=[[:space:]]*'\([^']*\)'/\1/")
-
-# The new version to be generated from templates
-NEW_EXTENSION_ARTIFACT = $(EXTENSION)--$(EXTVERSION).sql
-
-# DATA is a special variable used by postgres build infrastructure
-# These are the files to be installed in the server shared dir,
-# for installation from scratch, upgrades and downgrades.
-# @see http://www.postgresql.org/docs/current/static/extend-pgxs.html
-DATA =  $(NEW_EXTENSION_ARTIFACT)
-
-SOURCES_DATA_DIR = sql/$(EXTVERSION)
-SOURCES_DATA = $(wildcard sql/$(EXTVERSION)/*.sql)
-
-# The extension installation artifacts are stored in the base subdirectory
-$(NEW_EXTENSION_ARTIFACT): $(SOURCES_DATA)
-	rm -f $@
-	cat $(SOURCES_DATA_DIR)/*.sql >> $@
-
-REGRESS = $(notdir $(basename $(wildcard test/$(EXTVERSION)/sql/*test.sql)))
-TEST_DIR = test/$(EXTVERSION)
-REGRESS_OPTS = --inputdir='$(TEST_DIR)' --outputdir='$(TEST_DIR)'
-
-PG_CONFIG = pg_config
-PGXS := $(shell $(PG_CONFIG) --pgxs)
-include $(PGXS)
-
-# This seems to be needed at least for PG 9.3.11
-all: $(DATA)
--- a/pg/README.md
+++ b/pg/README.md
@@ -1,7 +0,0 @@
-
-# Running the tests:
-
-```
-sudo make install
-PGUSER=postgres make installcheck
-```
--- a/pg/doc/02_moran.md
+++ b/pg/doc/02_moran.md
@@ -1,71 +0,0 @@
-### Moran's I
-
-#### What is Moran's I and why is it significant for CartoDB?
-
-Moran's I is a geostatistical calculation which gives a measure of the global
-clustering and presence of outliers within the geographies in a map. Here global
-means over all of the geographies in a dataset. Imagine mapping the incidence
-rates of cancer in neighborhoods of a city. If there were areas covering several
-neighborhoods with abnormally low rates of cancer, those areas are positively
-spatially correlated with one another and would be considered a cluster. If
-there was a single neighborhood with a high rate but with all neighbors on
-average having a low rate, it would be considered a spatial outlier.
-
-While Moran's I gives a global snapshot, there are local indicators for
-clustering called Local Indicators of Spatial Autocorrelation. Clustering is a
-process related to autocorrelation -- i.e., a process that compares a
-geography's attribute to the attribute in neighbor geographies.
-
-For the example of cancer rates in neighborhoods, since these neighborhoods have
-a high value for rate of cancer, and all of their neighbors do as well, they are
-designated as "High High" or simply **HH**. For areas with multiple neighborhoods
-with low rates of cancer, they are designated as "Low Low" or **LL**. HH and LL
-naturally fit into the concept of clustering and are in the correlated
-variables.
-
-"Anticorrelated" geogs are in **LH** and **HL** regions -- that is, regions
-where a geog has a high value and it's neighbors, on average, have a low value
-(or vice versa). An example of this is a "gated community" or placement of a
-city housing project in a rich region. These deliberate developments have
-opposite median income as compared to the neighbors around them. They have a
-high (or low) value while their neighbors have a low (or high) value. They exist
-typically as islands, and in rare circumstances can extend as chains dividing
-**LL** or **HH**.
-
-Strong policies such as rent stabilization (probably) tend to prevent the
-clustering of high rent areas as they integrate middle class incomes. Luxury
-apartment buildings, which are a kind of gated community, probably tend to skew
-an area's median income upwards while housing projects have the opposite effect.
-What are the nuggets in the analysis?
-
-Two functions are available to compute Moran I statistics:
-
-* `cdb_moran_local` computes Moran I measures, quad classification and
-  significance values from numerial values associated to geometry entities
-  in an input table. The geometries should be contiguous polygons When
-  then `queen` `w_type` is used.
-* `cdb_moran_local_rate` computes the same statistics using a ratio between
-  numerator and denominator columns of a table.
-
-The parameters for `cdb_moran_local` are:
-
-* `table` name of the table that contains the data values
-* `attr` name of the column
-* `signficance` significance threshold for the quads values
-* `num_ngbrs` number of neighbors to consider (default: 5)
-* `permutations` number of random permutations for calculation of
-  pseudo-p values (default: 99)
-* `geom_column` number of the geometry column (default: "the_geom")
-* `id_col` PK column of the table (default: "cartodb_id")
-* `w_type` Weight types: can be "knn" for k-nearest neighbor weights
-  or "queen" for contiguity based weights.
-
-The function returns a table with the following columns:
-
-* `moran` Moran's value
-* `quads` quad classification ('HH', 'LL', 'HL', 'LH' or 'Not significant')
-* `significance` significance value
-* `ids` id of the corresponding record in the input table
-
-Function `cdb_moran_local_rate` only differs in that the `attr` input
-parameter is substituted by `numerator` and `denominator`.
--- a/pg/test/0.0.1/results/01_install_test.out
+++ b/pg/test/0.0.1/results/01_install_test.out
@@ -1,6 +0,0 @@
-- Install dependencies
-CREATE EXTENSION plpythonu;
-CREATE EXTENSION postgis;
-CREATE EXTENSION cartodb;
-- Install the extension
-CREATE EXTENSION crankshaft;
--- a/python/.gitignore
+++ b/python/.gitignore
@@ -1 +0,0 @@
-*.pyc
--- a/python/Makefile
+++ b/python/Makefile
@@ -1,11 +0,0 @@
-# Install the package (needs root privileges)
-install:
-	pip install ./crankshaft --upgrade
-
-# Test from source code
-test:
-	(cd crankshaft && nosetests test/)
-
-# Test currently installed package
-testinstalled:
-	nosetests crankshaft/test/
--- a/python/README.md
+++ b/python/README.md
@@ -1,9 +0,0 @@
-# Crankshaft Python Package
-
-...
-### Run the tests
-
-```bash
-cd crankshaft
-nosetests test/
-```
--- a/release/.gitignore
+++ b/release/.gitignore
--- a/release/crankshaft--0.0.1--0.0.2.sql
+++ b/release/crankshaft--0.0.1--0.0.2.sql
@@ -0,0 +1,74 @@
+CREATE OR REPLACE FUNCTION cdb_crankshaft.cdb_crankshaft_version()
+RETURNS text AS $$
+  SELECT '0.0.2'::text;
+$$ language 'sql' STABLE STRICT;
+
+CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_internal_version()
+RETURNS text AS $$
+  SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
+$$ language 'sql' STABLE STRICT;
+CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_virtualenvs_path()
+RETURNS text
+AS $$
+  BEGIN
+    RETURN '/home/ubuntu/crankshaft/envs';
+  END;
+$$ language plpgsql IMMUTABLE STRICT;
+
+CREATE OR REPLACE FUNCTION cdb_crankshaft._cdb_crankshaft_activate_py()
+RETURNS VOID
+AS $$
+    import os
+    # plpy.notice('%',str(os.environ))
+    # activate virtualenv
+    crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
+    base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
+    default_venv_path = os.path.join(base_path, crankshaft_version)
+    venv_path =  os.environ.get('CRANKSHAFT_VENV', default_venv_path)
+    activate_path = venv_path + '/bin/activate_this.py'
+    exec(open(activate_path).read(), dict(__file__=activate_path))
+$$ LANGUAGE plpythonu;
+
+CREATE OR REPLACE FUNCTION
+cdb_crankshaft._cdb_random_seeds (seed_value INTEGER) RETURNS VOID
+AS $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
+  from crankshaft import random_seeds
+  random_seeds.set_random_seeds(seed_value)
+$$ LANGUAGE plpythonu;
+-- Moran's I
+CREATE OR REPLACE FUNCTION
+cdb_crankshaft.cdb_moran_local (
+      t TEXT,
+  	  attr TEXT,
+  	  significance float DEFAULT 0.05,
+  	  num_ngbrs INT DEFAULT 5,
+  	  permutations INT DEFAULT 99,
+  	  geom_column TEXT DEFAULT 'the_geom',
+  	  id_col TEXT DEFAULT 'cartodb_id',
+      w_type TEXT DEFAULT 'knn')
+RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
+AS $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
+  from crankshaft.clustering import moran_local
+  # TODO: use named parameters or a dictionary
+  return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
+$$ LANGUAGE plpythonu;
+
+CREATE OR REPLACE FUNCTION
+cdb_crankshaft.cdb_moran_local_rate(t TEXT,
+		 numerator TEXT,
+		 denominator TEXT,
+		 significance FLOAT DEFAULT 0.05,
+		 num_ngbrs INT DEFAULT 5,
+		 permutations INT DEFAULT 99,
+		 geom_column TEXT DEFAULT 'the_geom',
+		 id_col TEXT DEFAULT 'cartodb_id',
+		 w_type TEXT DEFAULT 'knn')
+RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
+AS $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
+  from crankshaft.clustering import moran_local_rate
+  # TODO: use named parameters or a dictionary
+  return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
+$$ LANGUAGE plpythonu;
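
The lookup order used by `_cdb_crankshaft_activate_py` above can be sketched outside the database: the `CRANKSHAFT_VENV` environment variable wins, otherwise the virtualenv is expected under the base path in a directory named after the installed extension version. This is only an illustrative sketch; `resolve_activate_path` is a hypothetical helper, and it uses `posixpath` on the assumption that the server runs on a POSIX system:

```python
import os
import posixpath


def resolve_activate_path(base_path, version, environ=None):
    """Return the path of the activate_this.py script to execute.

    Hypothetical helper mirroring the lookup order of
    _cdb_crankshaft_activate_py: CRANKSHAFT_VENV takes precedence,
    otherwise <base_path>/<version> is used.
    """
    if environ is None:
        environ = os.environ
    default_venv_path = posixpath.join(base_path, version)
    venv_path = environ.get('CRANKSHAFT_VENV', default_venv_path)
    return posixpath.join(venv_path, 'bin', 'activate_this.py')


# No override: fall back to the per-version virtualenv.
default_path = resolve_activate_path('/home/ubuntu/crankshaft/envs', '0.0.2', {})

# Override through the environment (the /tmp/venv path is made up).
override_path = resolve_activate_path('/home/ubuntu/crankshaft/envs', '0.0.2',
                                      {'CRANKSHAFT_VENV': '/tmp/venv'})
```

One consequence of this design is that each extension version needs its own virtualenv directory unless the override is set.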
--- a/release/crankshaft--0.0.1.sql
+++ b/release/crankshaft--0.0.1.sql
--- a/release/crankshaft--0.0.2--0.0.1.sql
+++ b/release/crankshaft--0.0.2--0.0.1.sql
@@ -1,6 +1,12 @@
-- Moran's I
 CREATE OR REPLACE FUNCTION
-  cdb_moran_local (
+cdb_crankshaft._cdb_random_seeds (seed_value INTEGER) RETURNS VOID
+AS $$
+  from crankshaft import random_seeds
+  random_seeds.set_random_seeds(seed_value)
+$$ LANGUAGE plpythonu;
+
+CREATE OR REPLACE FUNCTION
+cdb_crankshaft.cdb_moran_local (
      t TEXT,
  	  attr TEXT,
  	  significance float DEFAULT 0.05,
@@ -16,9 +22,8 @@ AS $$
  return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
 $$ LANGUAGE plpythonu;

-- Moran's I Local Rate
 CREATE OR REPLACE FUNCTION
-  cdb_moran_local_rate(t TEXT,
+cdb_crankshaft.cdb_moran_local_rate(t TEXT,
 		 numerator TEXT,
 		 denominator TEXT,
 		 significance FLOAT DEFAULT 0.05,
@@ -33,3 +38,7 @@ AS $$
  # TODO: use named parameters or a dictionary
  return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
 $$ LANGUAGE plpythonu;
+
+DROP FUNCTION IF EXISTS cdb_crankshaft.cdb_crankshaft_version();
+DROP FUNCTION IF EXISTS cdb_crankshaft._cdb_crankshaft_internal_version();
+DROP FUNCTION IF EXISTS cdb_crankshaft._cdb_crankshaft_activate_py();
--- a/release/crankshaft--0.0.2.sql
+++ b/release/crankshaft--0.0.2.sql
@@ -0,0 +1,186 @@
+--DO NOT MODIFY THIS FILE, IT IS GENERATED AUTOMATICALLY FROM SOURCES
+-- Complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION crankshaft" to load this file. \quit
+-- Version number of the extension release
+CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
+RETURNS text AS $$
+  SELECT '0.0.2'::text;
+$$ language 'sql' STABLE STRICT;
+
+-- Internal identifier of the installed extension instance
+-- e.g. 'dev' for current development version
+CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
+RETURNS text AS $$
+  SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
+$$ language 'sql' STABLE STRICT;
+CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
+RETURNS text
+AS $$
+  BEGIN
+    -- RETURN '/opt/virtualenvs/crankshaft';
+    RETURN '/home/ubuntu/crankshaft/envs';
+  END;
+$$ language plpgsql IMMUTABLE STRICT;
+
+-- Use the crankshaft python module
+CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
+RETURNS VOID
+AS $$
+    import os
+    # plpy.notice('%',str(os.environ))
+    # activate virtualenv
+    crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
+    base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
+    default_venv_path = os.path.join(base_path, crankshaft_version)
+    venv_path =  os.environ.get('CRANKSHAFT_VENV', default_venv_path)
+    activate_path = venv_path + '/bin/activate_this.py'
+    exec(open(activate_path).read(), dict(__file__=activate_path))
+$$ LANGUAGE plpythonu;
+-- Internal function.
+-- Set the seeds of the RNGs (Random Number Generators)
+-- used internally.
+CREATE OR REPLACE FUNCTION
+_cdb_random_seeds (seed_value INTEGER) RETURNS VOID
+AS $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
+  from crankshaft import random_seeds
+  random_seeds.set_random_seeds(seed_value)
+$$ LANGUAGE plpythonu;
+-- Moran's I
+CREATE OR REPLACE FUNCTION
+  cdb_moran_local (
+      t TEXT,
+  	  attr TEXT,
+  	  significance float DEFAULT 0.05,
+  	  num_ngbrs INT DEFAULT 5,
+  	  permutations INT DEFAULT 99,
+  	  geom_column TEXT DEFAULT 'the_geom',
+  	  id_col TEXT DEFAULT 'cartodb_id',
+      w_type TEXT DEFAULT 'knn')
+RETURNS TABLE (moran FLOAT, quads TEXT, significance FLOAT, ids INT)
+AS $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
+  from crankshaft.clustering import moran_local
+  # TODO: use named parameters or a dictionary
+  return moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
+$$ LANGUAGE plpythonu;
+
+-- Moran's I Local Rate
+CREATE OR REPLACE FUNCTION
+  cdb_moran_local_rate(t TEXT,
+		 numerator TEXT,
+		 denominator TEXT,
+		 significance FLOAT DEFAULT 0.05,
+		 num_ngbrs INT DEFAULT 5,
+		 permutations INT DEFAULT 99,
+		 geom_column TEXT DEFAULT 'the_geom',
+		 id_col TEXT DEFAULT 'cartodb_id',
+		 w_type TEXT DEFAULT 'knn')
+RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
+AS $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
+  from crankshaft.clustering import moran_local_rate
+  # TODO: use named parameters or a dictionary
+  return moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type)
+$$ LANGUAGE plpythonu;
+-- Function by Stuart Lynn for a simple interpolation of a value
+-- from a polygon table over an arbitrary polygon
+-- (weighted by the area proportion overlapped)
+-- Areal weighting is a very simple form of areal interpolation.
+--
+-- Parameters:
+--   * geom a Polygon geometry which defines the area where a value will be
+--     estimated as the area-weighted sum of a given table/column
+--   * target_table_name table name of the table that provides the values
+--   * target_column column name of the column that provides the values
+--   * schema_name optional parameter to define the schema the target table
+--     belongs to, which is necessary if it's not in the search_path.
+--     Note that target_table_name should never include the schema in it.
+-- Return value:
+--   Areal-weighted interpolation of the column values over the geometry
+CREATE OR REPLACE
+FUNCTION cdb_overlap_sum(geom geometry, target_table_name text, target_column text, schema_name text DEFAULT NULL)
+  RETURNS numeric AS
+$$
+DECLARE
+	result numeric;
+  qualified_name text;
+BEGIN
+  IF schema_name IS NULL THEN
+    qualified_name := Format('%I', target_table_name);
+  ELSE
+    qualified_name := Format('%I.%I', schema_name, target_table_name);
+  END IF;
+  EXECUTE Format('
+    SELECT sum(%I*ST_Area(St_Intersection($1, a.the_geom))/ST_Area(a.the_geom))
+    FROM %s AS a
+    WHERE $1 && a.the_geom
+  ', target_column, qualified_name)
+  USING geom
+  INTO result;
+  RETURN result;
+END;
+$$ LANGUAGE plpgsql;
+--
+-- Creates N points randomly distributed around the polygon
+--
+-- @param geom - the geometry to be turned into points
+--
+-- @param no_points - the number of points to generate
+--
+-- @param max_iter_per_point - the function generates points in the polygon's bounding box
+-- and discards points which don't lie in the polygon. max_iter_per_point specifies how many
+-- misses per point the function accepts before giving up.
+--
+-- Returns: Multipoint with the requested points
+CREATE OR REPLACE FUNCTION cdb_dot_density(geom geometry , no_points Integer, max_iter_per_point Integer DEFAULT 1000)
+RETURNS GEOMETRY AS $$
+DECLARE
+  extent GEOMETRY;
+  test_point Geometry;
+  width                NUMERIC;
+  height               NUMERIC;
+  x0                   NUMERIC;
+  y0                   NUMERIC;
+  xp                   NUMERIC;
+  yp                   NUMERIC;
+  no_left              INTEGER;
+  remaining_iterations INTEGER;
+  points               GEOMETRY[];
+  bbox_line            GEOMETRY;
+  intersection_line    GEOMETRY;
+BEGIN
+  extent  := ST_Envelope(geom);
+  width   := ST_XMax(extent) - ST_XMIN(extent);
+  height  := ST_YMax(extent) - ST_YMIN(extent);
+  x0 	  := ST_XMin(extent);
+  y0 	  := ST_YMin(extent);
+  no_left := no_points;
+
+  LOOP
+    if(no_left=0) THEN
+      EXIT;
+    END IF;
+    yp = y0 + height*random();
+    bbox_line  = ST_MakeLine(
+      ST_SetSRID(ST_MakePoint(x0, yp),4326),
+      ST_SetSRID(ST_MakePoint(x0+width, yp),4326)
+    );
+    intersection_line = ST_Intersection(bbox_line,geom);
+  	test_point = ST_LineInterpolatePoint(st_makeline(st_linemerge(intersection_line)),random());
+	  points := points || test_point;
+	  no_left = no_left - 1 ;
+  END LOOP;
+  RETURN ST_Collect(points);
+END;
+$$
+LANGUAGE plpgsql VOLATILE;
+-- Make sure by default there are no permissions for publicuser
+-- NOTE: this happens at extension creation time, as part of an implicit transaction.
+-- REVOKE ALL PRIVILEGES ON SCHEMA cdb_crankshaft FROM PUBLIC, publicuser CASCADE;
+
+-- Grant permissions on the schema to publicuser (but just the schema)
+GRANT USAGE ON SCHEMA cdb_crankshaft TO publicuser;
+
+-- Revoke execute permissions on all functions in the schema by default
+-- REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA cdb_crankshaft FROM PUBLIC, publicuser;
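
The areal weighting performed by `cdb_overlap_sum` in the file above boils down to summing, for every source polygon, its value times the fraction of its area that falls inside the target geometry. A minimal Python sketch of that arithmetic, assuming axis-aligned rectangles given as `(xmin, ymin, xmax, ymax)` in place of real PostGIS geometries; all names and numbers here are hypothetical:

```python
def rect_area(rect):
    """Area of an axis-aligned rectangle; 0 if it is empty or inverted."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def rect_intersection(a, b):
    """Intersection of two rectangles (may be empty, i.e. have zero area)."""
    return (max(a[0], b[0]), max(a[1], b[1]),
            min(a[2], b[2]), min(a[3], b[3]))


def overlap_sum(target, sources):
    """Area-weighted sum of source values over the target rectangle.

    sources is a list of (rect, value) pairs.
    """
    total = 0.0
    for rect, value in sources:
        share = rect_area(rect_intersection(target, rect)) / rect_area(rect)
        total += value * share
    return total


# Two adjacent 2x2 source cells; the target covers half of each.
sources = [((0, 0, 2, 2), 100.0), ((2, 0, 4, 2), 40.0)]
estimate = overlap_sum((1, 0, 3, 2), sources)
```

Here each source cell contributes half of its value, so the estimate is 100 * 0.5 + 40 * 0.5.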
--- a/release/crankshaft.control
+++ b/release/crankshaft.control
@@ -1,5 +1,5 @@
 comment = 'CartoDB Spatial Analysis extension'
-default_version = '0.0.1'
+default_version = '0.0.2'
 requires = 'plpythonu, postgis, cartodb'
 superuser = true
 schema = cdb_crankshaft
--- a/release/python/.gitignore
+++ b/release/python/.gitignore
--- a/release/python/0.0.1/crankshaft/crankshaft/init.py
+++ b/release/python/0.0.1/crankshaft/crankshaft/init.py
--- a/release/python/0.0.1/crankshaft/crankshaft/clustering/init.py
+++ b/release/python/0.0.1/crankshaft/crankshaft/clustering/init.py
--- a/release/python/0.0.1/crankshaft/crankshaft/clustering/moran.py
+++ b/release/python/0.0.1/crankshaft/crankshaft/clustering/moran.py
--- a/release/python/0.0.1/crankshaft/crankshaft/random_seeds.py
+++ b/release/python/0.0.1/crankshaft/crankshaft/random_seeds.py
--- a/release/python/0.0.1/crankshaft/setup.py
+++ b/release/python/0.0.1/crankshaft/setup.py
@@ -10,7 +10,7 @@ from setuptools import setup, find_packages
 setup(
    name='crankshaft',

-    version='0.0.1',
+    version='0.0.01',

    description='CartoDB Spatial Analysis Python Library',

--- a/release/python/0.0.1/crankshaft/test/fixtures/moran.json
+++ b/release/python/0.0.1/crankshaft/test/fixtures/moran.json
--- a/release/python/0.0.1/crankshaft/test/fixtures/neighbors.json
+++ b/release/python/0.0.1/crankshaft/test/fixtures/neighbors.json
--- a/release/python/0.0.1/crankshaft/test/helper.py
+++ b/release/python/0.0.1/crankshaft/test/helper.py
--- a/release/python/0.0.1/crankshaft/test/mock_plpy.py
+++ b/release/python/0.0.1/crankshaft/test/mock_plpy.py
--- a/release/python/0.0.1/crankshaft/test/test_clustering_moran.py
+++ b/release/python/0.0.1/crankshaft/test/test_clustering_moran.py
--- a/release/python/0.0.2/crankshaft/crankshaft/init.py
+++ b/release/python/0.0.2/crankshaft/crankshaft/init.py
@@ -0,0 +1,2 @@
+import random_seeds
+import clustering
--- a/release/python/0.0.2/crankshaft/crankshaft/clustering/init.py
+++ b/release/python/0.0.2/crankshaft/crankshaft/clustering/init.py
@@ -0,0 +1 @@
+from moran import *
--- a/release/python/0.0.2/crankshaft/crankshaft/clustering/moran.py
+++ b/release/python/0.0.2/crankshaft/crankshaft/clustering/moran.py
@@ -0,0 +1,321 @@
+"""
+Moran's I geostatistics (global clustering & outliers presence)
+"""
+
+# TODO: Fill in local neighbors which have null/NoneType values with the
+#       average of their neighborhood
+
+import numpy as np
+import pysal as ps
+import plpy
+
+# High level interface ---------------------------------------
+
+def moran_local(t, attr, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
+    """
+    Moran's I implementation for PL/Python
+    Andy Eschbacher
+    """
+    # TODO: ensure that the significance output can be smaller than 1e-3 (0.001)
+    # TODO: make a wishlist of output features (zscores, pvalues, raw local lisa, what else?)
+
+    plpy.notice('** Constructing query')
+
+    # geometries with attributes that are null are ignored
+    # resulting in a collection of not as near neighbors
+
+    qvals = {"id_col": id_col,
+            "attr1": attr,
+            "geom_col": geom_column,
+             "table": t,
+             "num_ngbrs": num_ngbrs}
+
+    q = get_query(w_type, qvals)
+
+    try:
+        r = plpy.execute(q)
+        plpy.notice('** Query returned with %d rows' % len(r))
+    except plpy.SPIError:
+        plpy.notice('** Query failed: "%s"' % q)
+        plpy.notice('** Exiting function')
+        return zip([None], [None], [None], [None])
+
+    y = get_attributes(r, 1)
+    w = get_weight(r, w_type, num_ngbrs)
+
+    # calculate LISA values
+    lisa = ps.Moran_Local(y, w, permutations=permutations)
+
+    # find units of significance
+    lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
+
+    plpy.notice('** Finished calculations')
+
+    return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
+
+
+def moran_local_rate(t, numerator, denominator, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
+    """
+    Moran's I Local Rate
+    Andy Eschbacher
+    """
+
+    plpy.notice('** Constructing query')
+
+    # geometries with attributes that are null are ignored
+    # resulting in a collection of not as near neighbors
+
+    qvals = {"id_col": id_col,
+             "numerator": numerator,
+             "denominator": denominator,
+             "geom_col": geom_column,
+             "table": t,
+             "num_ngbrs": num_ngbrs}
+
+    q = get_query(w_type, qvals)
+
+    try:
+        r = plpy.execute(q)
+        plpy.notice('** Query returned with %d rows' % len(r))
+    except plpy.SPIError as err:
+        plpy.notice('** Query failed: "%s"' % q)
+        plpy.notice('** Error: %s' % err)
+        plpy.notice('** Exiting function')
+        return zip([None], [None], [None], [None], [None])
+
+        plpy.notice('r.nrows() = %d' % r.nrows())
+
+    ## collect attributes
+    numer = get_attributes(r, 1)
+    denom = get_attributes(r, 2)
+
+    w = get_weight(r, w_type, num_ngbrs)
+
+    # calculate LISA values
+    lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, w, permutations=permutations)
+
+    # find units of significance
+    lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
+
+    plpy.notice('** Finished calculations')
+
+    ## TODO: Decide on which return values here
+    return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order, lisa.y)
+
+def moran_local_bv(t, attr1, attr2, significance, num_ngbrs, permutations, geom_column, id_col, w_type):
+    plpy.notice('** Constructing query')
+
+    qvals = {"num_ngbrs": num_ngbrs,
+             "attr1": attr1,
+             "attr2": attr2,
+             "table": t,
+             "geom_col": geom_column,
+             "id_col": id_col}
+
+    q = get_query(w_type, qvals)
+
+    try:
+        r = plpy.execute(q)
+        plpy.notice('** Query returned with %d rows' % len(r))
+    except plpy.SPIError as err:
+        plpy.notice('** Query failed: "%s"' % q)
+        plpy.notice('** Error: %s' % err)
+        plpy.notice('** Exiting function')
+        return zip([None], [None], [None], [None])
+
+    ## collect attributes
+    attr1_vals = get_attributes(r, 1)
+    attr2_vals = get_attributes(r, 2)
+
+    # create weights
+    w = get_weight(r, w_type, num_ngbrs)
+
+    # calculate LISA values
+    lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, w)
+
+    plpy.notice("len of Is: %d" % len(lisa.Is))
+
+    # find clustering of significance
+    lisa_sig = lisa_sig_vals(lisa.p_sim, lisa.q, significance)
+
+    plpy.notice('** Finished calculations')
+
+    return zip(lisa.Is, lisa_sig, lisa.p_sim, w.id_order)
+
+
+# Low level functions ----------------------------------------
+
+def map_quads(coord):
+    """
+        Map a quadrant number to Moran's I designation
+        HH=1, LH=2, LL=3, HL=4
+        Input:
+        :param coord (int): quadrant of a specific measurement
+    """
+    if coord == 1:
+        return 'HH'
+    elif coord == 2:
+        return 'LH'
+    elif coord == 3:
+        return 'LL'
+    elif coord == 4:
+        return 'HL'
+    else:
+        return None
+
+def query_attr_select(params):
+    """
+        Create portion of SELECT statement for attributes involved in query.
+        :param params: dict of information used in query (column names,
+                       table name, etc.)
+    """
+
+    attrs = [k for k in params
+             if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')]
+
+    template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
+
+    attr_string = ""
+
+    for idx, val in enumerate(sorted(attrs)):
+        attr_string += template % {"col": val, "alias_num": idx + 1}
+
+    return attr_string
+
+def query_attr_where(params):
+    """
+        Create portion of WHERE clauses for weeding out NULL-valued geometries
+    """
+    attrs = sorted([k for k in params
+                    if k not in ('id_col', 'geom_col', 'table', 'num_ngbrs')])
+
+    attr_string = []
+
+    for attr in attrs:
+        attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
+
+    if len(attrs) == 2:
+        attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
+
+    out = " AND ".join(attr_string)
+
+    return out
+
+def knn(params):
+    """SQL query for k-nearest neighbors.
+        :param params: dict of values to fill template
+    """
+
+    attr_select = query_attr_select(params)
+    attr_where = query_attr_where(params)
+
+    replacements = {"attr_select": attr_select,
+                    "attr_where_i": attr_where.replace("idx_replace", "i"),
+                    "attr_where_j": attr_where.replace("idx_replace", "j")}
+
+    query = "SELECT " \
+                "i.\"{id_col}\" As id, " \
+                "%(attr_select)s" \
+                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
+                              "FROM \"{table}\" As j " \
+                              "WHERE %(attr_where_j)s " \
+                              "ORDER BY j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
+                              "LIMIT {num_ngbrs} OFFSET 1 ) " \
+                ") As neighbors " \
+            "FROM \"{table}\" As i " \
+            "WHERE " \
+                "%(attr_where_i)s " \
+            "ORDER BY i.\"{id_col}\" ASC;" % replacements
+
+    return query.format(**params)
+
+## SQL query for finding queens neighbors (all contiguous polygons)
+def queen(params):
+    """SQL query for queen neighbors.
+        :param params: dict of information to fill query
+    """
+    attr_select = query_attr_select(params)
+    attr_where = query_attr_where(params)
+
+    replacements = {"attr_select": attr_select,
+                    "attr_where_i": attr_where.replace("idx_replace", "i"),
+                    "attr_where_j": attr_where.replace("idx_replace", "j")}
+
+    query = "SELECT " \
+                "i.\"{id_col}\" As id, " \
+                "%(attr_select)s" \
+                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
+                 "FROM \"{table}\" As j " \
+                 "WHERE ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
+                 "%(attr_where_j)s)" \
+                ") As neighbors " \
+            "FROM \"{table}\" As i " \
+            "WHERE " \
+                "%(attr_where_i)s " \
+            "ORDER BY i.\"{id_col}\" ASC;" % replacements
+
+    return query.format(**params)
+
+## to add more weight methods open a ticket or pull request
+
+def get_query(w_type, query_vals):
+    """Return requested query.
+        :param w_type: type of neighbors to calculate (knn or queen)
+        :param query_vals: values used to construct the query
+    """
+
+    if w_type == 'knn':
+        return knn(query_vals)
+    else:
+        return queen(query_vals)
+
+def get_attributes(query_res, attr_num):
+    """
+        :param query_res: query results with attributes and neighbors
+        :param attr_num: attribute number (1, 2, ...)
+    """
+    return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=np.float)
+
+## Build weight object
+def get_weight(query_res, w_type='queen', num_ngbrs=5):
+    """
+        Construct PySAL weight from return value of query
+        :param query_res: query results with attributes and neighbors
+    """
+    if w_type == 'knn':
+        row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
+        weights = {x['id']: row_normed_weights for x in query_res}
+    elif w_type == 'queen':
+        weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
+                            if len(x['neighbors']) > 0
+                            else [] for x in query_res}
+
+    neighbors = {x['id']: x['neighbors'] for x in query_res}
+
+    return ps.W(neighbors, weights)
+
+def quad_position(quads):
+    """
+        Map an array of quadrant numbers to Moran's I classifications
+    """
+
+    lisa_sig = np.array([map_quads(q) for q in quads])
+
+    return lisa_sig
+
+def lisa_sig_vals(pvals, quads, threshold):
+    """
+        Label significant units by quadrant; others are 'Not significant'
+    """
+
+    sig = (pvals <= threshold)
+
+    lisa_sig = np.empty(len(sig), np.chararray)
+
+    for idx, val in enumerate(sig):
+        if val:
+            lisa_sig[idx] = map_quads(quads[idx])
+        else:
+            lisa_sig[idx] = 'Not significant'
+
+    return lisa_sig
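
As a reading aid for `get_weight` above: it builds two dictionaries keyed by record id, one holding the neighbor ids and one holding row-standardized weights (each row sums to 1), and passes both to `ps.W`. The sketch below reproduces only the dictionary construction in plain Python, without PySAL; `row_standardized_weights` and the sample rows are hypothetical:

```python
def row_standardized_weights(rows):
    """Build neighbor and weight dicts from query-like rows.

    rows is a list of dicts with 'id' and 'neighbors' keys, shaped like
    the records returned by the neighbor queries. Hypothetical helper.
    """
    neighbors = {}
    weights = {}
    for row in rows:
        ngbrs = row['neighbors']
        neighbors[row['id']] = ngbrs
        # Each neighbor gets an equal share; isolated records get no weights.
        weights[row['id']] = [1.0 / len(ngbrs)] * len(ngbrs) if ngbrs else []
    return neighbors, weights


rows = [
    {'id': 1, 'neighbors': [2, 3]},
    {'id': 2, 'neighbors': [1]},
    {'id': 3, 'neighbors': []},
]
neighbors, weights = row_standardized_weights(rows)
```

Deriving the weight from the actual length of each neighbor list, rather than from a fixed `num_ngbrs`, keeps every row summing to 1 even when a query returns fewer neighbors than requested.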
--- a/release/python/0.0.2/crankshaft/crankshaft/random_seeds.py
+++ b/release/python/0.0.2/crankshaft/crankshaft/random_seeds.py
@@ -0,0 +1,10 @@
+import random
+import numpy
+
+def set_random_seeds(value):
+    """
+    Set the seeds of the RNGs (Random Number Generators)
+    used internally.
+    """
+    random.seed(value)
+    numpy.random.seed(value)
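
The purpose of `set_random_seeds` is reproducibility: the permutation-based pseudo-p values depend on random draws, so tests need a fixed seed to get stable results. A small sketch of the effect, using only the standard library `random` module (the real function also seeds NumPy); `draw` is a hypothetical helper:

```python
import random


def draw(seed, n=3):
    """Seed the generator, then return n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


# The same seed yields the same sequence on every call.
first = draw(1234)
second = draw(1234)
```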
--- a/release/python/0.0.2/crankshaft/setup.py
+++ b/release/python/0.0.2/crankshaft/setup.py
@@ -0,0 +1,48 @@
+
+"""
+CartoDB Spatial Analysis Python Library
+See:
+https://github.com/CartoDB/crankshaft
+"""
+
+from setuptools import setup, find_packages
+
+setup(
+    name='crankshaft',
+
+    version='0.0.2',
+
+    description='CartoDB Spatial Analysis Python Library',
+
+    url='https://github.com/CartoDB/crankshaft',
+
+    author='Data Services Team - CartoDB',
+    author_email='dataservices@cartodb.com',
+
+    license='MIT',
+
+    classifiers=[
+        'Development Status :: 3 - Alpha',
+        'Intended Audience :: Mapping community',
+        'Topic :: Maps :: Mapping Tools',
+        'License :: OSI Approved :: MIT License',
+        'Programming Language :: Python :: 2.7',
+    ],
+
+    keywords='maps mapping tools spatial analysis geostatistics',
+
+    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
+
+    extras_require={
+        'dev': ['unittest'],
+        'test': ['unittest', 'nose', 'mock'],
+    },
+
+    # The choice of component versions is dictated by what's
+    # provisioned in the production servers.
+    install_requires=['pysal==1.9.1'],
+
+    requires=['pysal', 'numpy' ],
+
+    test_suite='test'
+)
--- a/release/python/0.0.2/crankshaft/test/fixtures/moran.json
+++ b/release/python/0.0.2/crankshaft/test/fixtures/moran.json
@@ -0,0 +1,52 @@
+[[0.9319096128346788, "HH"],
+[-1.135787401862846, "HL"],
+[0.11732030672508517, "Not significant"],
+[0.6152779669180425, "Not significant"],
+[-0.14657336660125297, "Not significant"],
+[0.6967858120189607, "Not significant"],
+[0.07949310115714454, "Not significant"],
+[0.4703198759258987, "Not significant"],
+[0.4421125200498064, "Not significant"],
+[0.5724288737143592, "Not significant"],
+[0.8970743435692062, "LL"],
+[0.18327334401918674, "Not significant"],
+[-0.01466729201304962, "Not significant"],
+[0.3481559372544409, "Not significant"],
+[0.06547094736902978, "Not significant"],
+[0.15482141569329988, "HH"],
+[0.4373841193538136, "Not significant"],
+[0.15971286468915544, "Not significant"],
+[1.0543588860308968, "Not significant"],
+[1.7372866900020818, "HH"],
+[1.091998586053999, "LL"],
+[0.1171572584252222, "Not significant"],
+[0.08438455015300014, "Not significant"],
+[0.06547094736902978, "Not significant"],
+[0.15482141569329985, "HH"],
+[1.1627044812890683, "HH"],
+[0.06547094736902978, "Not significant"],
+[0.795275137550483, "Not significant"],
+[0.18562939195219, "LL"],
+[0.3010757406693439, "Not significant"],
+[2.8205795942839376, "HH"],
+[0.11259190602909264, "Not significant"],
+[-0.07116352791516614, "Not significant"],
+[-0.09945240794119009, "Not significant"],
+[0.18562939195219, "LL"],
+[0.1832733440191868, "Not significant"],
+[-0.39054253768447705, "Not significant"],
+[-0.1672071289487642, "HL"],
+[0.3337669247916343, "Not significant"],
+[0.2584386102554792, "Not significant"],
+[-0.19733845476322634, "HL"],
+[-0.9379282899805409, "LH"],
+[-0.028770969951095866, "Not significant"],
+[0.051367269430983485, "Not significant"],
+[-0.2172548045913472, "LH"],
+[0.05136726943098351, "Not significant"],
+[0.04191046803899837, "Not significant"],
+[0.7482357030403517, "HH"],
+[-0.014585767863118111, "Not significant"],
+[0.5410013139159929, "Not significant"],
+[1.0223932668429925, "LL"],
+[1.4179402898927476, "LL"]]
--- a/release/python/0.0.2/crankshaft/test/fixtures/neighbors.json
+++ b/release/python/0.0.2/crankshaft/test/fixtures/neighbors.json
@@ -0,0 +1,54 @@
+[
+    {"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
+    {"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
+    {"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
+    {"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
+    {"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
+    {"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
+    {"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
+    {"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
+    {"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
+    {"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
+    {"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
+    {"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
+    {"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
+    {"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
+    {"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
+    {"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
+    {"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
+    {"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
+    {"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
+    {"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
+    {"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
+    {"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
+    {"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
+    {"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
+    {"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
+    {"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
+    {"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
+    {"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
+    {"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
+    {"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
+    {"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
+    {"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
+    {"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
+    {"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
+    {"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
+    {"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
+    {"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
+    {"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
+    {"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
+    {"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
+    {"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
+    {"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
+    {"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
+    {"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
+    {"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
+    {"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
+    {"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
+    {"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
+    {"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
+    {"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
+    {"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
+    {"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
+  ]
--- a/release/python/0.0.2/crankshaft/test/helper.py
+++ b/release/python/0.0.2/crankshaft/test/helper.py
@@ -0,0 +1,13 @@
+import unittest
+
+from mock_plpy import MockPlPy
+plpy = MockPlPy()
+
+import sys
+sys.modules['plpy'] = plpy
+
+import os
+
+def fixture_file(name):
+    dir = os.path.dirname(os.path.realpath(__file__))
+    return os.path.join(dir, 'fixtures', name)
--- a/release/python/0.0.2/crankshaft/test/mock_plpy.py
+++ b/release/python/0.0.2/crankshaft/test/mock_plpy.py
@@ -0,0 +1,34 @@
+import re
+
+class MockPlPy:
+    def __init__(self):
+        self._reset()
+
+    def _reset(self):
+        self.infos = []
+        self.notices = []
+        self.debugs = []
+        self.logs = []
+        self.warnings = []
+        self.errors = []
+        self.fatals = []
+        self.executes = []
+        self.results = []
+        self.prepares = []
+        self.results = []
+
+    def _define_result(self, query, result):
+        pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
+        self.results.append([pattern, result])
+
+    def notice(self, msg):
+        self.notices.append(msg)
+
+    def info(self, msg):
+        self.infos.append(msg)
+
+    def execute(self, query): # TODO: additional arguments
+       for result in self.results:
+          if result[0].match(query):
+            return result[1]
+       return []
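Review note on `mock_plpy.py`: the mock answers `execute` with the canned result of the first registered pattern that matches the query. The sketch below is a trimmed, standalone copy of that lookup (`TinyMockPlPy` is a hypothetical name, not part of this patch). It is only meant to show one consequence of using `match`: the pattern is anchored at the start of the query, even with `re.MULTILINE`, so a pattern such as `'select'` will not match a query with leading whitespace.

```python
import re


class TinyMockPlPy(object):
    """Hypothetical, trimmed stand-in for MockPlPy: only the result lookup."""

    def __init__(self):
        self.results = []

    def _define_result(self, query, result):
        # Store a compiled, case-insensitive pattern next to its canned result.
        pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
        self.results.append([pattern, result])

    def execute(self, query):
        # Return the canned result of the first pattern that matches at the
        # start of the query; fall back to an empty result set.
        for pattern, result in self.results:
            if pattern.match(query):
                return result
        return []


plpy = TinyMockPlPy()
plpy._define_result('select', [{'id': 1, 'neighbors': [2, 3]}])

rows = plpy.execute('SELECT * FROM a_list')
no_rows = plpy.execute('UPDATE a_list SET x = 1')
indented = plpy.execute('  SELECT 1')  # leading spaces, so match() fails
```

If tests ever need to match a fragment in the middle of a query, the pattern itself would have to allow for a prefix (for example `'.*select'`), or the mock would need `search` instead of `match`.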
--- a/release/python/0.0.2/crankshaft/test/test_clustering_moran.py
+++ b/release/python/0.0.2/crankshaft/test/test_clustering_moran.py
@@ -0,0 +1,144 @@
+import unittest
+import numpy as np
+
+import unittest
+
+
+# from mock_plpy import MockPlPy
+# plpy = MockPlPy()
+#
+# import sys
+# sys.modules['plpy'] = plpy
+from helper import plpy, fixture_file
+
+import crankshaft.clustering as cc
+from crankshaft import random_seeds
+import json
+
+class MoranTest(unittest.TestCase):
+    """Testing class for Moran's I functions."""
+
+    def setUp(self):
+        plpy._reset()
+        self.params = {"id_col": "cartodb_id",
+                       "attr1": "andy",
+                       "attr2": "jay_z",
+                       "table": "a_list",
+                       "geom_col": "the_geom",
+                       "num_ngbrs": 321}
+        self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
+        self.moran_data = json.loads(open(fixture_file('moran.json')).read())
+
+    def test_map_quads(self):
+        """Test map_quads."""
+        self.assertEqual(cc.map_quads(1), 'HH')
+        self.assertEqual(cc.map_quads(2), 'LH')
+        self.assertEqual(cc.map_quads(3), 'LL')
+        self.assertEqual(cc.map_quads(4), 'HL')
+        self.assertEqual(cc.map_quads(33), None)
+        self.assertEqual(cc.map_quads('andy'), None)
+
+    def test_query_attr_select(self):
+        """Test query_attr_select."""
+
+        ans = "i.\"{attr1}\"::numeric As attr1, " \
+              "i.\"{attr2}\"::numeric As attr2, "
+
+        self.assertEqual(cc.query_attr_select(self.params), ans)
+
+    def test_query_attr_where(self):
+        """Test query_attr_where."""
+
+        ans = "idx_replace.\"{attr1}\" IS NOT NULL AND "\
+              "idx_replace.\"{attr2}\" IS NOT NULL AND "\
+              "idx_replace.\"{attr2}\" <> 0"
+
+        self.assertEqual(cc.query_attr_where(self.params), ans)
+
+    def test_knn(self):
+        """Test knn function."""
+
+        ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
+              "i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT j.\"cartodb_id\" " \
+              "FROM \"a_list\" As j WHERE j.\"andy\" IS NOT NULL AND " \
+              "j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 ORDER BY " \
+              "j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 OFFSET 1 ) ) " \
+              "As neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT " \
+              "NULL AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER " \
+              "BY i.\"cartodb_id\" ASC;"
+
+        self.assertEqual(cc.knn(self.params), ans)
+
+    def test_queen(self):
+        """Test queen neighbors function."""
+
+        ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
+              "i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
+              "j.\"cartodb_id\" FROM \"a_list\" As j WHERE ST_Touches(" \
+              "i.\"the_geom\", j.\"the_geom\") AND j.\"andy\" IS NOT NULL " \
+              "AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0)) As " \
+              "neighbors FROM \"a_list\" As i WHERE i.\"andy\" IS NOT NULL " \
+              "AND i.\"jay_z\" IS NOT NULL AND i.\"jay_z\" <> 0 ORDER BY " \
+              "i.\"cartodb_id\" ASC;"
+
+        self.assertEqual(cc.queen(self.params), ans)
+
+    def test_get_query(self):
+        """Test get_query."""
+
+        ans = "SELECT i.\"cartodb_id\" As id, i.\"andy\"::numeric As attr1, " \
+              "i.\"jay_z\"::numeric As attr2, (SELECT ARRAY(SELECT " \
+              "j.\"cartodb_id\" FROM \"a_list\" As j WHERE j.\"andy\" IS " \
+              "NOT NULL AND j.\"jay_z\" IS NOT NULL AND j.\"jay_z\" <> 0 " \
+              "ORDER BY j.\"the_geom\" <-> i.\"the_geom\" ASC LIMIT 321 " \
+              "OFFSET 1 ) ) As neighbors FROM \"a_list\" As i WHERE " \
+              "i.\"andy\" IS NOT NULL AND i.\"jay_z\" IS NOT NULL AND " \
+              "i.\"jay_z\" <> 0 ORDER BY i.\"cartodb_id\" ASC;"
+
+        self.assertEqual(cc.get_query('knn', self.params), ans)
+
+    def test_get_attributes(self):
+        """Test get_attributes."""
+
+        ## need to add tests
+
+        self.assertEqual(True, True)
+
+    def test_get_weight(self):
+        """Test get_weight."""
+
+        self.assertEqual(True, True)
+
+
+    def test_quad_position(self):
+        """Test quad_position."""
+
+        quads = np.array([1, 2, 3, 4], int)
+
+        ans = np.array(['HH', 'LH', 'LL', 'HL'])
+        test_ans = cc.quad_position(quads)
+
+        self.assertTrue((test_ans == ans).all())
+
+    def test_moran_local(self):
+        """Test Moran's I local"""
+        data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
+        plpy._define_result('select', data)
+        random_seeds.set_random_seeds(1234)
+        result = cc.moran_local('table', 'value', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
+        result = [(row[0], row[1]) for row in result]
+        expected = self.moran_data
+        for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
+            self.assertAlmostEqual(res_val, exp_val)
+            self.assertEqual(res_quad, exp_quad)
+
+    def test_moran_local_rate(self):
+        """Test Moran's I rate"""
+        data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
+        plpy._define_result('select', data)
+        random_seeds.set_random_seeds(1234)
+        result = cc.moran_local_rate('table', 'numerator', 'denominator', 0.05, 5, 99, 'the_geom', 'cartodb_id', 'knn')
+        result = [(row[0], row[1]) for row in result]
+        expected = self.moran_data
+        for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
+            self.assertAlmostEqual(res_val, exp_val)
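Review note on `test_clustering_moran.py`: the implementation of `map_quads` and `quad_position` is not part of this hunk. The sketch below is one implementation that would satisfy the assertions in `test_map_quads` and `test_quad_position`. The lookup table name `QUAD_LABELS` is an assumption, and the sketch returns a plain list where the tested function evidently returns a NumPy array.

```python
# Hypothetical lookup table inferred from the assertions in test_map_quads.
QUAD_LABELS = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}


def map_quads(coord):
    """Return the quadrant label for a quadrant code, or None if unknown."""
    return QUAD_LABELS.get(coord)


def quad_position(quads):
    """Map a sequence of quadrant codes to their labels."""
    return [map_quads(q) for q in quads]


labels = quad_position([1, 2, 3, 4])
unknown = [map_quads(33), map_quads('andy')]
```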
--- a/src/pg/.gitignore
+++ b/src/pg/.gitignore
@@ -0,0 +1,6 @@
+regression.diffs
+regression.out
+results/
+crankshaft--dev.sql
+crankshaft--dev--current.sql
+crankshaft--current--dev.sql
--- a/src/pg/Makefile
+++ b/src/pg/Makefile
@@ -0,0 +1,60 @@
+include ../../Makefile.global
+
+# Development tasks:
+#
+# * install generates the control & script files into src/pg/
+#   and installs them into the PostgreSQL extensions directory;
+#   requires sudo. In addition to the current development version
+#   named 'dev', an alias 'current' is generated for ease of
+#   update (upgrade to 'current', then to 'dev').
+#   The Python module is installed in a virtualenv in envs/dev/
+# * test runs the tests for the currently generated development
+#   extension.
+
+DATA         = $(EXTENSION)--dev.sql \
+	             $(EXTENSION)--current--dev.sql \
+	             $(EXTENSION)--dev--current.sql
+
+SOURCES_DATA_DIR = sql
+SOURCES_DATA = $(wildcard $(SOURCES_DATA_DIR)/*.sql)
+
+VIRTUALENV_PATH = $(realpath ../../envs)
+ESC_VIRTUALENV_PATH = $(subst /,\/,$(VIRTUALENV_PATH))
+
+REPLACEMENTS = -e 's/@@VERSION@@/$(EXTVERSION)/g' \
+               -e 's/@@VIRTUALENV_PATH@@/$(ESC_VIRTUALENV_PATH)/g'
+
+$(DATA): $(SOURCES_DATA)
+	$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > $@
+
+TEST_DIR = test
+REGRESS = $(notdir $(basename $(wildcard $(TEST_DIR)/sql/*test.sql)))
+REGRESS_OPTS = --inputdir='$(TEST_DIR)' --outputdir='$(TEST_DIR)'
+
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+
+# This seems to be needed at least for PG 9.3.11
+all: $(DATA)
+
+test: export PGUSER=postgres
+test: installcheck
+
+# Release tasks
+
+../../release/$(EXTENSION).control: $(EXTENSION).control
+	cp $< $@
+
+# Prepare new release from the currently installed development version,
+# for the current version X.Y.Z (defined in the control file)
+# producing the extension script and control files in release/
+# and the python package in release/python/X.Y.Z/crankshaft/
+release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
+	$(SED) $(REPLACEMENTS) $(SOURCES_DATA_DIR)/*.sql > ../../release/$(EXTENSION)--$(EXTVERSION).sql
+
+# Install the current release into the PostgreSQL extensions directory
+# and the Python package in a virtual environment envs/X.Y.Z
+deploy:
+	$(INSTALL_DATA) ../../release/$(EXTENSION).control '$(DESTDIR)$(datadir)/extension/'
+	$(INSTALL_DATA) ../../release/*.sql '$(DESTDIR)$(datadir)/extension/'
--- a/src/pg/crankshaft.control
+++ b/src/pg/crankshaft.control
@@ -0,0 +1,5 @@
+comment = 'CartoDB Spatial Analysis extension'
+default_version = '0.0.2'
+requires = 'plpythonu, postgis, cartodb'
+superuser = true
+schema = cdb_crankshaft
--- a/pg/sql/0.0.1/00_header.sql
+++ b/pg/sql/0.0.1/00_header.sql
--- a/src/pg/sql/01_version.sql
+++ b/src/pg/sql/01_version.sql
@@ -0,0 +1,12 @@
+-- Version number of the extension release
+CREATE OR REPLACE FUNCTION cdb_crankshaft_version()
+RETURNS text AS $$
+  SELECT '@@VERSION@@'::text;
+$$ language 'sql' STABLE STRICT;
+
+-- Internal identifier of the installed extension instance
+-- e.g. 'dev' for current development version
+CREATE OR REPLACE FUNCTION _cdb_crankshaft_internal_version()
+RETURNS text AS $$
+  SELECT installed_version FROM pg_available_extensions where name='crankshaft' and pg_available_extensions IS NOT NULL;
+$$ language 'sql' STABLE STRICT;
--- a/src/pg/sql/02_py.sql
+++ b/src/pg/sql/02_py.sql
@@ -0,0 +1,23 @@
+CREATE OR REPLACE FUNCTION _cdb_crankshaft_virtualenvs_path()
+RETURNS text
+AS $$
+  BEGIN
+    -- RETURN '/opt/virtualenvs/crankshaft';
+    RETURN '@@VIRTUALENV_PATH@@';
+  END;
+$$ language plpgsql IMMUTABLE STRICT;
+
+-- Use the crankshaft python module
+CREATE OR REPLACE FUNCTION _cdb_crankshaft_activate_py()
+RETURNS VOID
+AS $$
+    import os
+    # plpy.notice('%',str(os.environ))
+    # activate virtualenv
+    crankshaft_version = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_internal_version()')[0]['_cdb_crankshaft_internal_version']
+    base_path = plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_virtualenvs_path()')[0]['_cdb_crankshaft_virtualenvs_path']
+    default_venv_path = os.path.join(base_path, crankshaft_version)
+    venv_path =  os.environ.get('CRANKSHAFT_VENV', default_venv_path)
+    activate_path = venv_path + '/bin/activate_this.py'
+    exec(open(activate_path).read(), dict(__file__=activate_path))
+$$ LANGUAGE plpythonu;
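Review note on `02_py.sql`: the body of `_cdb_crankshaft_activate_py` picks the virtualenv from the `CRANKSHAFT_VENV` environment variable and falls back to `<virtualenvs path>/<internal version>`. The sketch below isolates that path resolution as a plain function so it can be checked outside PostgreSQL. The function name `resolve_activate_path`, the `environ` parameter and the example paths are assumptions for illustration only.

```python
import os


def resolve_activate_path(base_path, version, environ=None):
    """Return the activate_this.py path the SQL function would execute."""
    environ = os.environ if environ is None else environ
    default_venv_path = os.path.join(base_path, version)
    # CRANKSHAFT_VENV, when set, overrides the per-version default.
    venv_path = environ.get('CRANKSHAFT_VENV', default_venv_path)
    return venv_path + '/bin/activate_this.py'


# Example paths are made up; only the resolution order matters here.
default_path = resolve_activate_path('/srv/envs', 'dev', environ={})
override_path = resolve_activate_path(
    '/srv/envs', 'dev', environ={'CRANKSHAFT_VENV': '/tmp/venv'})
```

The override makes it possible to point a database at a different environment without regenerating the extension script, which embeds the default path at build time.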
--- a/pg/sql/0.0.1/01_random_seeds.sql
+++ b/pg/sql/0.0.1/01_random_seeds.sql
@@ -4,6 +4,7 @@
 CREATE OR REPLACE FUNCTION
 _cdb_random_seeds (seed_value INTEGER) RETURNS VOID
 AS $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
  from crankshaft import random_seeds
  random_seeds.set_random_seeds(seed_value)
 $$ LANGUAGE plpythonu;
--- a/src/pg/sql/10_moran.sql
+++ b/src/pg/sql/10_moran.sql
@@ -0,0 +1,89 @@
+-- Moran's I (global)
+CREATE OR REPLACE FUNCTION
+  CDB_AreasOfInterest_Global (
+      subquery TEXT,
+      attr_name TEXT,
+      permutations INT DEFAULT 99,
+      geom_col TEXT DEFAULT 'the_geom',
+      id_col TEXT DEFAULT 'cartodb_id',
+      w_type TEXT DEFAULT 'knn',
+      num_ngbrs INT DEFAULT 5)
+RETURNS TABLE (moran NUMERIC, significance NUMERIC)
+AS $$
+  from crankshaft.clustering import moran
+  # TODO: use named parameters or a dictionary
+  return moran(subquery, attr_name, permutations, geom_col, id_col, w_type, num_ngbrs)
+$$ LANGUAGE plpythonu;
+
+-- Moran's I Local
+CREATE OR REPLACE FUNCTION
+  CDB_AreasOfInterest_Local(
+      subquery TEXT,
+      attr TEXT,
+      permutations INT DEFAULT 99,
+      geom_col TEXT DEFAULT 'the_geom',
+      id_col TEXT DEFAULT 'cartodb_id',
+      w_type TEXT DEFAULT 'knn',
+      num_ngbrs INT DEFAULT 5)
+RETURNS TABLE (moran NUMERIC, quads TEXT, significance NUMERIC, ids INT, y NUMERIC)
+AS $$
+  from crankshaft.clustering import moran_local
+  # TODO: use named parameters or a dictionary
+  return moran_local(subquery, attr, permutations, geom_col, id_col, w_type, num_ngbrs)
+$$ LANGUAGE plpythonu;
+
+-- Moran's I Rate (global)
+CREATE OR REPLACE FUNCTION
+  CDB_AreasOfInterest_Global_Rate(
+      subquery TEXT,
+      numerator TEXT,
+      denominator TEXT,
+      permutations INT DEFAULT 99,
+      geom_col TEXT DEFAULT 'the_geom',
+      id_col TEXT DEFAULT 'cartodb_id',
+      w_type TEXT DEFAULT 'knn',
+      num_ngbrs INT DEFAULT 5)
+RETURNS TABLE (moran FLOAT, significance FLOAT)
+AS $$
+  from crankshaft.clustering import moran_rate
+  # TODO: use named parameters or a dictionary
+  return moran_rate(subquery, numerator, denominator, permutations, geom_col, id_col, w_type, num_ngbrs)
+$$ LANGUAGE plpythonu;
+
+
+-- Moran's I Local Rate
+CREATE OR REPLACE FUNCTION
+  CDB_AreasOfInterest_Local_Rate(
+      subquery TEXT,
+      numerator TEXT,
+      denominator TEXT,
+      permutations INT DEFAULT 99,
+      geom_col TEXT DEFAULT 'the_geom',
+      id_col TEXT DEFAULT 'cartodb_id',
+      w_type TEXT DEFAULT 'knn',
+      num_ngbrs INT DEFAULT 5)
+RETURNS
+TABLE(moran NUMERIC, quads TEXT, significance NUMERIC, ids INT, y NUMERIC)
+AS $$
+  from crankshaft.clustering import moran_local_rate
+  # TODO: use named parameters or a dictionary
+  return moran_local_rate(subquery, numerator, denominator, permutations, geom_col, id_col, w_type, num_ngbrs)
+$$ LANGUAGE plpythonu;
+
+-- -- Moran's I Local Bivariate
+-- CREATE OR REPLACE FUNCTION
+--   cdb_moran_local_bv(
+--       subquery TEXT,
+--       attr1 TEXT,
+--       attr2 TEXT,
+--       permutations INT DEFAULT 99,
+--       geom_col TEXT DEFAULT 'the_geom',
+--       id_col TEXT DEFAULT 'cartodb_id',
+--       w_type TEXT DEFAULT 'knn',
+--       num_ngbrs INT DEFAULT 5)
+-- RETURNS TABLE(moran FLOAT, quads TEXT, significance FLOAT, ids INT, y numeric)
+-- AS $$
+--   from crankshaft.clustering import moran_local_bv
+--   # TODO: use named parameters or a dictionary
+--   return moran_local_bv(subquery, attr1, attr2, permutations, geom_col, id_col, w_type, num_ngbrs)
+-- $$ LANGUAGE plpythonu;
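Review note on `10_moran.sql`: each wrapper carries a `# TODO: use named parameters or a dictionary` comment and passes seven or eight positional arguments. The sketch below shows what the keyword form might look like. `moran_local` here is only a stand-in that copies the parameter list from `moran.py` and echoes its inputs; it does not run any analysis.

```python
def moran_local(subquery, attr, permutations, geom_col, id_col,
                w_type, num_ngbrs):
    """Stand-in with the signature from moran.py; it only echoes its inputs."""
    return {'subquery': subquery, 'attr': attr,
            'permutations': permutations, 'geom_col': geom_col,
            'id_col': id_col, 'w_type': w_type, 'num_ngbrs': num_ngbrs}


# Keyword arguments bind by name, so listing them in a different order
# cannot silently swap, for example, permutations and num_ngbrs.
by_name = moran_local(num_ngbrs=5, permutations=99,
                      subquery='SELECT * FROM ppoints', attr='value',
                      geom_col='the_geom', id_col='cartodb_id', w_type='knn')

# The positional form only works when the order matches the signature.
by_position = moran_local('SELECT * FROM ppoints', 'value', 99,
                          'the_geom', 'cartodb_id', 'knn', 5)
```

With two integer parameters (`permutations` and `num_ngbrs`) in the signature, a misordered positional call would still run and return plausible-looking numbers, which is the main argument for the keyword form.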
--- a/pg/sql/0.0.1/03_overlap_sum.sql
+++ b/pg/sql/0.0.1/03_overlap_sum.sql
--- a/pg/sql/0.0.1/04_dot_density.sql
+++ b/pg/sql/0.0.1/04_dot_density.sql
--- a/src/pg/sql/80_similarity_rank.sql
+++ b/src/pg/sql/80_similarity_rank.sql
@@ -0,0 +1,15 @@
+CREATE OR REPLACE FUNCTION cdb_SimilarityRank(cartodb_id numeric, query text)
+returns TABLE (cartodb_id NUMERIC, similarity NUMERIC)
+as $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
+  from crankshaft.similarity import similarity_rank
+  return similarity_rank(cartodb_id, query)
+$$ LANGUAGE plpythonu;
+
+CREATE OR REPLACE FUNCTION cdb_MostSimilar(cartodb_id numeric, query text, matches numeric)
+returns TABLE (cartodb_id NUMERIC, similarity NUMERIC)
+as $$
+  plpy.execute('SELECT cdb_crankshaft._cdb_crankshaft_activate_py()')
+  from crankshaft.similarity import most_similar
+  return most_similar(matches, query)
+$$ LANGUAGE plpythonu;
--- a/pg/sql/0.0.1/90_permissions.sql
+++ b/pg/sql/0.0.1/90_permissions.sql
--- a/pg/test/0.0.1/expected/01_install_test.out
+++ b/pg/test/0.0.1/expected/01_install_test.out
@@ -3,4 +3,4 @@ CREATE EXTENSION plpythonu;
 CREATE EXTENSION postgis;
 CREATE EXTENSION cartodb;
 -- Install the extension
-CREATE EXTENSION crankshaft;
+CREATE EXTENSION crankshaft VERSION 'dev';
--- a/pg/test/0.0.1/expected/02_moran_test.out
+++ b/pg/test/0.0.1/expected/02_moran_test.out
@@ -110,7 +110,7 @@ INSERT INTO ppoints2 VALUES
 (24,'0101000020E61000009C5F91C5095C17C0C78784B15A4F4540'::geometry,'24','07',0.3, 1.0),
 (29,'0101000020E6100000C34D4A5B48E712C092E680892C684240'::geometry,'29','01',0.3, 1.0),
 (52,'0101000020E6100000406A545EB29A07C04E5F0BDA39A54140'::geometry,'52','19',0.0, 1.01)
-- Moral functions perform some nondeterministic computations
+-- Areas of Interest functions perform some nondeterministic computations
 -- (to estimate the significance); we will set the seeds for the RNGs
 -- that affect those results to have repeateble results
 SELECT cdb_crankshaft._cdb_random_seeds(1234);
@@ -121,67 +121,61 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);

 SELECT ppoints.code, m.quads
  FROM ppoints
-  JOIN cdb_crankshaft.cdb_moran_local('ppoints', 'value') m
+  JOIN cdb_crankshaft.CDB_AreasOfInterest_Local('SELECT * FROM ppoints', 'value') m
    ON ppoints.cartodb_id = m.ids
  ORDER BY ppoints.code;
-NOTICE:  ** Constructing query
-CONTEXT:  PL/Python function "cdb_moran_local"
-NOTICE:  ** Query returned with 52 rows
-CONTEXT:  PL/Python function "cdb_moran_local"
-NOTICE:  ** Finished calculations
-CONTEXT:  PL/Python function "cdb_moran_local"
- code |      quads      
------+-----------------
+ code | quads 
+------+-------
 01   | HH
 02   | HL
- 03   | Not significant
- 04   | Not significant
- 05   | Not significant
- 06   | Not significant
- 07   | Not significant
- 08   | Not significant
- 09   | Not significant
- 10   | Not significant
+ 03   | LL
+ 04   | LL
+ 05   | LH
+ 06   | LL
+ 07   | HH
+ 08   | HH
+ 09   | HH
+ 10   | LL
 11   | LL
- 12   | Not significant
- 13   | Not significant
- 14   | Not significant
- 15   | Not significant
+ 12   | LL
+ 13   | HL
+ 14   | LL
+ 15   | LL
 16   | HH
- 17   | Not significant
- 18   | Not significant
- 19   | Not significant
+ 17   | HH
+ 18   | LL
+ 19   | HH
 20   | HH
 21   | LL
- 22   | Not significant
- 23   | Not significant
- 24   | Not significant
+ 22   | HH
+ 23   | LL
+ 24   | LL
 25   | HH
 26   | HH
- 27   | Not significant
- 28   | Not significant
+ 27   | LL
+ 28   | HH
 29   | LL
- 30   | Not significant
+ 30   | LL
 31   | HH
- 32   | Not significant
- 33   | Not significant
- 34   | Not significant
+ 32   | LL
+ 33   | HL
+ 34   | LH
 35   | LL
- 36   | Not significant
- 37   | Not significant
+ 36   | LL
+ 37   | HL
 38   | HL
- 39   | Not significant
- 40   | Not significant
+ 39   | HH
+ 40   | HH
 41   | HL
 42   | LH
- 43   | Not significant
- 44   | Not significant
+ 43   | LH
+ 44   | LL
 45   | LH
- 46   | Not significant
- 47   | Not significant
+ 46   | LL
+ 47   | LL
 48   | HH
- 49   | Not significant
- 50   | Not significant
+ 49   | LH
+ 50   | HH
 51   | LL
 52   | LL
 (52 rows)
@@ -194,67 +188,61 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);

 SELECT ppoints2.code, m.quads
  FROM ppoints2
-  JOIN cdb_crankshaft.cdb_moran_local_rate('ppoints2', 'numerator', 'denominator') m
+  JOIN cdb_crankshaft.CDB_AreasOfInterest_Local_Rate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
    ON ppoints2.cartodb_id = m.ids
  ORDER BY ppoints2.code;
-NOTICE:  ** Constructing query
-CONTEXT:  PL/Python function "cdb_moran_local_rate"
-NOTICE:  ** Query returned with 51 rows
-CONTEXT:  PL/Python function "cdb_moran_local_rate"
-NOTICE:  ** Finished calculations
-CONTEXT:  PL/Python function "cdb_moran_local_rate"
- code |      quads      
------+-----------------
+ code | quads 
+------+-------
 01   | LL
- 02   | Not significant
- 03   | Not significant
- 04   | Not significant
- 05   | Not significant
- 06   | Not significant
- 07   | Not significant
- 08   | Not significant
+ 02   | LH
+ 03   | HH
+ 04   | HH
+ 05   | LL
+ 06   | HH
+ 07   | LL
+ 08   | LL
 09   | LL
- 10   | Not significant
+ 10   | HH
 11   | HH
- 12   | Not significant
- 13   | Not significant
- 14   | Not significant
- 15   | Not significant
- 16   | Not significant
+ 12   | HL
+ 13   | LL
+ 14   | HH
+ 15   | LL
+ 16   | LL
 17   | LL
- 18   | Not significant
- 19   | Not significant
+ 18   | LH
+ 19   | LL
 20   | LL
- 21   | Not significant
- 22   | Not significant
- 23   | Not significant
- 24   | Not significant
+ 21   | HH
+ 22   | LL
+ 23   | HL
+ 24   | LL
 25   | LL
 26   | LL
- 27   | Not significant
- 28   | Not significant
+ 27   | LL
+ 28   | LL
 29   | LH
- 30   | Not significant
+ 30   | HH
 31   | LL
- 32   | Not significant
- 33   | Not significant
- 34   | Not significant
+ 32   | LL
+ 33   | LL
+ 34   | LL
 35   | LH
- 36   | Not significant
- 37   | Not significant
+ 36   | HL
+ 37   | LH
 38   | LH
- 39   | Not significant
- 40   | Not significant
+ 39   | LL
+ 40   | LL
 41   | LH
 42   | HL
- 43   | Not significant
- 44   | Not significant
+ 43   | LL
+ 44   | HL
 45   | LL
- 46   | Not significant
- 47   | Not significant
+ 46   | HL
+ 47   | LL
 48   | LL
- 49   | Not significant
- 50   | Not significant
- 51   | Not significant
+ 49   | HL
+ 50   | LL
+ 51   | HH
 (51 rows)

--- a/pg/test/0.0.1/expected/03_overlap_sum_test.out
+++ b/pg/test/0.0.1/expected/03_overlap_sum_test.out
--- a/pg/test/0.0.1/expected/04_dot_density_test.out
+++ b/pg/test/0.0.1/expected/04_dot_density_test.out
--- a/src/pg/test/fixtures/polyg_values.sql
+++ b/src/pg/test/fixtures/polyg_values.sql
--- a/src/pg/test/fixtures/ppoints.sql
+++ b/src/pg/test/fixtures/ppoints.sql
--- a/src/pg/test/fixtures/ppoints2.sql
+++ b/src/pg/test/fixtures/ppoints2.sql
--- a/pg/test/0.0.1/sql/01_install_test.sql
+++ b/pg/test/0.0.1/sql/01_install_test.sql
@@ -4,4 +4,4 @@ CREATE EXTENSION postgis;
 CREATE EXTENSION cartodb;

 -- Install the extension
-CREATE EXTENSION crankshaft;
+CREATE EXTENSION crankshaft VERSION 'dev';
--- a/pg/test/0.0.1/sql/02_moran_test.sql
+++ b/pg/test/0.0.1/sql/02_moran_test.sql
@@ -1,14 +1,14 @@
 \i test/fixtures/ppoints.sql
 \i test/fixtures/ppoints2.sql

-- Moral functions perform some nondeterministic computations
+-- Areas of Interest functions perform some nondeterministic computations
 -- (to estimate the significance); we will set the seeds for the RNGs
 -- that affect those results to have repeateble results
 SELECT cdb_crankshaft._cdb_random_seeds(1234);

 SELECT ppoints.code, m.quads
  FROM ppoints
-  JOIN cdb_crankshaft.cdb_moran_local('ppoints', 'value') m
+  JOIN cdb_crankshaft.CDB_AreasOfInterest_Local('SELECT * FROM ppoints', 'value') m
    ON ppoints.cartodb_id = m.ids
  ORDER BY ppoints.code;

@@ -16,6 +16,6 @@ SELECT cdb_crankshaft._cdb_random_seeds(1234);

 SELECT ppoints2.code, m.quads
  FROM ppoints2
-  JOIN cdb_crankshaft.cdb_moran_local_rate('ppoints2', 'numerator', 'denominator') m
+  JOIN cdb_crankshaft.CDB_AreasOfInterest_Local_Rate('SELECT * FROM ppoints2', 'numerator', 'denominator') m
    ON ppoints2.cartodb_id = m.ids
  ORDER BY ppoints2.code;
--- a/pg/test/0.0.1/sql/03_overlap_sum_test.sql
+++ b/pg/test/0.0.1/sql/03_overlap_sum_test.sql
--- a/pg/test/0.0.1/sql/04_dot_density_test.sql
+++ b/pg/test/0.0.1/sql/04_dot_density_test.sql
--- a/pg/test/0.0.1/sql/90_permissions.sql
+++ b/pg/test/0.0.1/sql/90_permissions.sql
@@ -9,7 +9,7 @@ SET search_path TO public,cartodb,cdb_crankshaft;
 -- Exercise public functions
 SELECT ppoints.code, m.quads
  FROM ppoints
-  JOIN cdb_moran_local('ppoints', 'value') m
+  JOIN CDB_AreasOfInterest_Local('SELECT * FROM ppoints', 'value') m
    ON ppoints.cartodb_id = m.ids
  ORDER BY ppoints.code;
 SELECT round(cdb_overlap_sum(
--- a/src/py/Makefile
+++ b/src/py/Makefile
@@ -0,0 +1,22 @@
+include ../../Makefile.global
+
+# Install the package locally for development
+install:
+	virtualenv --system-site-packages ../../envs/dev
+	# source ../../envs/dev/bin/activate
+	../../envs/dev/bin/pip install -I ./crankshaft
+	../../envs/dev/bin/pip install -I nose
+
+# Test development install
+test:
+	../../envs/dev/bin/nosetests crankshaft/test/
+
+release: ../../release/$(EXTENSION).control $(SOURCES_DATA)
+	mkdir -p ../../release/python/$(EXTVERSION)
+	cp -r ./$(PACKAGE) ../../release/python/$(EXTVERSION)/
+	$(SED) -i -r 's/version='"'"'[0-9]+\.[0-9]+\.[0-9]+'"'"'/version='"'"'$(EXTVERSION)'"'"'/g'  ../../release/python/$(EXTVERSION)/$(PACKAGE)/setup.py
+
+deploy:
+	virtualenv --system-site-packages $(VIRTUALENV_PATH)/$(RELEASE_VERSION)
+	$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I -U ../../release/python/$(RELEASE_VERSION)/$(PACKAGE)
+	$(VIRTUALENV_PATH)/$(RELEASE_VERSION)/bin/pip install -I nose
--- a/src/py/README.md
+++ b/src/py/README.md
@@ -0,0 +1,88 @@
+# Crankshaft Python Package
+
+...
+### Run the tests
+
+```bash
+cd crankshaft
+nosetests test/
+```
+
+## Notes about Python dependencies
+* This extension is targeted at production databases. Therefore certain restrictions must be assumed about the production environment vs other experimental environments.
+* We're using `pip` and `virtualenv` to generate a suitable isolated environment for Python code that has all the dependencies.
+* Every dependency should be:
+  - Added to the `setup.py` file
+  - Installed through it
+  - Tested, when they have a test suite.
+  - Fixed in the `requirements.txt`
+* At present we use Python version 2.7.3
+
+---
+
+To avoid troublesome compilations/linkings we will use
+the available system package `python-scipy`.
+This package and its dependencies provide numpy 1.6.1
+and scipy 0.9.0. To be able to use these versions we cannot
+use PySAL 1.10 or later, so we'll stick to 1.9.1.
+
+```
+apt-get install -y python-scipy
+```
+
+We'll use virtual environments to install our packages,
+but configured to also use system modules so that the
+mentioned scipy and numpy are used.
+
+    # Create a virtual environment for python
+    $ virtualenv --system-site-packages dev
+
+    # Activate the virtualenv
+    $ source dev/bin/activate
+
+    # Install all the requirements
+    # expect this to take a while, as it will trigger a few compilations
+    (dev) $ pip install -I ./crankshaft
+
+#### Test the libraries with that virtual env
+
+##### Test numpy library dependency:
+
+    import numpy
+    numpy.test('full')
+
+##### Run scipy tests
+
+    import scipy
+    scipy.test('full')
+
+##### Testing pysal
+
+See <http://pysal.readthedocs.org/en/latest/developers/testing.html>
+
+This will require putting this into `dev/lib/python2.7/site-packages/setup.cfg`:
+
+```
+[nosetests]
+ignore-files=collection
+exclude-dir=pysal/contrib
+
+[wheel]
+universal=1
+```
+
+And copying some files before executing the tests:
+(we'll use a temporary directory from where the tests will be executed because
+some tests expect some files in the current directory). Next must be executed
+from
+
+```
+cp dev/lib/python2.7/site-packages/pysal/examples/geodanet/* dev/local/lib/python2.7/site-packages/pysal/examples
+mkdir -p test_tmp && cd test_tmp && cp ../dev/lib/python2.7/site-packages/pysal/examples/geodanet/* ./
+```
+
+Then, execute the tests with:
+
+    import pysal
+    import nose
+    nose.runmodule('pysal')
--- a/src/py/crankshaft/crankshaft/__init__.py
+++ b/src/py/crankshaft/crankshaft/__init__.py
@@ -0,0 +1,3 @@
+import random_seeds
+import clustering
+import similarity
--- a/src/py/crankshaft/crankshaft/clustering/__init__.py
+++ b/src/py/crankshaft/crankshaft/clustering/__init__.py
@@ -0,0 +1 @@
+from moran import *
--- a/src/py/crankshaft/crankshaft/clustering/moran.py
+++ b/src/py/crankshaft/crankshaft/clustering/moran.py
@@ -0,0 +1,260 @@
+"""
+Moran's I geostatistics (global clustering & outliers presence)
+"""
+
+# TODO: Fill in local neighbors which have null/NoneType values with the
+#       average of their neighborhood
+
+import pysal as ps
+import plpy
+
+# crankshaft module
+import crankshaft.pysal_utils as pu
+
+# High level interface ---------------------------------------
+
+def moran(subquery, attr_name,
+          permutations, geom_col, id_col, w_type, num_ngbrs):
+    """
+    Moran's I (global)
+    Implementation building neighbors with a PostGIS database and Moran's I
+     core clusters with PySAL.
+    Andy Eschbacher
+    """
+    qvals = {"id_col": id_col,
+             "attr1": attr_name,
+             "geom_col": geom_col,
+             "subquery": subquery,
+             "num_ngbrs": num_ngbrs}
+
+    query = pu.construct_neighbor_query(w_type, qvals)
+
+    plpy.notice('** Query: %s' % query)
+
+    try:
+        result = plpy.execute(query)
+        # if there are no neighbors, exit
+        if len(result) == 0:
+            return pu.empty_zipped_array(2)
+        plpy.notice('** Query returned with %d rows' % len(result))
+    except plpy.SPIError as err:
+        plpy.notice('** Query failed: "%s"' % query)
+        plpy.notice('** Error: %s' % err)
+        plpy.error('Error: areas of interest query failed, check input parameters')
+        return pu.empty_zipped_array(2)
+
+    ## collect attributes
+    attr_vals = pu.get_attributes(result)
+
+    ## calculate weights
+    weight = pu.get_weight(result, w_type, num_ngbrs)
+
+    ## calculate moran global
+    moran_global = ps.esda.moran.Moran(attr_vals, weight,
+                                       permutations=permutations)
+
+    return zip([moran_global.I], [moran_global.EI])
+
+def moran_local(subquery, attr,
+                permutations, geom_col, id_col, w_type, num_ngbrs):
+    """
+    Moran's I implementation for PL/Python
+    Andy Eschbacher
+    """
+
+    # geometries with null attributes are ignored, so the neighbors
+    # that are selected may be farther away than the true nearest ones
+
+    qvals = {"id_col": id_col,
+             "attr1": attr,
+             "geom_col": geom_col,
+             "subquery": subquery,
+             "num_ngbrs": num_ngbrs}
+
+    query = pu.construct_neighbor_query(w_type, qvals)
+
+    try:
+        result = plpy.execute(query)
+        # if there are no neighbors, exit
+        if len(result) == 0:
+            return pu.empty_zipped_array(5)
+    except plpy.SPIError:
+        plpy.notice('** Query failed: "%s"' % query)
+        plpy.error('Error: areas of interest query failed, check input parameters')
+        return pu.empty_zipped_array(5)
+
+    attr_vals = pu.get_attributes(result)
+    weight = pu.get_weight(result, w_type, num_ngbrs)
+
+    # calculate LISA values
+    lisa = ps.esda.moran.Moran_Local(attr_vals, weight,
+                                     permutations=permutations)
+
+    # find quadrants for each geometry
+    quads = quad_position(lisa.q)
+
+    return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
+
+def moran_rate(subquery, numerator, denominator,
+               permutations, geom_col, id_col, w_type, num_ngbrs):
+    """
+    Moran's I Rate (global)
+    Andy Eschbacher
+    """
+    qvals = {"id_col": id_col,
+             "attr1": numerator,
+             "attr2": denominator,
+             "geom_col": geom_col,
+             "subquery": subquery,
+             "num_ngbrs": num_ngbrs}
+
+    query = pu.construct_neighbor_query(w_type, qvals)
+
+    plpy.notice('** Query: %s' % query)
+
+    try:
+        result = plpy.execute(query)
+        # if there are no neighbors, exit
+        if len(result) == 0:
+            return pu.empty_zipped_array(2)
+        plpy.notice('** Query returned with %d rows' % len(result))
+    except plpy.SPIError as err:
+        plpy.notice('** Query failed: "%s"' % query)
+        plpy.notice('** Error: %s' % err)
+        plpy.error('Error: areas of interest query failed, check input parameters')
+        return pu.empty_zipped_array(2)
+
+    ## collect attributes
+    numer = pu.get_attributes(result, 1)
+    denom = pu.get_attributes(result, 2)
+
+    weight = pu.get_weight(result, w_type, num_ngbrs)
+
+    ## calculate moran global rate
+    lisa_rate = ps.esda.moran.Moran_Rate(numer, denom, weight,
+                                         permutations=permutations)
+
+    return zip([lisa_rate.I], [lisa_rate.EI])
+
+def moran_local_rate(subquery, numerator, denominator,
+                     permutations, geom_col, id_col, w_type, num_ngbrs):
+    """
+        Moran's I Local Rate
+        Andy Eschbacher
+    """
+    # geometries with null values are ignored, so the neighbors
+    # that are selected may be farther away than the true nearest ones
+
+    query = pu.construct_neighbor_query(w_type,
+                                     {"id_col": id_col,
+                                      "attr1": numerator,
+                                      "attr2": denominator,
+                                      "geom_col": geom_col,
+                                      "subquery": subquery,
+                                      "num_ngbrs": num_ngbrs})
+
+    try:
+        result = plpy.execute(query)
+        # if there are no neighbors, exit
+        if len(result) == 0:
+            return pu.empty_zipped_array(5)
+    except plpy.SPIError as err:
+        plpy.notice('** Query failed: "%s"' % query)
+        plpy.notice('** Error: %s' % err)
+        plpy.error('Error: areas of interest query failed, check input parameters')
+        return pu.empty_zipped_array(5)
+
+    ## collect attributes
+    numer = pu.get_attributes(result, 1)
+    denom = pu.get_attributes(result, 2)
+
+    weight = pu.get_weight(result, w_type, num_ngbrs)
+
+    # calculate LISA values
+    lisa = ps.esda.moran.Moran_Local_Rate(numer, denom, weight,
+                                          permutations=permutations)
+
+    # find units of significance
+    quads = quad_position(lisa.q)
+
+    return zip(lisa.Is, quads, lisa.p_sim, weight.id_order, lisa.y)
+
+def moran_local_bv(subquery, attr1, attr2,
+                   permutations, geom_col, id_col, w_type, num_ngbrs):
+    """
+        Moran's I (local) Bivariate (untested)
+    """
+    plpy.notice('** Constructing query')
+
+    qvals = {"num_ngbrs": num_ngbrs,
+             "attr1": attr1,
+             "attr2": attr2,
+             "subquery": subquery,
+             "geom_col": geom_col,
+             "id_col": id_col}
+
+    query = pu.construct_neighbor_query(w_type, qvals)
+
+    try:
+        result = plpy.execute(query)
+        # if there are no neighbors, exit
+        if len(result) == 0:
+            return pu.empty_zipped_array(4)
+    except plpy.SPIError:
+        plpy.notice('** Query failed: "%s"' % query)
+        plpy.error("Error: areas of interest query failed, " \
+                   "check input parameters")
+        return pu.empty_zipped_array(4)
+
+    ## collect attributes
+    attr1_vals = pu.get_attributes(result, 1)
+    attr2_vals = pu.get_attributes(result, 2)
+
+    # create weights
+    weight = pu.get_weight(result, w_type, num_ngbrs)
+
+    # calculate LISA values
+    lisa = ps.esda.moran.Moran_Local_BV(attr1_vals, attr2_vals, weight,
+                                        permutations=permutations)
+
+    plpy.notice("len of Is: %d" % len(lisa.Is))
+
+    # find clustering of significance
+    lisa_sig = quad_position(lisa.q)
+
+    plpy.notice('** Finished calculations')
+
+    return zip(lisa.Is, lisa_sig, lisa.p_sim, weight.id_order)
+
+# Low level functions ----------------------------------------
+
+def map_quads(coord):
+    """
+        Map a quadrant number to Moran's I designation
+        HH=1, LH=2, LL=3, HL=4
+        Input:
+        @param coord (int): quadrant of a specific measurement
+        Output:
+            classification (one of 'HH', 'LH', 'LL', or 'HL')
+    """
+    if coord == 1:
+        return 'HH'
+    elif coord == 2:
+        return 'LH'
+    elif coord == 3:
+        return 'LL'
+    elif coord == 4:
+        return 'HL'
+    else:
+        return None
+
+def quad_position(quads):
+    """
+        Produce Moran's I classification based on quadrant numbers
+        Input:
+        @param quads ndarray: an array of quads classified by
+          1-4 (PySAL default)
+        Output:
+        @param list: an array of quads classified by 'HH', 'LL', etc.
+    """
+    return [map_quads(q) for q in quads]
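The quadrant codes mapped above follow PySAL's convention: the first letter describes a unit's own value relative to the mean, the second the spatial lag of its neighbors. A minimal sketch of that classification, assuming inputs that are already mean-centered; `quadrant` and `LABELS` are hypothetical names used only for illustration and are not part of crankshaft or PySAL:

```python
# Hypothetical illustration of the 1-4 quadrant codes (HH=1, LH=2, LL=3, HL=4).
LABELS = {1: 'HH', 2: 'LH', 3: 'LL', 4: 'HL'}


def quadrant(z, lag):
    """Return 1-4 for a mean-centered value `z` and its spatial lag `lag`."""
    if z > 0:
        # high value: high neighbors -> HH, low neighbors -> HL
        return 1 if lag > 0 else 4
    # low value: high neighbors -> LH, low neighbors -> LL
    return 2 if lag > 0 else 3


codes = [quadrant(1.2, 0.4), quadrant(-0.3, 0.8),
         quadrant(-1.0, -0.5), quadrant(0.7, -0.2)]
labels = [LABELS[c] for c in codes]
```

Here `labels` lists one designation per unit, in the same order as the inputs.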
--- a/src/py/crankshaft/crankshaft/pysal_utils/__init__.py
+++ b/src/py/crankshaft/crankshaft/pysal_utils/__init__.py
@@ -0,0 +1 @@
+from pysal_utils import *
--- a/src/py/crankshaft/crankshaft/pysal_utils/pysal_utils.py
+++ b/src/py/crankshaft/crankshaft/pysal_utils/pysal_utils.py
@@ -0,0 +1,152 @@
+"""
+    Utilities module for generic PySAL functionality, mainly centered on translating queries into numpy arrays or PySAL weights objects
+"""
+
+import numpy as np
+import pysal as ps
+
+def construct_neighbor_query(w_type, query_vals):
+    """Return query (a string) used for finding neighbors
+        @param w_type text: type of neighbors to calculate ('knn' or 'queen')
+        @param query_vals dict: values used to construct the query
+    """
+
+    if w_type == 'knn':
+        return knn(query_vals)
+    else:
+        return queen(query_vals)
+
+## Build weight object
+def get_weight(query_res, w_type='knn', num_ngbrs=5):
+    """
+        Construct PySAL weight from return value of query
+        @param query_res: query results with attributes and neighbors
+    """
+    if w_type == 'knn':
+        row_normed_weights = [1.0 / float(num_ngbrs)] * num_ngbrs
+        weights = {x['id']: row_normed_weights for x in query_res}
+    else:
+        weights = {x['id']: [1.0 / len(x['neighbors'])] * len(x['neighbors'])
+                            if len(x['neighbors']) > 0
+                            else [] for x in query_res}
+
+    neighbors = {x['id']: x['neighbors'] for x in query_res}
+
+    return ps.W(neighbors, weights)
+
+def query_attr_select(params):
+    """
+        Create portion of SELECT statement for attributes involved in query.
+        @param params: dict of information used in query (column names,
+                       table name, etc.)
+    """
+
+    attrs = [k for k in params
+             if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')]
+
+    template = "i.\"{%(col)s}\"::numeric As attr%(alias_num)s, "
+
+    attr_string = ""
+
+    for idx, val in enumerate(sorted(attrs)):
+        attr_string += template % {"col": val, "alias_num": idx + 1}
+
+    return attr_string
+
+def query_attr_where(params):
+    """
+        Create portion of WHERE clauses for weeding out NULL-valued geometries
+    """
+    attrs = sorted([k for k in params
+                    if k not in ('id_col', 'geom_col', 'subquery', 'num_ngbrs')])
+
+    attr_string = []
+
+    for attr in attrs:
+        attr_string.append("idx_replace.\"{%s}\" IS NOT NULL" % attr)
+
+    if len(attrs) == 2:
+        attr_string.append("idx_replace.\"{%s}\" <> 0" % attrs[1])
+
+    out = " AND ".join(attr_string)
+
+    return out
+
+def knn(params):
+    """SQL query for k-nearest neighbors.
+        @param params: dict of values to fill template
+    """
+
+    attr_select = query_attr_select(params)
+    attr_where = query_attr_where(params)
+
+    replacements = {"attr_select": attr_select,
+                    "attr_where_i": attr_where.replace("idx_replace", "i"),
+                    "attr_where_j": attr_where.replace("idx_replace", "j")}
+
+    query = "SELECT " \
+                "i.\"{id_col}\" As id, " \
+                "%(attr_select)s" \
+                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
+                              "FROM ({subquery}) As j " \
+                              "WHERE " \
+                                "i.\"{id_col}\" <> j.\"{id_col}\" AND " \
+                                "%(attr_where_j)s " \
+                              "ORDER BY " \
+                                "j.\"{geom_col}\" <-> i.\"{geom_col}\" ASC " \
+                              "LIMIT {num_ngbrs})" \
+                ") As neighbors " \
+            "FROM ({subquery}) As i " \
+            "WHERE " \
+                "%(attr_where_i)s " \
+            "ORDER BY i.\"{id_col}\" ASC;" % replacements
+
+    return query.format(**params)
+
+## SQL query for finding queens neighbors (all contiguous polygons)
+def queen(params):
+    """SQL query for queen neighbors.
+        @param params dict: information to fill query
+    """
+    attr_select = query_attr_select(params)
+    attr_where = query_attr_where(params)
+
+    replacements = {"attr_select": attr_select,
+                    "attr_where_i": attr_where.replace("idx_replace", "i"),
+                    "attr_where_j": attr_where.replace("idx_replace", "j")}
+
+    query = "SELECT " \
+                "i.\"{id_col}\" As id, " \
+                "%(attr_select)s" \
+                "(SELECT ARRAY(SELECT j.\"{id_col}\" " \
+                 "FROM ({subquery}) As j " \
+                 "WHERE i.\"{id_col}\" <> j.\"{id_col}\" AND " \
+                       "ST_Touches(i.\"{geom_col}\", j.\"{geom_col}\") AND " \
+                       "%(attr_where_j)s)" \
+                ") As neighbors " \
+            "FROM ({subquery}) As i " \
+            "WHERE " \
+                "%(attr_where_i)s " \
+            "ORDER BY i.\"{id_col}\" ASC;" % replacements
+
+    return query.format(**params)
+
+## to add more weight methods open a ticket or pull request
+
+def get_attributes(query_res, attr_num=1):
+    """
+        @param query_res: query results with attributes and neighbors
+        @param attr_num: attribute number (1, 2, ...)
+    """
+    return np.array([x['attr' + str(attr_num)] for x in query_res], dtype=float)
+
+def empty_zipped_array(num_nones):
+    """
+        prepare return values for cases of empty weights objects (no neighbors)
+        Input:
+        @param num_nones int: number of columns (e.g., 4)
+        Output:
+        [(None, None, None, None)]
+    """
+
+    return [tuple([None] * num_nones)]
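For reference, `get_weight` above gives every neighbor of a unit the same share, so each unit's weights sum to one (row standardization). A small sketch of that rule in plain Python, without PySAL; `row_standardized_weights` is a hypothetical helper name, and the real function hands the resulting dictionaries to `ps.W`:

```python
def row_standardized_weights(neighbors):
    """Give each neighbor of a unit the weight 1 / (number of neighbors).

    `neighbors` maps a unit id to the list of its neighbor ids. Units
    without neighbors (islands) get an empty weight list.
    """
    weights = {}
    for unit_id, ngbrs in neighbors.items():
        if ngbrs:
            weights[unit_id] = [1.0 / len(ngbrs)] * len(ngbrs)
        else:
            weights[unit_id] = []
    return weights


neighbors = {1: [2, 3], 2: [1], 3: []}
weights = row_standardized_weights(neighbors)
```

With `knn` every unit has the same number of neighbors, so one shared weight list is enough; with `queen` the count varies per unit, which is why the weights are computed row by row.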
--- a/src/py/crankshaft/crankshaft/random_seeds.py
+++ b/src/py/crankshaft/crankshaft/random_seeds.py
@@ -0,0 +1,10 @@
+import random
+import numpy
+
+def set_random_seeds(value):
+    """
+    Set the seeds of the RNGs (Random Number Generators)
+    used internally.
+    """
+    random.seed(value)
+    numpy.random.seed(value)
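Seeding matters here because the permutation-based statistics draw random numbers, and the tests compare results against stored fixtures. A minimal sketch of the effect using only the standard-library generator; `draw` is a hypothetical helper, and the NumPy generator is assumed to behave analogously:

```python
import random


def draw(seed, n=3):
    """Seed the stdlib RNG, then draw n pseudo-random floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draw(1234)
second = draw(1234)   # same seed, same sequence
other = draw(1235)    # different seed, different sequence
```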
--- a/src/py/crankshaft/crankshaft/similarity/__init__.py
+++ b/src/py/crankshaft/crankshaft/similarity/__init__.py
@@ -0,0 +1 @@
+from similarity import * 
--- a/src/py/crankshaft/crankshaft/similarity/similarity.py
+++ b/src/py/crankshaft/crankshaft/similarity/similarity.py
@@ -0,0 +1,91 @@
+from sklearn.neighbors import NearestNeighbors
+import  scipy.stats as stats
+import numpy as np
+import plpy
+import time
+import cPickle
+
+
+def query_to_dictionary(result):
+    return [ dict(zip(r.keys(), r.values())) for r in result ]
+
+def drop_all_nan_columns(data):
+    return data[ :, ~np.isnan(data).all(axis=0)]
+    
+def fill_missing_na(data, val=None):
+    inds = np.where(np.isnan(data))
+    if val is None:
+        col_mean = np.nanmean(data, axis=0)
+        data[inds] = np.take(col_mean, inds[1])
+    else:
+        data[inds] = np.take(val, inds[1])
+    return data
+    
+def similarity_rank(target_cartodb_id, query):
+    start_time = time.time()
+    # each notice reports the seconds elapsed since start_time
+    plpy.notice('starting, running query %s' % (time.time() - start_time))
+
+    data = plpy.execute(query_only_values(query))
+    plpy.notice('ran query, getting cartodb_ids %s' % (time.time() - start_time))
+    cartodb_ids = plpy.execute(query_cartodb_id(query))[0]['a']
+    target_id = cartodb_ids.index(target_cartodb_id)
+    plpy.notice('got cartodb_ids, extracting %s' % (time.time() - start_time))
+    features, _ = extract_features_target(data, target_id)
+    plpy.notice('extracted, cleaning %s' % (time.time() - start_time))
+    features = fill_missing_na(drop_all_nan_columns(features))
+    # take the target from the cleaned rows so its columns match the features
+    target = features[target_id]
+    plpy.notice('cleaned, normalizing %s' % (time.time() - start_time))
+
+    normed_features, normed_target = normalize_features(features, target)
+    plpy.notice('normalized, training %s' % (time.time() - start_time))
+    tree = train(normed_features)
+    plpy.notice('trained, querying %s' % (time.time() - start_time))
+    # kneighbors expects a 2D array and returns neighbors sorted by distance
+    dist, ind = tree.kneighbors(normed_target.reshape(1, -1))
+    # map the sorted neighbor indices back to their cartodb_ids
+    return zip([cartodb_ids[i] for i in ind[0]], dist[0])
+
+def query_cartodb_id(query):
+    return 'select array_agg(cartodb_id) a from ({0}) b'.format(query)
+
+def query_only_values(query):
+    first_row = plpy.execute('select * from ({query}) a limit 1'.format(query=query))
+    just_values = ','.join([ key for key in  first_row[0].keys()  if key not in ['the_geom', 'the_geom_webmercator','cartodb_id']])
+    return 'select Array[{0}] a from ({1}) b '.format(just_values, query)
+
+
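+def most_similar(matches, query):
+    # for every row, list the cartodb_ids of its `matches` nearest rows
+    # (a row is normally its own nearest neighbor, at distance zero)
+    cartodb_ids = plpy.execute(query_cartodb_id(query))[0]['a']
+    data = plpy.execute(query_only_values(query))
+    features = np.array([row['a'] for row in data], dtype=float)
+    features = fill_missing_na(drop_all_nan_columns(features))
+    dist, ind = train(features).kneighbors(features, n_neighbors=matches)
+    results = [[cartodb_ids[i] for i in row] for row in ind]
+    return cartodb_ids, results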
+    
+    
+def train(features):
+    tree = NearestNeighbors( n_neighbors=len(features), algorithm='auto').fit(features)
+    return tree
+    
+def normalize_features(features, target):
+    mins, maxes = features.min(axis=0), features.max(axis=0)
+    spans = np.where(maxes > mins, maxes - mins, 1.0)  # guard constant columns
+    return (features - mins) / spans, (target - mins) / spans
+ 
+def extract_row(row):
+    keys = row.keys()
+    values = row.values()
+    del values[ keys.index('cartodb_id')]
+    return values
+
+def extract_features_target(data, target_index=None):
+    # without a target_index the returned target is a NaN placeholder
+    features = [row['a'] for row in data]
+    target = features[target_index] if target_index is not None else None
+    return np.array(features, dtype=float), np.array(target, dtype=float)
+    
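The ranking in this module comes down to two steps: rescale every column to the range [0, 1], then order rows by their distance to the target row. The sketch below illustrates this with a brute-force search in plain Python instead of scikit-learn's `NearestNeighbors`; `min_max_scale` and `rank_by_distance` are hypothetical names, and Euclidean distance is an assumption:

```python
import math


def min_max_scale(rows):
    """Scale every column to [0, 1]; constant columns become all zeros."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # a zero span would divide by zero, so fall back to 1.0
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, mins, spans)]
            for row in rows]


def rank_by_distance(ids, rows, target_id):
    """Return (id, distance) pairs ordered from most to least similar."""
    scaled = min_max_scale(rows)
    target = scaled[ids.index(target_id)]
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(row, target)))
             for row in scaled]
    # pair each distance with its own id before sorting
    return sorted(zip(ids, dists), key=lambda pair: pair[1])


ids = [10, 20, 30]
rows = [[1.0, 100.0], [2.0, 300.0], [5.0, 120.0]]
ranking = rank_by_distance(ids, rows, 10)
```

Pairing ids with distances before sorting keeps each distance attached to the row it belongs to. Without the rescaling, the second column would dominate the distance simply because its values are larger.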
--- a/src/py/crankshaft/setup.py
+++ b/src/py/crankshaft/setup.py
@@ -0,0 +1,48 @@
+
+"""
+CartoDB Spatial Analysis Python Library
+See:
+https://github.com/CartoDB/crankshaft
+"""
+
+from setuptools import setup, find_packages
+
+setup(
+    name='crankshaft',
+
+    version='0.0.0',
+
+    description='CartoDB Spatial Analysis Python Library',
+
+    url='https://github.com/CartoDB/crankshaft',
+
+    author='Data Services Team - CartoDB',
+    author_email='dataservices@cartodb.com',
+
+    license='MIT',
+
+    classifiers=[
+        'Development Status :: 3 - Alpha',
+        'Intended Audience :: Mapping community',
+        'Topic :: Maps :: Mapping Tools',
+        'License :: OSI Approved :: MIT License',
+        'Programming Language :: Python :: 2.7',
+    ],
+
+    keywords='maps mapping tools spatial analysis geostatistics',
+
+    packages=find_packages(exclude=['contrib', 'docs', 'tests']),
+
+    extras_require={
+        'dev': ['unittest'],
+        'test': ['unittest', 'nose', 'mock'],
+    },
+
+    # The choice of component versions is dictated by what's
+    # provisioned in the production servers.
+    install_requires=['pysal==1.9.1', 'scikit-learn==0.17.1'],
+
+    requires=['pysal', 'numpy','sklearn'],
+
+    test_suite='test'
+)
--- a/src/py/crankshaft/test/fixtures/moran.json
+++ b/src/py/crankshaft/test/fixtures/moran.json
@@ -0,0 +1,52 @@
+[[0.9319096128346788, "HH"],
+[-1.135787401862846, "HL"],
+[0.11732030672508517, "LL"],
+[0.6152779669180425, "LL"],
+[-0.14657336660125297, "LH"],
+[0.6967858120189607, "LL"],
+[0.07949310115714454, "HH"],
+[0.4703198759258987, "HH"],
+[0.4421125200498064, "HH"],
+[0.5724288737143592, "LL"],
+[0.8970743435692062, "LL"],
+[0.18327334401918674, "LL"],
+[-0.01466729201304962, "HL"],
+[0.3481559372544409, "LL"],
+[0.06547094736902978, "LL"],
+[0.15482141569329988, "HH"],
+[0.4373841193538136, "HH"],
+[0.15971286468915544, "LL"],
+[1.0543588860308968, "HH"],
+[1.7372866900020818, "HH"],
+[1.091998586053999, "LL"],
+[0.1171572584252222, "HH"],
+[0.08438455015300014, "LL"],
+[0.06547094736902978, "LL"],
+[0.15482141569329985, "HH"],
+[1.1627044812890683, "HH"],
+[0.06547094736902978, "LL"],
+[0.795275137550483, "HH"],
+[0.18562939195219, "LL"],
+[0.3010757406693439, "LL"],
+[2.8205795942839376, "HH"],
+[0.11259190602909264, "LL"],
+[-0.07116352791516614, "HL"],
+[-0.09945240794119009, "LH"],
+[0.18562939195219, "LL"],
+[0.1832733440191868, "LL"],
+[-0.39054253768447705, "HL"],
+[-0.1672071289487642, "HL"],
+[0.3337669247916343, "HH"],
+[0.2584386102554792, "HH"],
+[-0.19733845476322634, "HL"],
+[-0.9379282899805409, "LH"],
+[-0.028770969951095866, "LH"],
+[0.051367269430983485, "LL"],
+[-0.2172548045913472, "LH"],
+[0.05136726943098351, "LL"],
+[0.04191046803899837, "LL"],
+[0.7482357030403517, "HH"],
+[-0.014585767863118111, "LH"],
+[0.5410013139159929, "HH"],
+[1.0223932668429925, "LL"],
+[1.4179402898927476, "LL"]]
--- a/src/py/crankshaft/test/fixtures/neighbors.json
+++ b/src/py/crankshaft/test/fixtures/neighbors.json
@@ -0,0 +1,54 @@
+[
+    {"neighbors": [48, 26, 20, 9, 31], "id": 1, "value": 0.5},
+    {"neighbors": [30, 16, 46, 3, 4], "id": 2, "value": 0.7},
+    {"neighbors": [46, 30, 2, 12, 16], "id": 3, "value": 0.2},
+    {"neighbors": [18, 30, 23, 2, 52], "id": 4, "value": 0.1},
+    {"neighbors": [47, 40, 45, 37, 28], "id": 5, "value": 0.3},
+    {"neighbors": [10, 21, 41, 14, 37], "id": 6, "value": 0.05},
+    {"neighbors": [8, 17, 43, 25, 12], "id": 7, "value": 0.4},
+    {"neighbors": [17, 25, 43, 22, 7], "id": 8, "value": 0.7},
+    {"neighbors": [39, 34, 1, 26, 48], "id": 9, "value": 0.5},
+    {"neighbors": [6, 37, 5, 45, 49], "id": 10, "value": 0.04},
+    {"neighbors": [51, 41, 29, 21, 14], "id": 11, "value": 0.08},
+    {"neighbors": [44, 46, 43, 50, 3], "id": 12, "value": 0.2},
+    {"neighbors": [45, 23, 14, 28, 18], "id": 13, "value": 0.4},
+    {"neighbors": [41, 29, 13, 23, 6], "id": 14, "value": 0.2},
+    {"neighbors": [36, 27, 32, 33, 24], "id": 15, "value": 0.3},
+    {"neighbors": [19, 2, 46, 44, 28], "id": 16, "value": 0.4},
+    {"neighbors": [8, 25, 43, 7, 22], "id": 17, "value": 0.6},
+    {"neighbors": [23, 4, 29, 14, 13], "id": 18, "value": 0.3},
+    {"neighbors": [42, 16, 28, 26, 40], "id": 19, "value": 0.7},
+    {"neighbors": [1, 48, 31, 26, 42], "id": 20, "value": 0.8},
+    {"neighbors": [41, 6, 11, 14, 10], "id": 21, "value": 0.1},
+    {"neighbors": [25, 50, 43, 31, 44], "id": 22, "value": 0.4},
+    {"neighbors": [18, 13, 14, 4, 2], "id": 23, "value": 0.1},
+    {"neighbors": [33, 49, 34, 47, 27], "id": 24, "value": 0.3},
+    {"neighbors": [43, 8, 22, 17, 50], "id": 25, "value": 0.4},
+    {"neighbors": [1, 42, 20, 31, 48], "id": 26, "value": 0.6},
+    {"neighbors": [32, 15, 36, 33, 24], "id": 27, "value": 0.3},
+    {"neighbors": [40, 45, 19, 5, 13], "id": 28, "value": 0.8},
+    {"neighbors": [11, 51, 41, 14, 18], "id": 29, "value": 0.3},
+    {"neighbors": [2, 3, 4, 46, 18], "id": 30, "value": 0.1},
+    {"neighbors": [20, 26, 1, 50, 48], "id": 31, "value": 0.9},
+    {"neighbors": [27, 36, 15, 49, 24], "id": 32, "value": 0.3},
+    {"neighbors": [24, 27, 49, 34, 32], "id": 33, "value": 0.4},
+    {"neighbors": [47, 9, 39, 40, 24], "id": 34, "value": 0.3},
+    {"neighbors": [38, 51, 11, 21, 41], "id": 35, "value": 0.3},
+    {"neighbors": [15, 32, 27, 49, 33], "id": 36, "value": 0.2},
+    {"neighbors": [49, 10, 5, 47, 24], "id": 37, "value": 0.5},
+    {"neighbors": [35, 21, 51, 11, 41], "id": 38, "value": 0.4},
+    {"neighbors": [9, 34, 48, 1, 47], "id": 39, "value": 0.6},
+    {"neighbors": [28, 47, 5, 9, 34], "id": 40, "value": 0.5},
+    {"neighbors": [11, 14, 29, 21, 6], "id": 41, "value": 0.4},
+    {"neighbors": [26, 19, 1, 9, 31], "id": 42, "value": 0.2},
+    {"neighbors": [25, 12, 8, 22, 44], "id": 43, "value": 0.3},
+    {"neighbors": [12, 50, 46, 16, 43], "id": 44, "value": 0.2},
+    {"neighbors": [28, 13, 5, 40, 19], "id": 45, "value": 0.3},
+    {"neighbors": [3, 12, 44, 2, 16], "id": 46, "value": 0.2},
+    {"neighbors": [34, 40, 5, 49, 24], "id": 47, "value": 0.3},
+    {"neighbors": [1, 20, 26, 9, 39], "id": 48, "value": 0.5},
+    {"neighbors": [24, 37, 47, 5, 33], "id": 49, "value": 0.2},
+    {"neighbors": [44, 22, 31, 42, 26], "id": 50, "value": 0.6},
+    {"neighbors": [11, 29, 41, 14, 21], "id": 51, "value": 0.01},
+    {"neighbors": [4, 18, 29, 51, 23], "id": 52, "value": 0.01}
+  ]
--- a/src/py/crankshaft/test/helper.py
+++ b/src/py/crankshaft/test/helper.py
@@ -0,0 +1,13 @@
+import unittest
+
+from mock_plpy import MockPlPy
+plpy = MockPlPy()
+
+import sys
+sys.modules['plpy'] = plpy
+
+import os
+
+def fixture_file(name):
+    dir = os.path.dirname(os.path.realpath(__file__))
+    return os.path.join(dir, 'fixtures', name)
--- a/src/py/crankshaft/test/mock_plpy.py
+++ b/src/py/crankshaft/test/mock_plpy.py
@@ -0,0 +1,34 @@
+import re
+
+class MockPlPy:
+    def __init__(self):
+        self._reset()
+
+    def _reset(self):
+        self.infos = []
+        self.notices = []
+        self.debugs = []
+        self.logs = []
+        self.warnings = []
+        self.errors = []
+        self.fatals = []
+        self.executes = []
+        self.results = []
+        self.prepares = []
+        self.results = []
+
+    def _define_result(self, query, result):
+        pattern = re.compile(query, re.IGNORECASE | re.MULTILINE)
+        self.results.append([pattern, result])
+
+    def notice(self, msg):
+        self.notices.append(msg)
+
+    def info(self, msg):
+        self.infos.append(msg)
+
+    def execute(self, query): # TODO: additional arguments
+        for result in self.results:
+            if result[0].match(query):
+                return result[1]
+        return []
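The mock resolves queries by regular expression, and because it uses `match`, a pattern only applies when the query text begins with it, even with `re.MULTILINE` set. A stripped-down sketch of that lookup; `TinyMock` and `define_result` are hypothetical stand-ins for the class in the diff:

```python
import re


class TinyMock(object):
    """Hypothetical stand-in for the mock: canned results keyed by regex."""

    def __init__(self):
        self.results = []

    def define_result(self, pattern, result):
        compiled = re.compile(pattern, re.IGNORECASE | re.MULTILINE)
        self.results.append((compiled, result))

    def execute(self, query):
        # match() anchors at the start of the string, so the pattern
        # 'select' matches 'SELECT ...' but not '  SELECT ...'
        for compiled, result in self.results:
            if compiled.match(query):
                return result
        return []


mock = TinyMock()
mock.define_result('select', [{'id': 1}])
hit = mock.execute('SELECT * FROM a_list')
miss = mock.execute('  SELECT * FROM a_list')
```

This is why the tests can register the bare pattern `'select'` and still catch the generated neighbor queries, which all start with `SELECT`.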
--- a/src/py/crankshaft/test/test_clustering_moran.py
+++ b/src/py/crankshaft/test/test_clustering_moran.py
@@ -0,0 +1,83 @@
+import unittest
+import numpy as np
+
+
+# from mock_plpy import MockPlPy
+# plpy = MockPlPy()
+#
+# import sys
+# sys.modules['plpy'] = plpy
+from helper import plpy, fixture_file
+
+import crankshaft.clustering as cc
+import crankshaft.pysal_utils as pu
+from crankshaft import random_seeds
+import json
+
+class MoranTest(unittest.TestCase):
+    """Testing class for Moran's I functions"""
+
+    def setUp(self):
+        plpy._reset()
+        self.params = {"id_col": "cartodb_id",
+                       "attr1": "andy",
+                       "attr2": "jay_z",
+                       "subquery": "SELECT * FROM a_list",
+                       "geom_col": "the_geom",
+                       "num_ngbrs": 321}
+        self.neighbors_data = json.loads(open(fixture_file('neighbors.json')).read())
+        self.moran_data = json.loads(open(fixture_file('moran.json')).read())
+
+    def test_map_quads(self):
+        """Test map_quads"""
+        self.assertEqual(cc.map_quads(1), 'HH')
+        self.assertEqual(cc.map_quads(2), 'LH')
+        self.assertEqual(cc.map_quads(3), 'LL')
+        self.assertEqual(cc.map_quads(4), 'HL')
+        self.assertEqual(cc.map_quads(33), None)
+        self.assertEqual(cc.map_quads('andy'), None)
+
+    def test_quad_position(self):
+        """Test quad_position"""
+
+        quads = np.array([1, 2, 3, 4], int)
+
+        ans = np.array(['HH', 'LH', 'LL', 'HL'])
+        test_ans = cc.quad_position(quads)
+
+        self.assertTrue((test_ans == ans).all())
+
+    def test_moran_local(self):
+        """Test Moran's I local"""
+        data = [ { 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
+        plpy._define_result('select', data)
+        random_seeds.set_random_seeds(1234)
+        result = cc.moran_local('subquery', 'value', 99, 'the_geom', 'cartodb_id', 'knn', 5)
+        result = [(row[0], row[1]) for row in result]
+        expected = self.moran_data
+        for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
+            self.assertAlmostEqual(res_val, exp_val)
+            self.assertEqual(res_quad, exp_quad)
+
+    def test_moran_local_rate(self):
+        """Test Moran's I local rate"""
+        data = [ { 'id': d['id'], 'attr1': d['value'], 'attr2': 1, 'neighbors': d['neighbors'] } for d in self.neighbors_data]
+        plpy._define_result('select', data)
+        random_seeds.set_random_seeds(1234)
+        result = cc.moran_local_rate('subquery', 'numerator', 'denominator', 99, 'the_geom', 'cartodb_id', 'knn', 5)
+        self.assertIsNotNone(result)
+        result = [(row[0], row[1]) for row in result]
+        expected = self.moran_data
+        for ([res_val, res_quad], [exp_val, exp_quad]) in zip(result, expected):
+            self.assertAlmostEqual(res_val, exp_val)
+
+    def test_moran(self):
+        """Test Moran's I global"""
+        data = [{ 'id': d['id'], 'attr1': d['value'], 'neighbors': d['neighbors'] } for d in self.neighbors_data]
+        plpy._define_result('select', data)
+        random_seeds.set_random_seeds(1235)
+        result = cc.moran('table', 'value', 99, 'the_geom', 'cartodb_id', 'knn', 5)
+        self.assertIsNotNone(result)
+        result_moran = result[0][0]
+        expected_moran = np.array([row[0] for row in self.moran_data]).mean()
+        self.assertAlmostEqual(expected_moran, result_moran, delta=10e-2)
--- a/src/py/crankshaft/test/test_pysal_utils.py
+++ b/src/py/crankshaft/test/test_pysal_utils.py
@@ -0,0 +1,107 @@
+import unittest
+
+import crankshaft.pysal_utils as pu
+from crankshaft import random_seeds
+
+
+class PysalUtilsTest(unittest.TestCase):
+    """Testing class for utility functions related to PySAL integrations"""
+
+    def setUp(self):
+        self.params = {"id_col": "cartodb_id",
+                       "attr1": "andy",
+                       "attr2": "jay_z",
+                       "subquery": "SELECT * FROM a_list",
+                       "geom_col": "the_geom",
+                       "num_ngbrs": 321}
+
+    def test_query_attr_select(self):
+        """Test query_attr_select"""
+
+        ans = "i.\"{attr1}\"::numeric As attr1, " \
+              "i.\"{attr2}\"::numeric As attr2, "
+
+        self.assertEqual(pu.query_attr_select(self.params), ans)
+
+    def test_query_attr_where(self):
+        """Test pu.query_attr_where"""
+
+        ans = "idx_replace.\"{attr1}\" IS NOT NULL AND " \
+              "idx_replace.\"{attr2}\" IS NOT NULL AND " \
+              "idx_replace.\"{attr2}\" <> 0"
+
+        self.assertEqual(pu.query_attr_where(self.params), ans)
+
+    def test_knn(self):
+        """Test knn neighbors constructor"""
+
+        ans = "SELECT i.\"cartodb_id\" As id, " \
+                     "i.\"andy\"::numeric As attr1, " \
+                     "i.\"jay_z\"::numeric As attr2, " \
+                     "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
+                                   "FROM (SELECT * FROM a_list) As j " \
+                                   "WHERE " \
+                                    "i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
+                                    "j.\"andy\" IS NOT NULL AND " \
+                                    "j.\"jay_z\" IS NOT NULL AND " \
+                                    "j.\"jay_z\" <> 0 " \
+                                   "ORDER BY " \
+                                    "j.\"the_geom\" <-> i.\"the_geom\" ASC " \
+                      "LIMIT 321)) As neighbors " \
+              "FROM (SELECT * FROM a_list) As i " \
+              "WHERE i.\"andy\" IS NOT NULL AND " \
+                    "i.\"jay_z\" IS NOT NULL AND " \
+                    "i.\"jay_z\" <> 0 " \
+              "ORDER BY i.\"cartodb_id\" ASC;"
+
+        self.assertEqual(pu.knn(self.params), ans)
+
+    def test_queen(self):
+        """Test queen neighbors constructor"""
+
+        ans = "SELECT i.\"cartodb_id\" As id, " \
+                     "i.\"andy\"::numeric As attr1, " \
+                     "i.\"jay_z\"::numeric As attr2, " \
+                     "(SELECT ARRAY(SELECT j.\"cartodb_id\" " \
+                                   "FROM (SELECT * FROM a_list) As j " \
+                                   "WHERE " \
+                                   "i.\"cartodb_id\" <> j.\"cartodb_id\" AND " \
+                                   "ST_Touches(i.\"the_geom\", " \
+                                              "j.\"the_geom\") AND " \
+                                   "j.\"andy\" IS NOT NULL AND " \
+                                   "j.\"jay_z\" IS NOT NULL AND " \
+                                   "j.\"jay_z\" <> 0)" \
+                                  ") As neighbors " \
+              "FROM (SELECT * FROM a_list) As i " \
+              "WHERE i.\"andy\" IS NOT NULL AND " \
+                    "i.\"jay_z\" IS NOT NULL AND " \
+                    "i.\"jay_z\" <> 0 " \
+              "ORDER BY i.\"cartodb_id\" ASC;"
+
+        self.assertEqual(pu.queen(self.params), ans)
+
+    def test_construct_neighbor_query(self):
+        """Test construct_neighbor_query"""
+
+        # Compare to raw knn query
+        self.assertEqual(pu.construct_neighbor_query('knn', self.params),
+                         pu.knn(self.params))
+
+    def test_get_attributes(self):
+        """Test get_attributes"""
+
+        query_res = [{'id': 1, 'attr1': 0.5}, {'id': 2, 'attr1': 0.7}]
+
+        self.assertEqual(list(pu.get_attributes(query_res, 1)), [0.5, 0.7])
+
+    def test_get_weight(self):
+        """Test get_weight"""
+
+        self.assertEqual(True, True)
+
+    def test_empty_zipped_array(self):
+        """Test empty_zipped_array"""
+        ans2 = [(None, None)]
+        ans4 = [(None, None, None, None)]
+        self.assertEqual(pu.empty_zipped_array(2), ans2)
+        self.assertEqual(pu.empty_zipped_array(4), ans4)
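The expected strings in these tests reflect a two-stage templating scheme: `%` formatting first inserts clauses that still contain `{attr1}`-style placeholders, and `str.format` then swaps each placeholder for the real column name. A reduced sketch of that flow, assuming the same parameter dictionary as the tests; `attr_select` here is a simplified, hypothetical version of `query_attr_select`:

```python
def attr_select(params):
    """Build 'i."{attrN}"::numeric As attrN, ' for every attribute key."""
    reserved = ('id_col', 'geom_col', 'subquery', 'num_ngbrs')
    # aliases are numbered by the sorted order of the remaining keys
    attrs = sorted(k for k in params if k not in reserved)
    template = 'i."{%(col)s}"::numeric As attr%(num)s, '
    return ''.join(template % {'col': key, 'num': idx + 1}
                   for idx, key in enumerate(attrs))


params = {'id_col': 'cartodb_id', 'attr1': 'andy', 'attr2': 'jay_z',
          'subquery': 'SELECT * FROM a_list', 'geom_col': 'the_geom',
          'num_ngbrs': 321}

# Stage 1: % formatting leaves the '{attr1}' style placeholders in place.
stage_one = 'SELECT %(attr_select)s"{id_col}" As id' % {
    'attr_select': attr_select(params)}
# Stage 2: str.format swaps each placeholder for the real column name.
stage_two = stage_one.format(**params)
```

Since the alias number comes from the sorted key order, the dictionary keys (not the column names) decide which column becomes `attr1` and which becomes `attr2`.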
Author	SHA1	Message	Date
Ubuntu	97b4949f84	performance imporvments	2016-05-27 19:31:37 +00:00
Ubuntu	df09d03de6	adding sklearn to deps	2016-05-27 14:59:24 +00:00
Ubuntu	b3c55614e3	fixing syntax	2016-05-27 14:58:43 +00:00
Ubuntu	1ddc338f3f	adding missing ;	2016-05-27 14:58:05 +00:00
Stuart Lynn	d7424b02e5	adding import to crankshaft __init__	2016-05-27 10:33:00 -04:00
Stuart Lynn	45705f3a16	adding function preflight	2016-05-27 10:29:47 -04:00
Stuart Lynn	1995721921	adding functions to drop columns which are all nan and fill nan values with the mean of those columns	2016-05-27 10:29:15 -04:00
Ubuntu	4630d6b549	debugging	2016-05-26 19:32:49 +00:00
Stuart Lynn	0fca6c3c1a	inital commit of similarity functions	2016-05-26 12:31:58 -04:00
Andy Eschbacher	fe22464b75	Merge pull request #22 from CartoDB/update-docs Update docs format	2016-05-23 09:51:44 -04:00
Javier Goizueta	cc4a35ebd9	Fix instructions to update/install the extension	2016-05-20 11:47:12 +02:00
Andy Eschbacher	633b63bccc	Merge pull request #25 from CartoDB/improve-moran-queries-revisited adding condition to avoid self-comparison in neighbor queries	2016-03-30 15:40:29 -04:00
Andy Eschbacher	ea02f36235	adding condition to avoid self-comparison in neighbor queries	2016-03-30 15:37:51 -04:00
Andy Eschbacher	22b6aed7c1	Merge pull request #16 from CartoDB/proof-read-and-gitignore-update Proof read and gitignore update	2016-03-30 12:37:29 -04:00
Andy Eschbacher	f6e8524669	Merge pull request #19 from CartoDB/restructure-moran-redux Restructure moran redux	2016-03-30 12:10:36 -04:00
Andy Eschbacher	02b74813ac	add test for global moran	2016-03-30 12:09:49 -04:00
Andy Eschbacher	4c243bf1d3	correct func signatures	2016-03-30 11:44:44 -04:00
Andy Eschbacher	b0150d4fec	adding tests for pysal_utils	2016-03-30 08:27:14 -04:00
Andy Eschbacher	6bb4f36df5	extracting util code to new submodule	2016-03-30 08:10:35 -04:00
Andy Eschbacher	5a46f65e59	update tests to remove plpy notices	2016-03-30 08:09:48 -04:00
Andy Eschbacher	e56519f599	removed unneded comments, make outputs more consistent	2016-03-29 23:39:29 -07:00
Andy Eschbacher	8dd8ab37a5	refactored from pylint	2016-03-29 22:49:31 -07:00
Andy Eschbacher	06f5cf9951	standarizing error reporting	2016-03-29 12:34:23 -07:00
Andy Eschbacher	bc67ae8f69	changed name of functions for observatory	2016-03-29 12:18:52 -07:00
Andy Eschbacher	00579cd838	adding template	2016-03-23 17:10:08 -04:00
Andy Eschbacher	3f20275d3d	adopting new format (wip)	2016-03-23 17:09:52 -04:00
Andy Eschbacher	eecbe39547	updating tests	2016-03-22 10:42:44 -04:00
Andy Eschbacher	1578b17eb8	updated function flow without significance	2016-03-22 10:42:06 -04:00
Andy Eschbacher	3eda8ecd16	new signatures for moran (w/o significance)	2016-03-22 10:34:22 -04:00
Andy Eschbacher	0aa4d0a50e	typo fixes, linking, etc.	2016-03-21 08:51:10 -04:00
Andy Eschbacher	3b31da783a	adding mac ds_store ignore	2016-03-21 08:40:37 -04:00
Javier Goizueta	8762f6ca1c	Merge pull request #12 from CartoDB/feat-moran-free-queries Allow to pass free queries as `select * from table limit 100` in moran	2016-03-16 19:43:15 +01:00
Raul Ochoa	58c141d217	Allow to pass free queries as `select * from table limit 100` in moran	2016-03-16 19:40:06 +01:00
Javier Goizueta	5a7d3178dd	Release 0.0.2 This version is the first with the new versioning approach which uses separate per-version Python virtual environments.	2016-03-16 19:22:21 +01:00
Javier Goizueta	4903af6cdc	Add existing release 0.0.1 The existing 0.0.1 files are placed into their location in release/	2016-03-16 18:41:49 +01:00
Javier Goizueta	692014d694	Merge pull request #11 from CartoDB/new-versioning-package-varenv New versioning process (with multiple virtual environments)	2016-03-16 18:21:52 +01:00
Javier Goizueta	47e0253652	Fixes to the documentation	2016-03-16 18:18:59 +01:00
Javier Goizueta	9f03a9b075	Reorganize the documentation into separate files Keep a "Quickstart Guide" in the README, add separate detailed sections for development (CONTRIBUTING) and release/deployment (RELEASE).	2016-03-16 17:42:28 +01:00
Javier Goizueta	b5281d0681	Documentation clarifications and corrections.	2016-03-16 17:19:21 +01:00
Javier Goizueta	689ec8a925	Change version function from IMMUTABLE to STABLE These functions' results will change when the extension is updated.	2016-03-16 17:09:50 +01:00
Javier Goizueta	a7e42e93cc	Rename cdb_crankshaft_internal_version as internal function	2016-03-16 16:41:54 +01:00
Javier Goizueta	bad09ffd7b	Remove abandoned alternatives from the documentation	2016-03-16 16:30:03 +01:00
Javier Goizueta	4706442a1d	Add documentation about useful make targets	2016-03-16 15:56:19 +01:00
Javier Goizueta	935c7f9963	Add missing Makefile comment	2016-03-16 15:54:39 +01:00
Javier Goizueta	ef3bcaeee8	Restore commented-out make target	2016-03-16 15:52:47 +01:00
Javier Goizueta	4ffb2c9664	Review and fix the documentation	2016-03-16 15:45:13 +01:00
Javier Goizueta	dea6e2f1a7	Refactor the Makefile Separate concerns properly for each subdirectory's Makefile	2016-03-16 15:40:40 +01:00
Javier Goizueta	d13f167d47	Add RELEASE_VERSION option to make deploy Now make deploy installs by default the current version, but can be made to install any prior specific version using an environment variable RELEASE_VERSION	2016-03-16 14:38:18 +01:00
Javier Goizueta	a518034e65	Fix .pyc files need not only be ignored inside src/py	2016-03-16 11:13:26 +01:00
Javier Goizueta	24e4037995	Fix version number of released extension script	2016-03-16 11:11:16 +01:00
Javier Goizueta	82a738fe40	Fix make clean tasks	2016-03-16 10:18:07 +01:00
Javier Goizueta	e801c9cb60	Release tasks using release-specific virtual environments Refine the development process and define the procedure for releasing new versions.	2016-03-15 18:48:46 +01:00
Javier Goizueta	0206cc6c44	Update documentation	2016-03-10 19:13:46 +01:00
Rafa de la Torre	b754ffe42a	Add info about python dependencies	2016-03-10 18:06:21 +01:00
Javier Goizueta	0056f411b5	Set the path to virtualenvs in the Makefile Also, version the virtualenv	2016-03-09 19:04:21 +01:00
Javier Goizueta	1810f02242	Use SciPy from system package python-scipy	2016-03-09 15:03:17 +01:00
Javier Goizueta	8e972128eb	Modify sql code to use the python virtualenv	2016-03-09 15:00:50 +01:00
Javier Goizueta	cdd2d9e722	Directory reorganization and sketch of new versioning procedure	2016-03-08 19:35:02 +01:00