39 Commits
1.5.0 ... 1.8.0

Author SHA1 Message Date
Mario de Frutos
ff0989f8fc Merge pull request #314 from CartoDB/develop
Release 1.8.0
2017-10-18 10:16:55 +02:00
Mario de Frutos
0a753e95c0 Release 1.8.0 artifacts 2017-10-18 10:09:25 +02:00
Mario de Frutos
b62e3ea963 Merge pull request #313 from CartoDB/add_numgeoms_getavailablegeometries
OBS_GetAvailableGeometries now receives number of geometries from input
2017-10-18 10:05:46 +02:00
Mario de Frutos
1da0b8cb6b Update doc with new field 2017-10-18 10:00:15 +02:00
csobier
a39de46531 docs fixed links
@inigomedina , just a docs url fix. Need to merge to fix live docs. Thanks!
2017-10-13 08:14:06 -04:00
Mario de Frutos
94b8e7492d OBS_GetAvailableGeometries now receives number of geometries from input
The number of geometries is passed to the score function so that it can
compute an accurate score for the input and suggest the geometry that
fits the input best.
2017-10-10 18:09:26 +02:00
Antonio Carlón
91ece26c06 Merge pull request #311 from CartoDB/remove_wof_tests
Remove WOF perftests.
2017-09-25 17:00:58 +02:00
Javier Torres
74b9d209c0 Use precise for travis tests, cartodb ppas don't have trusty anymore 2017-09-21 16:16:13 +02:00
Javier Torres
4ae889dfdc Remove WOF perftests. This is needed for tests to pass since we don't have WOF in our current dump 2017-09-21 10:36:05 +02:00
Mario de Frutos
3353ad0a32 Update NEWS.md 2017-08-18 16:43:39 +02:00
Mario de Frutos
b4ef3c77a9 Merge pull request #306 from CartoDB/develop
Release 1.7.0
2017-08-18 16:41:21 +02:00
Mario de Frutos
90a2421b6e Merge pull request #305 from CartoDB/obs_metadatavalidation_doc
OBS_MetadataValidation doc
2017-08-18 16:35:26 +02:00
Mario de Frutos
fd21709ca1 Fix missing schema for FIRST function 2017-08-18 16:20:13 +02:00
Javier Torres
3791511d7d Merge pull request #308 from CartoDB/307-TestsForDifferentPointsFixed
307 tests for different points fixed
2017-08-18 15:13:06 +02:00
Antonio
fad541c3fc Fixed broken tests and refactor 2017-08-18 11:19:15 +02:00
Antonio
48ed086fec Fixed tests for different test points per numerator 2017-08-17 16:31:40 +02:00
csobier
7e550cf909 applied quick copyedit to new docs code added 2017-08-11 08:00:00 -04:00
Mario de Frutos
6ab17bf8be New version 1.7.0 artifacts 2017-08-10 14:19:37 +02:00
Mario de Frutos
1f7f8015ad OBS_MetadataValidation doc 2017-08-10 13:29:42 +02:00
Mario de Frutos
6066ef028d Merge pull request #303 from CartoDB/precheck_metadata
OBS_MetadataValidation
2017-08-09 17:45:24 +02:00
Mario de Frutos
3c2e997a85 Add travis support to execute the tests 2017-08-09 17:16:47 +02:00
Mario de Frutos
cef99c6343 OBS_MetadataValidation
New function that checks the metadata for known errors. For example,
validation fails if the metadata requests a median aggregation with
normalization by area.
2017-08-09 16:11:10 +02:00
Mario de Frutos
50d975ce9b Generate new fixtures to include new meta table
- Include the obs_meta_geom_numer_timespan table
2017-08-09 16:10:39 +02:00
Javier Torres
c56633dd2a Format NEWS.md 2017-07-31 10:15:00 +02:00
Michelle Ho
2b26c5ad64 fixing parentheses for obs_getdata with ids 2017-07-24 13:13:22 -04:00
Mario de Frutos
3ed18ca1f0 Merge pull request #301 from CartoDB/develop
Release 1.6.0
2017-07-20 12:49:30 +02:00
Mario de Frutos
028c93170c Updated NEWS with version 1.6.0 2017-07-20 10:56:00 +02:00
Mario de Frutos
8d52857f01 Version 1.6.0 release artifact 2017-07-20 10:50:32 +02:00
Mario de Frutos
9e36e11bb3 Merge pull request #302 from CartoDB/297_filter_geometries_by_numer_timespan
Modified OBS_GetAvailableGeometries
2017-07-12 13:20:21 +02:00
Mario de Frutos
adae37631e Modified OBS_GetAvailableGeometries
Now uses the new meta table obs_meta_geom_numer_timespan to filter
the geometries by geometry timespan and/or numerator timespan (which
is what we get when we use obs_getavailabletimespans)
2017-07-11 16:06:11 +02:00
Mario de Frutos
8b98b6b64a Bump version 1.6.0 2017-06-29 17:54:11 +02:00
Mario de Frutos
aedc45f2a8 Merge pull request #300 from CartoDB/4967_new_numerators_function
New private function _OBS_GetNumerators to be used in our UI
2017-06-29 17:51:50 +02:00
Mario de Frutos
8612da57f7 New private function _OBS_GetNumerators to be used in our UI
The current OBS_GetAvailableNumerators was not designed with our
UI in mind, so it causes a lot of trouble and forces many hacks to
reconcile our UI needs with the function's interface. This new
function is a better fit for our purposes.

The function is private because, for now, we don't want to expose it
publicly: it could change in the near future.
2017-06-29 16:04:11 +02:00
Mario de Frutos
24a736c72e Tests for the PR #298 2017-06-29 13:33:07 +02:00
Mario de Frutos
cde6d5bfba Merge pull request #298 from CartoDB/4963_fix_multimeasure_null_for_all
Return NULL for the affected value and not for all the measurements
2017-06-29 12:55:22 +02:00
Mario de Frutos
d1f4e570ad Return NULL for the affected value and not for all the measurements
Right now we use INNER JOINs when we join the _procgeoms and the data,
so a missing measurement yields a bare NULL instead of an (id, NULL)
pair. We need the id to be available for the JOIN at the end of the
query, which provides results like this:

id |                               data
----+------------------------------------------------------------------
  1 | [{"value" : 858469},{"value" : 73.9397964478},{"value" : 69092}]
  2 | [{"value" : 738774},{"value" : null},{"value" : 2235406}]
2017-06-29 10:37:07 +02:00
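The behaviour described in this commit can be illustrated outside the database. The following is a minimal Python sketch, not the extension's SQL: the `build_rows` helper and the sample dictionaries are hypothetical, and the code only mimics the effect of keeping every id and emitting `null` for the one measurement that is missing.

```python
import json


def build_rows(ids, measures):
    """Emulate a LEFT JOIN from ids to each measure's values.

    ids: list of geometry ids (hypothetical sample input).
    measures: list of dicts mapping id -> value, one dict per measure.
    An id missing from one measure yields None for that value only,
    instead of dropping the whole row.
    """
    rows = []
    for geom_id in ids:
        data = [{"value": measure.get(geom_id)} for measure in measures]
        rows.append((geom_id, data))
    return rows


ids = [1, 2]
measures = [
    {1: 858469, 2: 738774},
    {1: 73.9397964478},  # id 2 has no value for this measure
    {1: 69092, 2: 2235406},
]

for geom_id, data in build_rows(ids, measures):
    # json.dumps renders None as null, matching the table above
    print(geom_id, json.dumps(data))
```

An inner-join equivalent would skip id 2 entirely for the second measure, which is the situation the commit message describes.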
John Krauss
415a4ccc05 update NEWS for 1.5.1 2017-05-16 14:33:02 +00:00
John Krauss
ccb8092506 1.5.1 release artifact 2017-05-16 14:27:49 +00:00
John Krauss
6266262427 new code to handle mixed geometries more quickly 2017-05-10 20:24:21 +00:00
23 changed files with 184287 additions and 2745 deletions

43
.travis.yml Normal file

@@ -0,0 +1,43 @@
language: c
dist: precise
env:
  global:
    - PAGER=cat
before_install:
  - sudo add-apt-repository -y ppa:cartodb/postgresql-9.5
  - sudo add-apt-repository -y ppa:cartodb/gis
  - sudo add-apt-repository -y ppa:cartodb/gis-testing
  - sudo apt-get update
  # Install postgres db and build deps
  - sudo /etc/init.d/postgresql stop # stop travis default instance
  - sudo apt-get -y remove --purge postgresql-9.1
  - sudo apt-get -y remove --purge postgresql-9.2
  - sudo apt-get -y remove --purge postgresql-9.3
  - sudo apt-get -y remove --purge postgresql-9.4
  - sudo apt-get -y remove --purge postgresql-9.5
  - sudo rm -rf /var/lib/postgresql/
  - sudo rm -rf /var/log/postgresql/
  - sudo rm -rf /etc/postgresql/
  - sudo apt-get -y remove --purge postgis-2.2
  - sudo apt-get -y autoremove
  - sudo apt-get -y install postgresql-9.5=9.5.2-3cdb3
  - sudo apt-get -y install postgresql-server-dev-9.5=9.5.2-3cdb3
  - sudo apt-get -y install postgresql-plpython-9.5=9.5.2-3cdb3
  - sudo apt-get -y install postgresql-9.5-postgis-scripts=2.2.2.0-cdb2
  - sudo apt-get -y install postgresql-9.5-postgis-2.2=2.2.2.0-cdb2
  # configure it to accept local connections from postgres
  - echo -e "# TYPE DATABASE USER ADDRESS METHOD \nlocal all postgres trust\nlocal all all trust\nhost all all 127.0.0.1/32 trust" \
    | sudo tee /etc/postgresql/9.5/main/pg_hba.conf
  - sudo /etc/init.d/postgresql restart 9.5
install:
  - sudo make install
script:
  - cd src/pg
  - make test || { cat src/pg/test/regression.diffs; false; }

77
NEWS.md

@@ -1,4 +1,57 @@
1.8.0 (2017-10-18)
------------------
__Improvements__
* Add `number_geometries` field to `OBS_GetAvailableGeometries` in order to provide the number of geometries from the source data to be used in the score calculation ([#313](https://github.com/CartoDB/observatory-extension/issues/313))
1.7.0 (2017-08-18)
------------------
__Improvements__
* Add Travis support to execute the extension tests ([#183](https://github.com/CartoDB/observatory-extension/issues/183))
__API Changes__
* Add new function `OBS_MetadataValidation` ([#303](https://github.com/CartoDB/observatory-extension/pull/303))
__Bugfixes__
* Fixed parentheses for obs_getdata with ids
* Fixed failing tests due to changes in the data dump for some TIGER geometries
1.6.0 (2017-07-20)
------------------
__Improvements__
* Add new private function `_OBS_GetNumerators` for use in our UI. The current
`OBS_GetAvailableNumerators` was not designed with our UI in mind, so it forced
many hacks to reconcile our UI needs with the function's interface; the new
function is a better fit for our purposes. ([#300](https://github.com/CartoDB/observatory-extension/pull/300))
* `OBS_GetAvailableGeometries` now uses the new meta table `obs_meta_geom_numer_timespan` to filter
the geometries by geometry timespan and/or numerator timespan (which
is what we get when we use `obs_getavailabletimespans`) ([#302](https://github.com/CartoDB/observatory-extension/pull/302))
__Bugfixes__
* Return NULL only for the affected value and not for all the measurements. Previously, INNER JOINs between `_procgeoms` and
the data produced a bare NULL instead of an (id, NULL) pair. ([#298](https://github.com/CartoDB/observatory-extension/pull/298))
1.5.1 (2017-05-16)
------------------
__Improvements__
* Much improved performance for `OBS_GetData` when augmenting with several
different geometries simultaneously ([#285](https://github.com/CartoDB/observatory-extension/pull/285))
* Return the automatically assigned normalization type from `OBS_GetMeta`
([#285](https://github.com/CartoDB/observatory-extension/pull/285))
1.5.0 (2017-04-24)
------------------
__API Changes__
@@ -12,6 +65,7 @@ __API Changes__
([#282](https://github.com/CartoDB/observatory-extension/pull/282))
1.4.0 (2017-03-21)
------------------
__API Changes__
@@ -32,16 +86,19 @@ __Improvements__
boundary selection
1.3.5 (2017-03-15)
------------------
No changes. Artifact to allow for data update.
1.3.4 (2017-03-10)
------------------
__Bugfixes__
* Remove erroneously committed `RAISE NOTICE` in `OBS_GetData`
1.3.3 (2017-03-10)
------------------
__Bugfixes__
@@ -64,6 +121,7 @@ __Improvements__
([#267](https://github.com/CartoDB/observatory-extension/pull/267))
1.3.2 (2017-03-02)
------------------
__Bugfixes__
@@ -71,6 +129,7 @@ __Bugfixes__
This fixes issues with Camshaft.
1.3.1 (2017-02-16)
------------------
__Improvements__
@@ -81,6 +140,7 @@ __Improvements__
called for measures for polygons
1.3.0 (2017-01-17)
------------------
__API Changes__
@@ -105,9 +165,8 @@ __Bugfixes__
* Remove unnecessary dependency on `postgres_fdw`
* `OBS_GetData()` now aggregates measures with mixed geoms correctly
__API Changes__
1.2.1 (2017-01-17)
------------------
__Improvements__
@@ -115,6 +174,7 @@ __Improvements__
([#243](https://github.com/CartoDB/observatory-extension/pull/233))
1.2.0 (2016-12-28)
------------------
__API Changes__
@@ -135,6 +195,7 @@ __Improvements__
* Return both `table_id` and `column_id` from `_OBS_GetGeometryScores`
1.1.7 (2016-12-15)
------------------
__Improvements__
@@ -147,6 +208,7 @@ __Improvements__
* Yields a ~50% improvement in performance for `_OBSGetGeomeryScores`.
1.1.6 (2016-12-08)
------------------
__Bugfixes__
@@ -173,6 +235,7 @@ __Improvements__
- Add ability to persist results to JSON for graph visualization later
1.1.5 (2016-11-29)
------------------
__Bugfixes__
@@ -180,6 +243,7 @@ __Bugfixes__
a geometry where it does not exist ([#220](https://github.com/CartoDB/observatory-extension/issues/220)).
1.1.4 (2016-11-21)
------------------
__Bugfixes__
@@ -187,10 +251,12 @@ __Bugfixes__
`OBS_GetLegacyMetadata` ([#216](https://github.com/CartoDB/observatory-extension/issues/216)).
1.1.3 (2016-11-15)
------------------
* Temporarily ignore EU data for the sake of testing
1.1.2 (2016-11-09)
------------------
__Improvements__
@@ -206,12 +272,14 @@ __API Changes (Internal)__
* Add internal `_OBS_GetGeometryScores`
1.1.1 (2016-10-14)
------------------
__Improvements__
* Test points for Canada and France ([#204](https://github.com/CartoDB/observatory-extension/issues/120))
1.1.0 (2016-10-04)
------------------
__Bugfixes__
@@ -234,6 +302,7 @@ __API Changes__
is also referred to here ([CartoDB/design#68](https://github.com/CartoDB/design/issues/68)).
1.0.7 (2016-09-20)
------------------
__Bugfixes__
@@ -245,6 +314,7 @@ __Improvements__
* Automatic tests work for Canada and Thailand
1.0.6 (2016-09-08)
------------------
__Improvements__
@@ -252,6 +322,7 @@ __Improvements__
framework logic from the observatory measure functions.
1.0.5 (2016-08-12)
------------------
__Improvements__
@@ -259,6 +330,7 @@ __Improvements__
any HTTP SQL API.
1.0.4 (2016-07-26)
------------------
__Bugfixes__
@@ -267,6 +339,7 @@ __Bugfixes__
([#173](https://github.com/CartoDB/observatory-extension/issues/173))
1.0.3 (2016-07-25)
------------------
__Bugfixes__

View File

@@ -2,7 +2,7 @@
Use the following functions to retrieve [Boundary](https://carto.com/docs/carto-engine/data/overview/#boundary-data) data. Data ranges from small areas (e.g. US Census Block Groups) to large areas (e.g. Countries). You can access boundaries by point location lookup, bounding box lookup, direct ID access and several other methods described below.
-You can [access](https://carto.com/docs/carto-engine/data/accessing) boundaries through CARTO Builder. The same methods will work if you are using the CARTO Engine to develop your application. We [encourage you](http://docs/carto-engine/data/accessing/#best-practices) to use table modifying methods (UPDATE and INSERT) over dynamic methods (SELECT).
+You can [access](https://carto.com/docs/carto-engine/data/accessing) boundaries through CARTO Builder. The same methods will work if you are using the CARTO Engine to develop your application. We [encourage you](https://carto.com/docs/carto-engine/data/accessing/#best-practices) to use table modifying methods (UPDATE and INSERT) over dynamic methods (SELECT).
## OBS_GetBoundariesByGeometry(geom geometry, geometry_id text)
@@ -123,7 +123,7 @@ SET the_geom = OBS_GetBoundary(the_geom, 'us.census.tiger.block_group')
## OBS_GetBoundaryId(point_geometry, boundary_id)
-The ```OBS_GetBoundaryId(point_geometry, boundary_id)``` returns a unique geometry_id for the boundary geometry that contains a given point geometry. See the [Boundary ID Glossary](http://docs/carto-engine/data/glossary/#boundary-ids). The method can be combined with ```OBS_GetBoundaryById(geometry_id)``` to create a point aggregation workflow.
+The ```OBS_GetBoundaryId(point_geometry, boundary_id)``` returns a unique geometry_id for the boundary geometry that contains a given point geometry. See the [Boundary ID Glossary](https://carto.com/docs/carto-engine/data/glossary/#boundary-ids). The method can be combined with ```OBS_GetBoundaryById(geometry_id)``` to create a point aggregation workflow.
#### Arguments

View File

@@ -228,7 +228,7 @@ SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators(
WHERE valid_timespan IS True;
```
-## OBS_GetAvailableGeometries(bounds, filter_tags, numer_id, denom_id, timespan)
+## OBS_GetAvailableGeometries(bounds, filter_tags, numer_id, denom_id, timespan, number_geometries)
Return available geometries within a boundary and with the specified
`filter_tags`.
@@ -242,6 +242,7 @@ filter_tags | Text[] | a list of filters. Only geometries for which all of thes
numer_id | Text | the ID of a numerator to check whether the geometry is valid against. Will not reduce length of returned table, but will change values for `valid_numer` (optional)
denom_id | Text | the ID of a denominator to check whether the geometry is valid against. Will not reduce length of returned table, but will change values for `valid_denom` (optional)
timespan | Text | the ID of a timespan to check whether the geometry is valid against. Will not reduce length of returned table, but will change values for `valid_timespan` (optional)
number_geometries | Integer | Number of geometries of the source data in order to calculate more accurately the score value to know which geometry fits better with the provided extent. (optional)
#### Returns

View File

@@ -108,7 +108,7 @@ The ```OBS_GetMeasure(polygon, measure_id)``` function returns any Data Observat
Name |Description
--- | ---
polygon_geometry | a WGS84 polygon geometry (the_geom)
measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
normalize | for measures that are **sums** (e.g. population) the default normalization is 'none' and response comes back as a raw value. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
time_span | time span of interest (e.g., 2010 - 2014)
@@ -143,7 +143,7 @@ The ```OBS_GetMeasureById(geom_ref, measure_id, boundary_id)``` function returns
Name |Description
--- | ---
geom_ref | a geometry reference (e.g., a US Census geoid)
measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
time_span (optional) | time span of interest (e.g., 2010 - 2014). If `NULL` is passed, the measure from the most recent data will be used.
@@ -215,7 +215,7 @@ extent | A geometry of the extent of the input geometries
metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
num_timespan_options | How many historical time periods to include. Defaults to 1
num_score_options | How many alternative boundary levels to include. Defaults to 1
target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
The schema of the metadata input objects are as follows:
@@ -321,6 +321,55 @@ SELECT OBS_GetMeta(
) FROM tablename
```
## OBS_MetadataValidation(extent geometry, geometry_type text, metadata json, target_geoms)
The ```OBS_MetadataValidation``` function performs a validation check over the known issues using the extent, type of geometry, and metadata that is being used in the ```OBS_GetMeta``` function.
#### Arguments
Name | Description
---- | -----------
extent | A geometry of the extent of the input geometries
geometry_type | The geometry type of the source data
metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optional additional parameters about that column
target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest
The schema of the metadata input objects are as follows:
Metadata Input Key | Description
--- | -----------
numer_id | The identifier for the desired measurement. If left blank, a `geom_id` is specified and the column returns a geometry, instead of a measurement
geom_id | Identifier for a desired geographic boundary level used to calculate measures. If undefined, this is automatically assigned. If defined, `numer_id` is blank and the column returns a geometry, instead of a measurement
normalization | The desired normalization. One of 'area', 'prenormalized', or 'denominated'. 'Area' will normalize the measure per square kilometer, 'prenormalized' will return the original value, and 'denominated' will normalize by a denominator. If the metadata object specifies a geometry, this is ignored
denom_id | When `normalization` is 'denominated', this is the identifier for a desired normalization column. This is automatically assigned. If the metadata object specifies a geometry, this is ignored
numer_timespan | The desired timespan for the measurement. If left unspecified, it defaults to the most recent timespan available
geom_timespan | The desired timespan for the geometry. If left unspecified, it defaults to the timespan matching `numer_timespan`
target_area | Instead of aiming to have `target_geoms` in the area of the geometry passed as `extent`, fill this area. Unit is square degrees WGS84. Set this to `0` if you want to use the smallest source geometry for this element of metadata. For example, if you are passing in points
target_geoms | Override global `target_geoms` for this element of metadata
max_timespan_rank | Only include timespans of this recency (For example, `1` is only the most recent timespan). There is no limit by default
max_score_rank | Only include boundaries of this relevance (for example, `1` is the most relevant boundary). The default is `1`
#### Returns
Key | Description
--- | -----------
valid | A boolean field that represents if the validation was successful or not
errors | A text array with all possible errors
#### Examples
Validate metadata with two additional columns of US census data; using a boundary relevant for the geometry provided and the latest timespan. Limited to the most recent column, and the most relevant, based on the extent and density of input geometries in `tablename`.
```SQL
SELECT OBS_MetadataValidation(
ST_SetSRID(ST_Extent(the_geom), 4326),
ST_GeometryType(the_geom),
'[{"numer_id": "us.census.acs.B01003001"}, {"numer_id": "us.census.acs.B01001002"}]',
COUNT(*)::INTEGER
) FROM tablename
GROUP BY ST_GeometryType(the_geom)
```
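The kind of check this function performs can also be sketched client-side. The example below is an illustrative Python sketch, not the extension's implementation: `validate_metadata` is a hypothetical helper, the `numer_aggregate` key on each input object is an assumption made for the example (the real function works from the Observatory metadata tables), and the code encodes only the single rule named in the commit message for this feature, namely that a median aggregation cannot be normalized by area. It returns the same `valid` and `errors` keys documented above.

```python
def validate_metadata(metadata):
    """Return {'valid': bool, 'errors': [str, ...]} for a list of metadata dicts.

    Only one rule is checked here (an assumption for illustration):
    a measure aggregated as a median must not use area normalization.
    """
    errors = []
    for position, item in enumerate(metadata):
        aggregate = (item.get("numer_aggregate") or "").lower()
        normalization = (item.get("normalization") or "").lower()
        if aggregate == "median" and normalization.startswith("area"):
            errors.append(
                "Element %d: median measure %s cannot be normalized by area"
                % (position, item.get("numer_id"))
            )
    return {"valid": not errors, "errors": errors}


# Hypothetical input: the numer_id values are placeholders, not real measures.
result = validate_metadata([
    {"numer_id": "example.total_population", "numer_aggregate": "sum",
     "normalization": "area"},
    {"numer_id": "example.median_income", "numer_aggregate": "median",
     "normalization": "area"},
])
print(result["valid"], result["errors"])
```

Collecting every error before returning, rather than stopping at the first one, mirrors the documented `errors` array, which is described as holding all detected problems.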
## OBS_GetData(geomvals array[geomval], metadata json)
The ```OBS_GetData(geomvals, metadata)``` function returns a measure and/or
@@ -465,7 +514,7 @@ WITH meta AS (
'[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]'
) meta FROM tablename)
SELECT id AS fips, (data->0->>'value')::Numeric AS pop_density
-FROM OBS_GetData((SELECT ARRAY_AGG((fips) FROM tablename),
+FROM OBS_GetData((SELECT ARRAY_AGG(fips) FROM tablename),
(SELECT meta FROM meta))
```
@@ -481,7 +530,7 @@ WITH meta AS (
) meta FROM tablename),
data as (
SELECT id AS fips, (data->0->>'value') AS pop_density
-FROM OBS_GetData((SELECT ARRAY_AGG((fips) FROM tablename),
+FROM OBS_GetData((SELECT ARRAY_AGG(fips) FROM tablename),
(SELECT meta FROM meta)))
UPDATE tablename
SET pop_density = data.pop_density

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -1,5 +1,5 @@
comment = 'CartoDB Observatory backend extension'
-default_version = '1.5.0'
+default_version = '1.8.0'
requires = 'postgis'
superuser = true
schema = cdb_observatory

View File

@@ -52,8 +52,8 @@ def get_tablename_query(column_id, boundary_id, timespan):
METADATA_TABLES = ['obs_table', 'obs_column_table', 'obs_column', 'obs_column_tag',
'obs_tag', 'obs_column_to_column', 'obs_dump_version', 'obs_meta',
'obs_meta_numer', 'obs_meta_denom', 'obs_meta_geom',
-'obs_meta_timespan', 'obs_column_table_tile',
-'obs_column_table_tile_simple']
+'obs_meta_timespan', 'obs_meta_geom_numer_timespan',
+'obs_column_table_tile', 'obs_column_table_tile_simple']
FIXTURES = [
('us.census.acs.B01003001_quantile', 'us.census.tiger.census_tract', '2010 - 2014'),

View File

@@ -1,3 +1,4 @@
requests
nose
nose_parameterized
psycopg2

View File

@@ -1,5 +1,5 @@
comment = 'CartoDB Observatory backend extension'
-default_version = '1.5.0'
+default_version = '1.8.0'
requires = 'postgis'
superuser = true
schema = cdb_observatory

View File

@@ -166,28 +166,15 @@ BEGIN
EXECUTE format($string$
WITH _filters AS (SELECT
generate_series(1, array_length($3, 1)) id,
(unnest($3))->>'numer_id' numer_id,
(unnest($3))->>'denom_id' denom_id,
(unnest($3))->>'geom_id' geom_id,
(unnest($3))->>'numer_timespan' numer_timespan,
(unnest($3))->>'geom_timespan' geom_timespan,
(unnest($3))->>'normalization' normalization,
(unnest($3))->>'max_timespan_rank' max_timespan_rank,
(unnest($3))->>'max_score_rank' max_score_rank,
((unnest($3))->>'target_geoms')::INTEGER target_geoms,
((unnest($3))->>'target_area')::Numeric target_area
row_number() over () id, *
FROM json_to_recordset($3)
AS x(numer_id TEXT, denom_id TEXT, geom_id TEXT, numer_timespan TEXT,
geom_timespan TEXT, normalization TEXT, max_timespan_rank TEXT,
max_score_rank TEXT, target_geoms INTEGER, target_area Numeric
)
), meta AS (SELECT
id,
f.numer_id,
LOWER(TRIM(BOTH '_' FROM regexp_replace(CASE WHEN f.numer_id IS NOT NULL
THEN CASE
WHEN normalization ILIKE 'area%%' THEN numer_colname || ' per sq km'
WHEN normalization ILIKE 'denom%%' THEN numer_colname || ' rate'
ELSE numer_colname
END || ' ' || m.numer_timespan
ELSE geom_name || ' ' || m.geom_timespan
END, '[^a-zA-Z0-9]+', '_', 'g'))) suggested_name,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_aggregate END numer_aggregate,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_colname END numer_colname,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_geomref_colname END numer_geomref_colname,
@@ -217,7 +204,17 @@ BEGIN
geom_description,
geom_t_description,
geom_type,
normalization,
Coalesce(normalization,
-- automatically assign normalization to numeric numerators
CASE WHEN cdb_observatory.isnumeric(numer_type) THEN
CASE WHEN denom_reltype ILIKE 'denominator' THEN 'denominated'
WHEN numer_aggregate ILIKE 'sum' THEN 'area'
WHEN numer_aggregate IN ('median', 'average') AND denom_reltype ILIKE 'universe'
THEN 'prenormalized'
ELSE 'prenormalized'
END ELSE NULL
END
) normalization,
max_timespan_rank,
max_score_rank,
target_geoms,
@@ -249,7 +246,16 @@ BEGIN
'score_rownum', row_number() over
(PARTITION BY id, numer_timespan ORDER BY score DESC, Coalesce(denom_id, '')),
'score', scores.score,
'suggested_name', cdb_observatory.FIRST(meta.suggested_name),
'suggested_name', cdb_observatory.FIRST(
LOWER(TRIM(BOTH '_' FROM regexp_replace(CASE WHEN numer_id IS NOT NULL
THEN CASE
WHEN normalization ILIKE 'area%%' THEN numer_colname || ' per sq km'
WHEN normalization ILIKE 'denom%%' THEN numer_colname || ' rate'
ELSE numer_colname
END || ' ' || numer_timespan
ELSE geom_name || ' ' || geom_timespan
END, '[^a-zA-Z0-9]+', '_', 'g')))
),
'numer_aggregate', cdb_observatory.FIRST(meta.numer_aggregate),
'numer_colname', cdb_observatory.FIRST(meta.numer_colname),
'numer_geomref_colname', cdb_observatory.FIRST(meta.numer_geomref_colname),
@@ -305,7 +311,7 @@ BEGIN
ELSE geom
END,
target_geoms,
(SELECT ARRAY(SELECT json_array_elements_text(params))::json[]),
params,
num_timespan_options,
num_score_options, numer_filters, geom_filters
;
@@ -587,14 +593,9 @@ RETURNS TABLE (
)
AS $$
DECLARE
geom_colspecs TEXT;
geom_tables TEXT;
geomrefs_alias TEXT;
geomrefs_noalias TEXT;
data_colspecs TEXT;
data_tables TEXT;
obs_wheres TEXT;
user_wheres TEXT;
procgeom_clauses TEXT;
val_clauses TEXT;
json_clause TEXT;
geomtype TEXT;
BEGIN
IF params IS NULL OR JSON_ARRAY_LENGTH(params) = 0 OR ARRAY_LENGTH(geomvals, 1) IS NULL THEN
@@ -604,222 +605,208 @@ BEGIN
geomtype := ST_GeometryType(geomvals[1].geom);
EXECUTE
$query$
WITH _meta AS (SELECT
row_number() over () colid,
meta->>'id' id,
meta->>'numer_id' numer_id,
meta->>'numer_aggregate' numer_aggregate,
meta->>'numer_colname' numer_colname,
meta->>'numer_geomref_colname' numer_geomref_colname,
meta->>'numer_tablename' numer_tablename,
meta->>'numer_type' numer_type,
meta->>'denom_id' denom_id,
meta->>'denom_aggregate' denom_aggregate,
meta->>'denom_colname' denom_colname,
meta->>'denom_geomref_colname' denom_geomref_colname,
meta->>'denom_tablename' denom_tablename,
meta->>'denom_type' denom_type,
meta->>'denom_reltype' denom_reltype,
meta->>'geom_id' geom_id,
meta->>'geom_colname' geom_colname,
meta->>'geom_geomref_colname' geom_geomref_colname,
meta->>'geom_tablename' geom_tablename,
meta->>'geom_type' geom_type,
meta->>'numer_timespan' numer_timespan,
meta->>'geom_timespan' geom_timespan,
meta->>'normalization' normalization,
meta->>'api_method' api_method,
meta->'api_args' api_args
FROM UNNEST($1) AS meta
)
/* Read metadata to generate clauses for query */
EXECUTE $query$
WITH _meta AS (SELECT
row_number() over () colid, *
FROM json_to_recordset($1)
AS x(id TEXT, numer_id TEXT, numer_aggregate TEXT, numer_colname TEXT,
numer_geomref_colname TEXT, numer_tablename TEXT, numer_type TEXT,
denom_id TEXT, denom_aggregate TEXT, denom_colname TEXT,
denom_geomref_colname TEXT, denom_tablename TEXT, denom_type TEXT,
denom_reltype TEXT, geom_id TEXT, geom_colname TEXT,
geom_geomref_colname TEXT, geom_tablename TEXT, geom_type TEXT,
numer_timespan TEXT, geom_timespan TEXT, normalization TEXT,
api_method TEXT, api_args JSON)
),
-- Generate procgeom clauses.
-- These join the users' geoms to the relevant geometries for the
-- asked-for measures in the Observatory.
_procgeom_clauses AS (
SELECT
String_Agg(DISTINCT
CASE
-- pass-through geom if user is requesting it only
WHEN numer_id IS NULL AND api_method IS NULL THEN
geom_tablename || '.' || geom_colname || ' AS geom_' || geom_tablename
WHEN cdb_observatory.isnumeric(numer_type) AND api_method IS NULL THEN
-- for numeric points with area normalization, include areas of underlying geoms
CASE
WHEN $2 = 'ST_Point' AND (LOWER(normalization) LIKE 'area%' OR
(normalization IS NULL AND numer_aggregate ILIKE 'sum')) THEN
' Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '::Geography), 0)/1000000 ' ||
' AS area_' || geom_tablename
-- for numeric areas, include more complex calcs
WHEN $2 != 'ST_Point' THEN
'CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') ' ||
' THEN ST_Area(_geoms.geom) / Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0)' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) ' ||
' THEN 1 ' ||
' ELSE ST_Area(cdb_observatory.safe_intersection(_geoms.geom, ' ||
geom_tablename || '.' || geom_colname || ')) / ' ||
'Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0) ' ||
'END pct_' || geom_tablename
ELSE NULL
END
ELSE NULL END
, ', ') AS geom_colspecs,
String_Agg(DISTINCT 'observatory.' || geom_tablename, ', ') AS geom_tables,
String_Agg(
'JSON_Build_Object(' || CASE
-- api-delivered values
WHEN api_method IS NOT NULL THEN
'''value'', ' ||
'ARRAY_AGG( ' ||
api_method || '.' || numer_colname || ')::' || numer_type || '[]'
-- numeric internal values
WHEN cdb_observatory.isnumeric(numer_type) THEN
'''value'', ' || CASE
-- denominated
WHEN LOWER(normalization) LIKE 'denom%' OR
(normalization IS NULL AND LOWER(denom_reltype) LIKE 'denominator')
THEN CASE
-- denominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / NullIf(' || denom_tablename || '.' || denom_colname || ', 0))'
-- denominated polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / SUM (denom * (% OBS geom in user geom))
ELSE
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * pct_' || geom_tablename ||
' ) / NULLIF(SUM(' || denom_tablename || '.' || denom_colname || ' ' ||
' * pct_' || geom_tablename || '), 0) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- areaNormalized
WHEN LOWER(normalization) LIKE 'area%' OR
(normalization IS NULL AND numer_aggregate ILIKE 'sum')
THEN CASE
-- areaNormalized point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / area_' || geom_tablename || ')'
-- areaNormalized polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / area of big geom
ELSE
--' NULL END '
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * pct_' || geom_tablename ||
' ) / (Nullif(ST_Area(cdb_observatory.FIRST(_procgeoms.geom)::Geography), 0) / 1000000) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- median/average measures with universe
WHEN LOWER(numer_aggregate) IN ('median', 'average') AND
denom_reltype ILIKE 'universe' AND
(normalization IS NULL OR LOWER(normalization) LIKE 'pre%')
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation weighted by universe
-- SUM (numer * denom * (% user geom in OBS geom)) / SUM (denom * (% user geom in OBS geom))
-- (10 * 1000 * 1) / (1000 * 1) = 10
-- (10 * 1000 * 1 + 50 * 10 * 1) / (1000 + 10) = 10500 / 1010 ~= 10.4
' SUM(' || numer_tablename || '.' || numer_colname ||
' * ' || denom_tablename || '.' || denom_colname ||
' * pct_' || geom_tablename ||
' ) / Nullif(SUM(' || denom_tablename || '.' || denom_colname ||
' * pct_' || geom_tablename || '), 0) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- prenormalized for summable measures. point or summable only!
WHEN numer_aggregate ILIKE 'sum' AND
(normalization IS NULL OR LOWER(normalization) LIKE 'pre%')
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation
-- SUM (numer * (% user geom in OBS geom))
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * pct_' || geom_tablename ||
' ) / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- Everything else. Point only!
ELSE CASE
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
' cdb_observatory._OBS_RaiseNotice(''Cannot perform calculation over polygon for ' ||
numer_id || '/' || coalesce(denom_id, '') || '/' || geom_id || '/' || numer_timespan || ''')::Numeric '
END
END || '::' || numer_type
-- categorical/text
WHEN LOWER(numer_type) LIKE 'text' THEN
'''value'', ' || 'MODE() WITHIN GROUP (ORDER BY ' || numer_tablename || '.' || numer_colname || ') '
-- geometry
WHEN numer_id IS NULL THEN
'''geomref'', geomref_' || geom_tablename || ', ' ||
'''value'', ' || 'cdb_observatory.FIRST(geom_' || geom_tablename ||
')::TEXT'
-- code below will return the intersection of the user's geom and the
-- OBS geom
--'''value'', ' || 'ST_Union(cdb_observatory.safe_intersection(_geoms.geom, ' || geom_tablename ||
-- '.' || geom_colname || '))::TEXT'
ELSE ''
END || ')', ', ')
AS colspecs,
-- geomrefs, used to separate out rows in case we don't want to merge
-- results by user input IDs
--
-- api_method and geom_tablename are interchangeable since when an
-- api_method is passed, geom_tablename is ignored
String_Agg(DISTINCT COALESCE(geom_tablename, api_method) || '.' || geom_geomref_colname ||
' AS geomref_' || COALESCE(geom_tablename, api_method), ', ') AS geomrefs_alias,
String_Agg(DISTINCT 'geomref_' || COALESCE(geom_tablename, api_method)
, ', ') AS geomrefs_noalias,
(SELECT String_Agg(DISTINCT CASE
-- External API
WHEN tablename LIKE 'cdb_observatory.%' THEN
'LATERAL (SELECT * FROM ' || tablename || ') ' ||
REPLACE(split_part(tablename, '(', 1), 'cdb_observatory.', '')
-- Internal obs_ table
ELSE 'observatory.' || tablename
END, ', ') FROM (
SELECT DISTINCT UNNEST(tablenames_ary) tablename FROM (
SELECT ARRAY_AGG(numer_tablename) ||
ARRAY_AGG(denom_tablename) ||
ARRAY_AGG('cdb_observatory.' || api_method || '(_procgeoms.geom' || COALESCE(', ' ||
(SELECT STRING_AGG(REPLACE(val::text, '"', ''''), ', ')
FROM (SELECT json_array_elements(api_args) as val) as vals),
'') || ')')
tablenames_ary
) tablenames_inner
) tablenames_outer) data_tables,
String_Agg(DISTINCT array_to_string(ARRAY[
CASE WHEN numer_tablename IS NOT NULL AND geom_tablename IS NOT NULL
THEN numer_tablename || '.' || numer_geomref_colname || ' = ' ||
'_procgeoms.geomref_' || geom_tablename
ELSE NULL END,
CASE WHEN numer_tablename != denom_tablename
THEN numer_tablename || '.' || numer_geomref_colname || ' = ' ||
denom_tablename || '.' || denom_geomref_colname
ELSE NULL END
], ' AND '),
' AND ') FILTER (WHERE numer_tablename != denom_tablename OR
(numer_tablename IS NOT NULL AND geom_tablename IS NOT NULL)) AS obs_wheres,
String_Agg(DISTINCT 'ST_Intersects(' || geom_tablename || '.' || geom_colname
|| ', _geoms.geom)', ' AND ')
AS user_wheres
'_procgeoms_' || Coalesce(geom_tablename || '_' || geom_geomref_colname, api_method) || ' AS (' ||
CASE WHEN api_method IS NULL THEN
'SELECT _geoms.id, ' ||
CASE $3 WHEN True THEN '_geoms.geom'
ELSE geom_tablename || '.' || geom_colname
END || ' AS geom, ' ||
geom_tablename || '.' || geom_geomref_colname || ' AS geomref, ' ||
CASE
WHEN $2 = 'ST_Point' THEN
' Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '::Geography), 0)/1000000 ' ||
' AS area'
-- for numeric areas, include more complex calcs
ELSE
'CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')
THEN ST_Area(_geoms.geom) / Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0)
WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom)
THEN 1
ELSE ST_Area(cdb_observatory.safe_intersection(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')) /
Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0)
END pct_obs'
END || '
FROM _geoms, observatory.' || geom_tablename || '
WHERE ST_Intersects(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')'
-- pass through input geometries for api_method
ELSE 'SELECT _geoms.id, _geoms.geom FROM _geoms'
END ||
') '
AS procgeom_clause
FROM _meta
;
$query$
INTO geom_colspecs, geom_tables, data_colspecs, geomrefs_alias,
geomrefs_noalias, data_tables, obs_wheres, user_wheres
USING (SELECT ARRAY(SELECT json_array_elements_text(params))::json[]), geomtype;
GROUP BY api_method, geom_tablename, geom_geomref_colname, geom_colname
),
-- Generate val clauses.
-- These perform interpolations or other necessary calculations to
-- provide values according to users geometries.
_val_clauses AS (
SELECT
'_vals_' || Coalesce(geom_tablename || '_' || geom_geomref_colname, api_method) || ' AS (
SELECT _procgeoms.id, ' ||
String_Agg('json_build_object(' || CASE
-- api-delivered values
WHEN api_method IS NOT NULL THEN
'''value'', ' ||
'ARRAY_AGG( ' ||
api_method || '.' || numer_colname || ')::' || numer_type || '[]'
-- numeric internal values
WHEN cdb_observatory.isnumeric(numer_type) THEN
'''value'', ' || CASE
-- denominated
WHEN LOWER(normalization) LIKE 'denom%'
THEN CASE
WHEN denom_tablename IS NULL THEN ' NULL '
-- denominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / NullIf(' || denom_tablename || '.' || denom_colname || ', 0))'
-- denominated polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / SUM (denom * (% OBS geom in user geom))
ELSE
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * _procgeoms.pct_obs ' ||
' ) / NULLIF(SUM(' || denom_tablename || '.' || denom_colname || ' ' ||
' * _procgeoms.pct_obs), 0) '
END
-- areaNormalized
WHEN LOWER(normalization) LIKE 'area%'
THEN CASE
-- areaNormalized point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / _procgeoms.area)'
-- areaNormalized polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / area of big geom
ELSE
--' NULL END '
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * _procgeoms.pct_obs' ||
' ) / (Nullif(ST_Area(cdb_observatory.FIRST(_procgeoms.geom)::Geography), 0) / 1000000) '
END
-- median/average measures with universe
WHEN LOWER(numer_aggregate) IN ('median', 'average') AND
denom_reltype ILIKE 'universe' AND LOWER(normalization) LIKE 'pre%'
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation weighted by universe
-- SUM (numer * denom * (% user geom in OBS geom)) / SUM (denom * (% user geom in OBS geom))
-- (10 * 1000 * 1) / (1000 * 1) = 10
-- (10 * 1000 * 1 + 50 * 10 * 1) / (1000 + 10) = 10500 / 1010 ~= 10.4
' SUM(' || numer_tablename || '.' || numer_colname ||
' * ' || denom_tablename || '.' || denom_colname ||
' * _procgeoms.pct_obs ' ||
' ) / Nullif(SUM(' || denom_tablename || '.' || denom_colname ||
' * _procgeoms.pct_obs ' || '), 0) '
END
-- prenormalized for summable measures. point or summable only!
WHEN numer_aggregate ILIKE 'sum' AND LOWER(normalization) LIKE 'pre%'
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation
-- SUM (numer * (% user geom in OBS geom))
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * _procgeoms.pct_obs) '
END
-- Everything else. Point only!
ELSE CASE
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
' cdb_observatory._OBS_RaiseNotice(''Cannot perform calculation over polygon for ' ||
numer_id || '/' || coalesce(denom_id, '') || '/' || geom_id || '/' || numer_timespan || ''')::Numeric '
END
END || '::' || numer_type
-- categorical/text
WHEN LOWER(numer_type) LIKE 'text' THEN
'''value'', ' || 'MODE() WITHIN GROUP (ORDER BY ' || numer_tablename || '.' || numer_colname || ') '
-- geometry
WHEN numer_id IS NULL THEN
'''geomref'', _procgeoms.geomref, ' ||
'''value'', ' || 'cdb_observatory.FIRST(_procgeoms.geom)::TEXT'
-- code below will return the intersection of the user's geom and the
-- OBS geom
--'''value'', ' || 'ST_Union(cdb_observatory.safe_intersection(_geoms.geom, ' || geom_tablename ||
-- '.' || geom_colname || '))::TEXT'
ELSE ''
END
|| ') val_' || colid, ', ')
|| '
FROM _procgeoms_' || Coalesce(geom_tablename || '_' || geom_geomref_colname, api_method) || ' _procgeoms ' ||
Coalesce(String_Agg(DISTINCT
Coalesce('LEFT JOIN observatory.' || numer_tablename || ' ON _procgeoms.geomref = observatory.' || numer_tablename || '.' || numer_geomref_colname,
', LATERAL (SELECT * FROM cdb_observatory.' || api_method || '(_procgeoms.geom' || Coalesce(', ' ||
(SELECT STRING_AGG(REPLACE(val::text, '"', ''''), ', ')
FROM (SELECT JSON_Array_Elements(api_args) as val) as vals),
'') || ')) AS ' || api_method)
, ' '), '') ||
CASE $3 WHEN True THEN E'\n GROUP BY _procgeoms.id ORDER BY _procgeoms.id '
ELSE E'\n GROUP BY _procgeoms.id, _procgeoms.geomref
ORDER BY _procgeoms.id, _procgeoms.geomref' END
|| ')'
AS val_clause,
'_vals_' || Coalesce(geom_tablename || '_' || geom_geomref_colname, api_method) AS cte_name
FROM _meta
GROUP BY geom_tablename, geom_geomref_colname, geom_colname, api_method
),
-- Generate clauses necessary to join together val_clauses
_val_joins AS (
SELECT String_Agg(a.cte_name || '.id = ' || b.cte_name || '.id ', ' AND ') val_joins
FROM _val_clauses a, _val_clauses b
WHERE a.cte_name != b.cte_name
AND a.cte_name < b.cte_name
),
-- Generate JSON clause. This puts together vals from val_clauses
_json_clause AS (SELECT
'SELECT ' || cdb_observatory.FIRST(cte_name) || '.id::INT,
Array_to_JSON(ARRAY[' || (SELECT String_Agg('val_' || colid, ', ') FROM _meta) || '])
FROM ' || String_Agg(cte_name, ', ') ||
Coalesce(' WHERE ' || val_joins, '')
AS json_clause
FROM _val_clauses, _val_joins
GROUP BY val_joins
)
SELECT (SELECT String_Agg(procgeom_clause, E',\n ') FROM _procgeom_clauses),
(SELECT String_Agg(val_clause, E',\n ') FROM _val_clauses),
json_clause
FROM _json_clause
$query$ INTO
procgeom_clauses,
val_clauses,
json_clause
USING params, geomtype, merge;
/* Execute query */
RETURN QUERY EXECUTE format($query$
WITH _raw_geoms AS (%s),
_geoms AS (SELECT id,
@@ -827,27 +814,21 @@ BEGIN
THEN ST_CollectionExtract(ST_MakeValid(ST_SimplifyVW(geom, 0.00001)), 3)
ELSE geom END geom
FROM _raw_geoms),
_procgeoms AS (SELECT _geoms.id, _geoms.geom %s %s
FROM _geoms %s
%s
)
SELECT _procgeoms.id::INT, Array_to_JSON(ARRAY[%s]::JSON[])
FROM _procgeoms %s
%s
GROUP BY _procgeoms.id %s
ORDER BY _procgeoms.id
$query$, CASE WHEN ARRAY_LENGTH(geomvals, 1) = 1 THEN
' SELECT $1[1].val as id, $1[1].geom as geom '
ELSE
' SELECT val as id, geom FROM UNNEST($1) '
-- procgeom_clauses
%s,
-- val_clauses
%s
-- json_clause
%s
$query$, CASE WHEN ARRAY_LENGTH(geomvals, 1) = 1
THEN ' SELECT $1[1].val as id, $1[1].geom as geom '
ELSE ' SELECT val as id, geom FROM UNNEST($1) '
END,
', ' || NullIf(geomrefs_alias, ''),
', ' || NullIf(geom_colspecs, ''),
', ' || NullIf(geom_tables, ''),
'WHERE ' || NullIf( user_wheres, ''),
data_colspecs, ', ' || NullIf(data_tables, ''),
'WHERE ' || NULLIF(obs_wheres, ''),
CASE WHEN merge IS False THEN ', ' || geomrefs_noalias ELSE '' END)
String_Agg(procgeom_clauses, E',\n '),
String_Agg(val_clauses, E',\n '),
json_clause)
USING geomvals;
RETURN;
END;
@@ -1095,3 +1076,46 @@ BEGIN
RETURN result;
END;
$$ LANGUAGE plpgsql STABLE;
-- OBS_MetadataValidation checks the metadata parameters against the geometry
-- type of the data in order to detect invalid combinations
CREATE OR REPLACE FUNCTION cdb_observatory.obs_metadatavalidation(
geometry_extent geometry(Geometry, 4326),
geometry_type text,
params JSON,
target_geoms INTEGER DEFAULT NULL
)
RETURNS TABLE(valid boolean, errors text[]) AS $$
DECLARE
meta json;
errors text[];
BEGIN
errors := (ARRAY[])::TEXT[];
IF geometry_type IN ('ST_Polygon', 'ST_MultiPolygon') THEN
FOR meta IN EXECUTE 'SELECT json_array_elements(cdb_observatory.OBS_GetMeta($1, $2, 1, 1, $3))' USING geometry_extent, params, target_geoms
LOOP
IF (meta->>'normalization' = 'denominated' AND meta->>'denom_id' is NULL) THEN
errors := array_append(errors, 'Normalizated measure should have a numerator and a denominator. Please review the provided options.');
END IF;
IF (meta->>'numer_aggregate' IS NULL) THEN
errors := array_append(errors, 'For polygon geometries, aggregation is mandatory. Please review the provided options');
END IF;
IF (meta->>'numer_aggregate' IN ('median', 'average') AND meta->>'denom_id' IS NULL) THEN
errors := array_append(errors, 'Median or average aggregation for polygons requires a denominator to provide weights. Please review the provided options');
END IF;
IF (meta->>'numer_aggregate' IN ('median', 'average') AND meta->>'normalization' NOT LIKE 'pre%') THEN
errors := array_append(errors, format('Median or average aggregation only supports prenormalized normalization, %s passed. Please review the provided options', meta->>'normalization'));
END IF;
END LOOP;
IF CARDINALITY(errors) > 0 THEN
RETURN QUERY EXECUTE 'SELECT FALSE, $1' USING errors;
ELSE
RETURN QUERY SELECT TRUE, ARRAY[]::TEXT[];
END IF;
ELSE
RETURN QUERY SELECT TRUE, ARRAY[]::TEXT[];
END IF;
RETURN;
END;
$$ LANGUAGE plpgsql STABLE;
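-- Illustrative sketch, not part of the original change: the universe-weighted
-- average that OBS_GetData builds for median/average measures over polygons,
-- SUM(numer * denom * pct) / SUM(denom * pct), reproduced on literal rows so
-- that it runs on plain PostgreSQL. The column names numer, denom and pct are
-- hypothetical stand-ins for the numerator value, the universe value and the
-- pct_obs overlap fraction.
SELECT SUM(numer * denom * pct) / NULLIF(SUM(denom * pct), 0) AS weighted_avg
FROM (VALUES (10.0, 1000.0, 1.0),
             (50.0,   10.0, 1.0)) AS obs(numer, denom, pct);
-- 10500 / 1010, roughly 10.4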


@@ -181,6 +181,86 @@ BEGIN
END
$$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION cdb_observatory._OBS_GetNumerators(
bounds GEOMETRY DEFAULT NULL,
section_tags TEXT[] DEFAULT ARRAY[]::TEXT[],
subsection_tags TEXT[] DEFAULT ARRAY[]::TEXT[],
other_tags TEXT[] DEFAULT ARRAY[]::TEXT[],
ids TEXT[] DEFAULT ARRAY[]::TEXT[],
name TEXT DEFAULT NULL,
denom_id TEXT DEFAULT '',
geom_id TEXT DEFAULT '',
timespan TEXT DEFAULT ''
) RETURNS TABLE (
numer_id TEXT,
numer_name TEXT,
numer_description TEXT,
numer_weight NUMERIC,
numer_license TEXT,
numer_source TEXT,
numer_type TEXT,
numer_aggregate TEXT,
numer_extra JSONB,
numer_tags JSONB,
valid_denom BOOLEAN,
valid_geom BOOLEAN,
valid_timespan BOOLEAN
) AS $$
DECLARE
where_clause_elements TEXT[];
geom_clause TEXT;
where_clause TEXT;
BEGIN
where_clause_elements := (ARRAY[])::TEXT[];
where_clause := '';
IF bounds IS NOT NULL THEN
where_clause_elements := array_append(where_clause_elements, format($data$ST_Intersects(the_geom, '%s'::geometry)$data$, bounds));
END IF;
IF cardinality(section_tags) > 0 THEN
where_clause_elements := array_append(where_clause_elements, format($data$numer_tags ?| '%s'$data$, section_tags));
END IF;
IF cardinality(subsection_tags) > 0 THEN
where_clause_elements := array_append(where_clause_elements, format($data$numer_tags ?| '%s'$data$, subsection_tags));
END IF;
IF cardinality(other_tags) > 0 THEN
where_clause_elements := array_append(where_clause_elements, format($data$numer_tags ?| '%s'$data$, other_tags));
END IF;
IF cardinality(ids) > 0 THEN
where_clause_elements := array_append(where_clause_elements, format($data$numer_id IN (array_to_string('%s'::text[], ','))$data$, ids));
END IF;
IF name IS NOT NULL AND name != '' THEN
where_clause_elements := array_append(where_clause_elements, format($data$numer_name ilike '%%%s%%'$data$, name));
END IF;
IF cardinality(where_clause_elements) > 0 THEN
where_clause := format($clause$WHERE %s$clause$, array_to_string(where_clause_elements, ' AND '));
END IF;
RAISE DEBUG '%', array_to_string(where_clause_elements, ' AND ');
RETURN QUERY
EXECUTE
format($string$
SELECT numer_id::TEXT,
numer_name::TEXT,
numer_description::TEXT,
numer_weight::NUMERIC,
NULL::TEXT license,
NULL::TEXT source,
numer_type numer_type,
numer_aggregate numer_aggregate,
numer_extra::JSONB numer_extra,
numer_tags numer_tags,
$1 = ANY(denoms) valid_denom,
$2 = ANY(geoms) valid_geom,
$3 = ANY(timespans) valid_timespan
FROM observatory.obs_meta_numer
%s
$string$, where_clause)
USING denom_id, geom_id, timespan;
RETURN;
END
$$ LANGUAGE plpgsql;
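-- Illustrative sketch, not part of the original change: the tag filters in
-- _OBS_GetNumerators rely on the jsonb ?| operator, which is true when any
-- string in the array exists as a top-level key of the jsonb value. The JSON
-- literal below is a hypothetical stand-in for a numer_tags value; the real
-- shape of that column is an assumption here.
SELECT '{"section/tags.united_states": 1, "subsection/tags.age_gender": 1}'::jsonb
       ?| ARRAY['subsection/tags.age_gender', 'subsection/tags.income']
       AS any_tag_matches;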
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetAvailableDenominators(
bounds GEOMETRY DEFAULT NULL,
filter_tags TEXT[] DEFAULT NULL,
@@ -243,7 +323,8 @@ CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetAvailableGeometries(
filter_tags TEXT[] DEFAULT NULL,
numer_id TEXT DEFAULT NULL,
denom_id TEXT DEFAULT NULL,
timespan TEXT DEFAULT NULL
timespan TEXT DEFAULT NULL,
number_geoms INTEGER DEFAULT NULL
) RETURNS TABLE (
geom_id TEXT,
geom_name TEXT,
@@ -292,21 +373,34 @@ BEGIN
geom_type::TEXT,
geom_extra::JSONB,
geom_tags::JSONB,
$1 = ANY(numers) valid_numer,
$2 = ANY(denoms) valid_denom,
$3 = ANY(timespans) valid_timespan
FROM observatory.obs_meta_geom
$1 = ANY(numers) valid_numer,
$2 = ANY(denoms) valid_denom,
CASE WHEN $3 IS NOT NULL AND $3 != '' THEN
-- Here we look for geometries where either a) the geometry's own timespans or b) the timespans of numerators linked to that
-- geometry include the timespan passed. For example, it looks for geometries with timespan '2015 - 2015', or with linked
-- numerators that have '2015 - 2015' as one of their valid timespans.
-- If a numerator_id is passed, we filter by that numerator
CASE WHEN $1 IS NOT NULL AND $1 != '' THEN
EXISTS (SELECT 1 FROM observatory.obs_meta_geom_numer_timespan onu WHERE o.geom_id = onu.geom_id AND onu.numer_id = $1 AND ($3 = ANY(onu.timespans) OR $3 IN (select(unnest(o.timespans)))))
ELSE
EXISTS (SELECT 1 FROM observatory.obs_meta_geom_numer_timespan onu WHERE o.geom_id = onu.geom_id AND ($3 = ANY(onu.timespans) OR $3 IN (select(unnest(o.timespans)))))
END
ELSE
false
END as valid_timespan
FROM observatory.obs_meta_geom o
WHERE %s (geom_tags ?& $4 OR CARDINALITY($4) = 0)
), scores AS (
SELECT * FROM cdb_observatory._OBS_GetGeometryScores($5,
(SELECT ARRAY_AGG(geom_id) FROM available_geoms)
SELECT * FROM cdb_observatory._OBS_GetGeometryScores(bounds => $5,
filter_geom_ids => (SELECT ARRAY_AGG(geom_id) FROM available_geoms),
desired_num_geoms => $6::integer
)
) SELECT available_geoms.*, score, numtiles, notnull_percent, numgeoms,
) SELECT DISTINCT ON (geom_id) available_geoms.*, score, numtiles, notnull_percent, numgeoms,
percentfill, estnumgeoms, meanmediansize
FROM available_geoms, scores
WHERE available_geoms.geom_id = scores.column_id
$string$, geom_clause)
USING numer_id, denom_id, timespan, filter_tags, bounds;
USING numer_id, denom_id, timespan, filter_tags, bounds, number_geoms;
RETURN;
END
$$ LANGUAGE plpgsql;
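-- Illustrative usage sketch, not part of the original change. It assumes the
-- extension and its metadata tables are installed, and that the first
-- parameter is named bounds as in the sibling OBS_GetAvailable* functions.
-- The point and the value 500 are arbitrary examples; number_geoms is the
-- number of input geometries, forwarded to the scoring function.
SELECT geom_id, geom_name
FROM cdb_observatory.OBS_GetAvailableGeometries(
  bounds => ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
  number_geoms => 500
);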


@@ -153,6 +153,9 @@ t
obs_getmeta_suggested_name
t
(1 row)
obs_getmeta_suggested_name_implicit_area
t
(1 row)
obs_getmeta_suggested_name_area
t
(1 row)
@@ -207,6 +210,9 @@ t|t|t
id|data_polygon_measure_one_null|data_polygon_measure_two_null
t|t|t
(1 row)
id|data_polygon_measure_one_null|data_polygon_measure_two_null
t|t|t
(1 row)
id|data_polygon_measure_one_predenom|data_polygon_measure_two_predenom
t|t|t
(1 row)
@@ -298,3 +304,12 @@ tract_sample|tract_max_error|tract_avg_error|tract_min_error
no_bg_point_error
t
(1 row)
valid|errors
t|{}
(1 row)
valid|errors
f|{"Median or average aggregation only supports prenormalized normalization, denominated passed. Please review the provided options"}
(1 row)
valid|errors
f|{"Normalizated measure should have a numerator and a denominator. Please review the provided options."}
(1 row)


@@ -48,6 +48,63 @@ t
_obs_getavailablenumerators_no_total_pop_1996
t
(1 row)
_obs_getnumerators_usa_pop_in_all
t
(1 row)
_obs_getnumerators_usa_pop_in_nyc_point
t
(1 row)
_obs_getnumerators_usa_pop_in_usa_extents
t
(1 row)
_obs_getnumerators_no_usa_pop_not_in_zero_point
t
(1 row)
_obs_getnumerators_usa_pop_in_age_gender_subsection
t
(1 row)
_obs_getnumerators_no_pop_in_income_subsection
t
(1 row)
_obs_getnumerators_male_pop_denom_by_total_pop
t
(1 row)
_obs_getnumerators_no_income_denom_by_total_pop
t
(1 row)
_obs_getnumerators_zillow_at_zcta5
t
(1 row)
_obs_getnumerators_no_zillow_at_block_group
t
(1 row)
_obs_getnumerators_total_pop_2010_2014
t
(1 row)
_obs_getnumerators_no_total_pop_1996
t
(1 row)
_obs_getnumerators_total_pop_by_name
t
(1 row)
_obs_getnumerators_total_pop_by_section
t
(1 row)
_obs_getnumerators_total_pop_not_in_canada
t
(1 row)
_obs_getnumerators_total_pop_by_subsection
t
(1 row)
_obs_getnumerators_total_pop_not_in_employment_subsection
t
(1 row)
_obs_getnumerators_total_pop_by_id
t
(1 row)
_obs_getnumerators_total_pop_not_with_other_id
t
(1 row)
_obs_getavailabledenominators_usa_pop_in_all
t
(1 row)


@@ -12,6 +12,7 @@ DROP TABLE IF EXISTS observatory.obs_meta_numer;
DROP TABLE IF EXISTS observatory.obs_meta_denom;
DROP TABLE IF EXISTS observatory.obs_meta_geom;
DROP TABLE IF EXISTS observatory.obs_meta_timespan;
DROP TABLE IF EXISTS observatory.obs_meta_geom_numer_timespan;
DROP TABLE IF EXISTS observatory.obs_column_table_tile;
DROP TABLE IF EXISTS observatory.obs_column_table_tile_simple;
DROP TABLE IF EXISTS observatory.obs_78fb6c1d6ff6505225175922c2c389ce48d7632c;

File diff suppressed because one or more lines are too long


@@ -268,7 +268,7 @@ SELECT
(meta->0->>'numer_name') = 'Total Population' numer_name,
(meta->0->>'denom_id') IS NULL denom_id,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'area' normalization
FROM meta;
-- OBS_GetMeta for point completes one partial measure with "best" metadata
@@ -290,7 +290,7 @@ SELECT
(meta->0->>'denom_type') = 'Numeric' denom_type,
(meta->0->>'denom_name') = 'Total Population' denom_name,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'denominated' normalization
FROM meta;
-- OBS_GetMeta for polygon completes one partial measure with "best" metadata
@@ -308,7 +308,7 @@ SELECT
(meta->0->>'numer_name') = 'Total Population' numer_name,
(meta->0->>'denom_id') IS NULL denom_id,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'area' normalization
FROM meta;
-- OBS_GetMeta for polygon completes one partial measure with "best" metadata
@@ -330,7 +330,7 @@ SELECT
(meta->0->>'denom_type') = 'Numeric' denom_type,
(meta->0->>'denom_name') = 'Total Population' denom_name,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'denominated' normalization
FROM meta;
-- OBS_GetMeta for point completes several partial measures with "best"
@@ -352,7 +352,7 @@ SELECT
(meta->0->>'denom_type') = 'Numeric' denom_type,
(meta->0->>'denom_name') = 'Total Population' denom_name,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization,
(meta->0->>'normalization') = 'denominated' normalization,
(meta->1->>'id')::integer = 1 id,
(meta->1->>'numer_id') = 'us.census.acs.B01001002' numer_id,
(meta->1->>'timespan_rank')::integer = 1 timespan_rank,
@@ -367,7 +367,7 @@ SELECT
(meta->1->>'denom_type') = 'Numeric' denom_type,
(meta->1->>'denom_name') = 'Total Population' denom_name,
(meta->1->>'geom_id') = 'us.census.tiger.census_tract' geom_id,
(meta->1->>'normalization') IS NULL normalization
(meta->1->>'normalization') = 'denominated' normalization
FROM meta;
-- OBS_GetMeta for point completes several partial measures with "best" metadata
@@ -389,7 +389,7 @@ SELECT
(meta->0->>'denom_type') = 'Numeric' denom_type,
(meta->0->>'denom_name') = 'Total Population' denom_name,
(meta->0->>'geom_id') = 'us.census.tiger.census_tract' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'denominated' normalization
FROM meta;
-- OBS_GetMeta for point completes several partial measures with conflicting
@@ -400,9 +400,14 @@ AS obs_getmeta_conflicting_metadata;
-- OBS_GetMeta provides suggested name for simple meta request
SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01003001"}]'
'[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom"}]'
)->0->>'suggested_name' = 'total_pop_2010_2014' obs_getmeta_suggested_name;
-- OBS_GetMeta provides suggested name for simple meta request with implicit area norm
SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01003001"}]'
)->0->>'suggested_name' = 'total_pop_per_sq_km_2010_2014' obs_getmeta_suggested_name_implicit_area;
-- OBS_GetMeta provides suggested name for simple meta request with area norm
SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01003001", "normalization": "area"}]'
@@ -591,6 +596,18 @@ SELECT id = 1 id,
abs((data->1->>'value')::Numeric - 0.4902) / 0.4902 < 0.001 data_polygon_measure_two_null
FROM data;
-- OBS_GetData/OBS_GetMeta by geom with two measures, one of which returns null
WITH
meta AS (SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestArea(),
'[{"numer_id": "us.census.acs.B19013001_quantile"}, {"numer_id": "us.census.acs.B01001002"}]') meta),
data AS (SELECT * FROM cdb_observatory.OBS_GetData(
ARRAY[(cdb_observatory._TestArea(), 1)::geomval],
(SELECT meta FROM meta)))
SELECT id = 1 id,
(data->0->>'value') is NULL data_polygon_measure_one_null,
abs((data->1->>'value')::Numeric - 0.4902) / 0.4902 < 0.001 data_polygon_measure_two_null
FROM data;
-- OBS_GetData/OBS_GetMeta by geom with two standard measures predenom normalization
WITH
meta AS (SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestArea(),
@@ -677,25 +694,25 @@ FROM data;
-- OBS_GetData/OBS_GetMeta by geom with polygons inside a polygon + one measure
WITH
meta AS (SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestArea(),
'[{"geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group"}]') meta),
'[{"geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "geom_id": "us.census.tiger.block_group"}]') meta),
data AS (SELECT * FROM cdb_observatory.OBS_GetData(
ARRAY[(cdb_observatory._TestArea(), 1)::geomval],
(SELECT meta FROM meta), false))
SELECT every(id = 1) is TRUE id,
count(distinct (data->0->>'value')::geometry) = 16 correct_num_geoms,
abs(sum((data->1->>'value')::numeric) - 15787) / 15787 < 0.001 correct_pop
abs(sum((data->1->>'value')::numeric) - 12327) / 12327 < 0.001 correct_pop
FROM data;
-- OBS_GetData/OBS_GetMeta by geom with polygons inside a polygon + one measure + one text
WITH
meta AS (SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestArea(),
'[{"geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.tiger.name", "geom_id": "us.census.tiger.block_group"}]') meta),
'[{"geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.tiger.name", "geom_id": "us.census.tiger.block_group"}]') meta),
data AS (SELECT * FROM cdb_observatory.OBS_GetData(
ARRAY[(cdb_observatory._TestArea(), 1)::geomval],
(SELECT meta FROM meta), false))
SELECT every(id = 1) is TRUE id,
count(distinct (data->0->>'value')::geometry) = 16 correct_num_geoms,
abs(sum((data->1->>'value')::numeric) - 15787) / 15787 < 0.001 correct_pop,
abs(sum((data->1->>'value')::numeric) - 12327) / 12327 < 0.001 correct_pop,
array_agg(distinct data->2->>'value') = '{"Block Group 1","Block Group 2","Block Group 3","Block Group 4","Block Group 5"}' correct_bg_names
FROM data;
@@ -956,3 +973,9 @@ WITH _geoms AS (
FROM geoms, results
WHERE cartodb_id = id
;
-- OBS_MetadataValidation
SELECT * FROM cdb_observatory.OBS_MetadataValidation(NULL, 'ST_Polygon', '[{"numer_id": "us.census.acs.B01003001","denom_id": null,"normalization": "prenormalized","geom_id": null,"numer_timespan": "2010 - 2014"}]'::json, 500);
SELECT * FROM cdb_observatory.OBS_MetadataValidation(NULL, 'ST_Polygon', '[{"numer_id": "us.census.acs.B25058001","denom_id": null,"normalization": "denominated","geom_id": null,"numer_timespan": "2010 - 2014"}]'::json, 500);
SELECT * FROM cdb_observatory.OBS_MetadataValidation(NULL, 'ST_Polygon', '[{"numer_id": "us.census.acs.B15003001","denom_id": null,"normalization": "denominated","geom_id": null,"numer_timespan": "2010 - 2014"}]'::json, 500);
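-- Illustrative sketch, not part of the original tests: for any geometry_type
-- other than 'ST_Polygon' or 'ST_MultiPolygon', obs_metadatavalidation as
-- defined in this change skips the per-measure checks, so a call like this
-- one should report the parameters as valid with an empty errors array.
SELECT * FROM cdb_observatory.OBS_MetadataValidation(NULL, 'ST_Point', '[{"numer_id": "us.census.acs.B01003001","denom_id": null,"normalization": "denominated","geom_id": null,"numer_timespan": "2010 - 2014"}]'::json, 500);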


@@ -119,6 +119,142 @@ FROM cdb_observatory.OBS_GetAvailableNumerators(
) WHERE valid_timespan = True)
AS _obs_getavailablenumerators_no_total_pop_1996;
--
-- _OBS_GetNumerators tests
--
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators())
AS _obs_getnumerators_usa_pop_in_all;
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)) AS _obs_getnumerators_usa_pop_in_nyc_point;
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakeEnvelope(
-169.8046875, 21.289374355860424,
-47.4609375, 72.0739114882038
), 4326),
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)) AS _obs_getnumerators_usa_pop_in_usa_extents;
SELECT 'us.census.acs.B01003001' NOT IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(0, 0), 4326),
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL
)) AS _obs_getnumerators_no_usa_pop_not_in_zero_point;
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
subsection_tags => ARRAY['subsection/tags.age_gender']
))
AS _obs_getnumerators_usa_pop_in_age_gender_subsection;
SELECT 'us.census.acs.B01003001' NOT IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
subsection_tags => ARRAY['subsection/tags.income']
))
AS _obs_getnumerators_no_pop_in_income_subsection;
SELECT 'us.census.acs.B01001002' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
denom_id => 'us.census.acs.B01003001'
) WHERE valid_denom = True)
AS _obs_getnumerators_male_pop_denom_by_total_pop;
SELECT 'us.census.acs.B19013001' NOT IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
denom_id => 'us.census.acs.B01003001'
) WHERE valid_denom = True)
AS _obs_getnumerators_no_income_denom_by_total_pop;
SELECT 'us.zillow.AllHomes_Zhvi' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
geom_id => 'us.census.tiger.zcta5'
) WHERE valid_geom = True)
AS _obs_getnumerators_zillow_at_zcta5;
SELECT 'us.zillow.AllHomes_Zhvi' NOT IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
geom_id => 'us.census.tiger.block_group'
) WHERE valid_geom = True)
AS _obs_getnumerators_no_zillow_at_block_group;
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
timespan => '2010 - 2014'
) WHERE valid_timespan = True)
AS _obs_getnumerators_total_pop_2010_2014;
SELECT 'us.census.acs.B01003001' NOT IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
timespan => '1996'
) WHERE valid_timespan = True)
AS _obs_getnumerators_no_total_pop_1996;
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
name => 'tot'
))
AS _obs_getnumerators_total_pop_by_name;
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
section_tags => '{section/tags.united_states}'
))
AS _obs_getnumerators_total_pop_by_section;
SELECT 'us.census.acs.B01003001' NOT IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
section_tags => '{section/tags.ca}'
))
AS _obs_getnumerators_total_pop_not_in_canada;
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
section_tags => '{section/tags.united_states}',
subsection_tags => '{subsection/tags.age_gender}'
))
AS _obs_getnumerators_total_pop_by_subsection;
SELECT 'us.census.acs.B01003001' NOT IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
section_tags => '{section/tags.united_states}',
subsection_tags => '{subsection/tags.employment}'
))
AS _obs_getnumerators_total_pop_not_in_employment_subsection;
SELECT 'us.census.acs.B01003001' IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
ids => '{us.census.acs.B01003001}'
))
AS _obs_getnumerators_total_pop_by_id;
SELECT 'us.census.acs.B01003001' NOT IN (SELECT numer_id
FROM cdb_observatory._OBS_GetNumerators(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326),
ids => '{us.census.acs.B01003002}'
))
AS _obs_getnumerators_total_pop_not_with_other_id;
--
-- OBS_GetAvailableDenominators tests
--

@@ -1,5 +1,4 @@
from nose.tools import assert_equal, assert_is_not_none
from nose.plugins.skip import SkipTest
from nose_parameterized import parameterized
from itertools import izip_longest
@@ -55,84 +54,50 @@ SKIP_COLUMNS = set([
u'us.census.tiger.mtfcc',
u'whosonfirst.wof_county_name',
u'whosonfirst.wof_region_name',
'fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
, 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
, 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
, 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
, 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
, 'fr.insee.P12_ACTOCC15P_ILT45D'
, 'fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
, 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
, 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
, 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
, 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
, 'fr.insee.P12_ACTOCC15P_ILT45D'
, 'uk.ons.LC3202WA0007'
, 'uk.ons.LC3202WA0010'
, 'uk.ons.LC3202WA0004'
, 'uk.ons.LC3204WA0004'
, 'uk.ons.LC3204WA0007'
, 'uk.ons.LC3204WA0010'
, 'br.geo.subdistritos_name'
u'fr.insee.P12_RP_CHOS',
u'fr.insee.P12_RP_HABFOR',
u'fr.insee.P12_RP_EAUCH',
u'fr.insee.P12_RP_BDWC',
u'fr.insee.P12_RP_MIDUR',
u'fr.insee.P12_RP_CLIM',
u'fr.insee.P12_RP_MIBOIS',
u'fr.insee.P12_RP_CASE',
u'fr.insee.P12_RP_TTEGOU',
u'fr.insee.P12_RP_ELEC',
u'fr.insee.P12_ACTOCC15P_ILT45D',
u'fr.insee.P12_RP_CHOS',
u'fr.insee.P12_RP_HABFOR',
u'fr.insee.P12_RP_EAUCH',
u'fr.insee.P12_RP_BDWC',
u'fr.insee.P12_RP_MIDUR',
u'fr.insee.P12_RP_CLIM',
u'fr.insee.P12_RP_MIBOIS',
u'fr.insee.P12_RP_CASE',
u'fr.insee.P12_RP_TTEGOU',
u'fr.insee.P12_RP_ELEC',
u'fr.insee.P12_ACTOCC15P_ILT45D',
u'uk.ons.LC3202WA0007',
u'uk.ons.LC3202WA0010',
u'uk.ons.LC3202WA0004',
u'uk.ons.LC3204WA0004',
u'uk.ons.LC3204WA0007',
u'uk.ons.LC3204WA0010',
u'br.geo.subdistritos_name'
])
MEASURE_COLUMNS = query('''
SELECT ARRAY_AGG(DISTINCT numer_id) numer_ids,
SELECT cdb_observatory.FIRST(distinct numer_id) numer_ids,
numer_aggregate,
denom_reltype,
section_tags
denom_reltype
FROM observatory.obs_meta
WHERE numer_weight > 0
AND numer_id NOT IN ('{skip}')
AND numer_id NOT LIKE 'eu.%' --Skipping Eurostat
AND section_tags IS NOT NULL
AND subsection_tags IS NOT NULL
GROUP BY numer_aggregate, section_tags, denom_reltype
GROUP BY numer_id, numer_aggregate, denom_reltype
'''.format(skip="', '".join(SKIP_COLUMNS))).fetchall()
#CATEGORY_COLUMNS = query('''
#SELECT distinct numer_id
#FROM observatory.obs_meta
#WHERE numer_type ILIKE 'text'
#AND numer_weight > 0
#''').fetchall()
#
#BOUNDARY_COLUMNS = query('''
#SELECT id FROM observatory.obs_column
#WHERE type ILIKE 'geometry'
#AND weight > 0
#''').fetchall()
#
#US_CENSUS_MEASURE_COLUMNS = query('''
#SELECT distinct numer_name
#FROM observatory.obs_meta
#WHERE numer_type ILIKE 'numeric'
#AND 'us.census.acs' = ANY (subsection_tags)
#AND numer_weight > 0
#''').fetchall()
#def default_geometry_id(column_id):
# '''
# Returns default test point for the column_id.
# '''
# if column_id == 'whosonfirst.wof_disputed_geom':
# return 'ST_SetSRID(ST_MakePoint(76.57, 33.78), 4326)'
# elif column_id == 'whosonfirst.wof_marinearea_geom':
# return 'ST_SetSRID(ST_MakePoint(-68.47, 43.33), 4326)'
# elif column_id in ('us.census.tiger.school_district_elementary',
# 'us.census.tiger.school_district_secondary',
# 'us.census.tiger.school_district_elementary_clipped',
# 'us.census.tiger.school_district_secondary_clipped'):
# return 'ST_SetSRID(ST_MakePoint(-73.7067, 40.7025), 4326)'
# elif column_id.startswith('es.ine'):
# return 'ST_SetSRID(ST_MakePoint(-2.51141249535454, 42.8226119029222), 4326)'
# elif column_id.startswith('us.zillow'):
# return 'ST_SetSRID(ST_MakePoint(-81.3544048197256, 28.3305906291771), 4326)'
# elif column_id.startswith('ca.'):
# return ''
# else:
# return 'ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)'
def default_lonlat(column_id):
'''
@@ -142,11 +107,6 @@ def default_lonlat(column_id):
return (76.57, 33.78)
elif column_id == 'whosonfirst.wof_marinearea_geom':
return (-68.47, 43.33)
elif column_id in ('us.census.tiger.school_district_elementary',
'us.census.tiger.school_district_secondary',
'us.census.tiger.school_district_elementary_clipped',
'us.census.tiger.school_district_secondary_clipped'):
return (40.7025, -73.7067)
elif column_id.startswith('uk'):
if 'WA' in column_id:
return (51.46844551219723, -3.184833526611328)
@@ -158,30 +118,19 @@ def default_lonlat(column_id):
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('mx.'):
return (19.41347699386547, -99.17019367218018)
elif column_id.startswith('th.'):
return (13.725377712079784, 100.49263000488281)
# cols for French Guyana only
#elif column_id in ('fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
# , 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
# , 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
# , 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
# , 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
# , 'fr.insee.P12_ACTOCC15P_ILT45D'
# , 'fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
# , 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
# , 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
# , 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
# , 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
# , 'fr.insee.P12_ACTOCC15P_ILT45D'):
# return (4.938408371206558, -52.32908248901367)
elif column_id.startswith('fr.'):
return (48.860875144709475, 2.3613739013671875)
elif column_id.startswith('ca.'):
return (43.65594991256823, -79.37965393066406)
elif column_id in ('us.census.tiger.school_district_elementary',
'us.census.tiger.school_district_secondary',
'us.census.tiger.school_district_elementary_clipped',
'us.census.tiger.school_district_secondary_clipped',
'us.census.tiger.school_district_elementary_geoname',
'us.census.tiger.school_district_secondary_geoname'):
return (40.7025, -73.7067)
elif column_id.startswith('us.census.'):
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.dma.'):
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.ihme.'):
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.bls.'):
@@ -192,8 +141,6 @@ def default_lonlat(column_id):
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.epa.'):
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('eu.'):
raise SkipTest('No tests for Eurostat!')
elif column_id.startswith('br.'):
return (-23.53, -46.63)
elif column_id.startswith('au.'):
@@ -202,56 +149,65 @@ def default_lonlat(column_id):
raise Exception('No catalog point set for {}'.format(column_id))
def default_point(column_id):
lat, lng = default_lonlat(column_id)
def default_point(test_point):
lat, lng = test_point
return 'ST_SetSRID(ST_MakePoint({lng}, {lat}), 4326)'.format(
lat=lat, lng=lng)
def default_area(column_id):
def default_area(test_point):
'''
Returns default test area for the column_id
'''
point = default_point(column_id)
point = default_point(test_point)
area = 'ST_Transform(ST_Buffer(ST_Transform({point}, 3857), 250), 4326)'.format(
point=point)
return area
#@parameterized(US_CENSUS_MEASURE_COLUMNS)
#def test_get_us_census_measure_points(name):
# resp = query('''
#SELECT * FROM {schema}OBS_GetUSCensusMeasure({point}, '{name}')
# '''.format(name=name.replace("'", "''"),
# schema='cdb_observatory.' if USE_SCHEMA else '',
# point=default_point('')))
# rows = resp.fetchall()
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
def filter_points():
return MEASURE_COLUMNS
def grouped_measure_columns():
for numer_ids, numer_aggregate, denom_reltype, section_tags in MEASURE_COLUMNS:
def filter_areas():
filtered = []
for numer_ids, numer_aggregate, denom_reltype in MEASURE_COLUMNS:
if numer_aggregate is None or numer_aggregate.lower() not in ('sum', 'median', 'average'):
continue
if numer_aggregate.lower() in ('median', 'average') \
and (denom_reltype is None or denom_reltype.lower() != 'universe'):
continue
filtered.append((numer_ids, numer_aggregate, denom_reltype))
return filtered
def grouped_measure_columns(filtered_columns):
groupbypoint = dict()
for row in filtered_columns:
numer_ids = row[0]
point = default_lonlat(numer_ids)
if point in groupbypoint:
groupbypoint[point].append(numer_ids)
else:
groupbypoint[point] = [numer_ids]
for point, numer_ids in groupbypoint.iteritems():
for colgroup in grouper(numer_ids, 50):
yield [c for c in colgroup if c], numer_aggregate, denom_reltype, section_tags
yield point, [c for c in colgroup if c]
@parameterized(grouped_measure_columns())
def test_get_measure_points(numer_ids, numer_aggregate, denom_reltype, section_tags):
_test_measures(numer_ids, numer_aggregate, section_tags, denom_reltype, default_point(numer_ids[0]))
@parameterized(grouped_measure_columns(filter_points()))
def test_get_measure_points(point, numer_ids):
_test_measures(numer_ids, default_point(point))
@parameterized(grouped_measure_columns())
def test_get_measure_areas(numer_ids, numer_aggregate, denom_reltype, section_tags):
if numer_aggregate is None or numer_aggregate.lower() not in ('sum', 'median', 'average'):
return
if numer_aggregate.lower() in ('median', 'average') \
and (denom_reltype is None \
or denom_reltype.lower() != 'universe'):
return
_test_measures(numer_ids, numer_aggregate, section_tags, denom_reltype, default_area(numer_ids[0]))
@parameterized(grouped_measure_columns(filter_areas()))
def test_get_measure_areas(point, numer_ids):
_test_measures(numer_ids, default_area(point))
def _test_measures(numer_ids, numer_aggregate, section_tags, denom_reltype, geom):
def _test_measures(numer_ids, geom):
in_params = []
for numer_id in numer_ids:
in_params.append({
@@ -284,90 +240,3 @@ def _test_measures(numer_ids, numer_aggregate, section_tags, denom_reltype, geom
assert_equal(len(vals), len(in_params))
for i, val in enumerate(vals):
assert_is_not_none(val, 'NULL for {}'.format(in_params[i]['numer_id']))
#@parameterized(CATEGORY_COLUMNS)
#def test_get_category_areas(column_id):
# resp = query('''
#SELECT * FROM {schema}OBS_GetCategory({area}, '{column_id}')
# '''.format(column_id=column_id,
# schema='cdb_observatory.' if USE_SCHEMA else '',
# area=default_area(column_id)))
# assert_equal(resp.status_code, 200)
# rows = resp.json()['rows']
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
#@parameterized(CATEGORY_COLUMNS)
#def test_get_category_points(column_id):
# if column_id in SKIP_COLUMNS:
# raise SkipTest('Column {} should be skipped'.format(column_id))
# resp = query('''
#SELECT * FROM {schema}OBS_GetCategory({point}, '{column_id}')
# '''.format(column_id=column_id,
# schema='cdb_observatory.' if USE_SCHEMA else '',
# point=default_point(column_id)))
# rows = resp.fetchall()
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
#@parameterized(BOUNDARY_COLUMNS)
#def test_get_boundaries_by_geometry(column_id):
# resp = query('''
#SELECT * FROM {schema}OBS_GetBoundariesByGeometry({area}, '{column_id}')
# '''.format(column_id=column_id,
# schema='cdb_observatory.' if USE_SCHEMA else '',
# area=default_area(column_id)))
# assert_equal(resp.status_code, 200)
# rows = resp.json()['rows']
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
#@parameterized(BOUNDARY_COLUMNS)
#def test_get_points_by_geometry(column_id):
# resp = query('''
#SELECT * FROM {schema}OBS_GetPointsByGeometry({area}, '{column_id}')
# '''.format(column_id=column_id,
# schema='cdb_observatory.' if USE_SCHEMA else '',
# area=default_area(column_id)))
# assert_equal(resp.status_code, 200)
# rows = resp.json()['rows']
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
#@parameterized(BOUNDARY_COLUMNS)
#def test_get_boundary_points(column_id):
# resp = query('''
#SELECT * FROM {schema}OBS_GetBoundary({point}, '{column_id}')
# '''.format(column_id=column_id,
# schema='cdb_observatory.' if USE_SCHEMA else '',
# point=default_point(column_id)))
# assert_equal(resp.status_code, 200)
# rows = resp.json()['rows']
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
#@parameterized(BOUNDARY_COLUMNS)
#def test_get_boundary_id(column_id):
# resp = query('''
#SELECT * FROM {schema}OBS_GetBoundaryId({point}, '{column_id}')
# '''.format(column_id=column_id,
# schema='cdb_observatory.' if USE_SCHEMA else '',
# point=default_point(column_id)))
# assert_equal(resp.status_code, 200)
# rows = resp.json()['rows']
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
#@parameterized(BOUNDARY_COLUMNS)
#def test_get_boundary_by_id(column_id):
# resp = query('''
#SELECT * FROM {schema}OBS_GetBoundaryById({geometry_id}, '{column_id}')
# '''.format(column_id=column_id,
# schema='cdb_observatory.' if USE_SCHEMA else '',
# geometry_id=default_geometry_id(column_id)))
# assert_equal(resp.status_code, 200)
# rows = resp.json()['rows']
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
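
The reworked `grouped_measure_columns` above buckets column ids by their test point and then splits each bucket into batches of 50, so every generated test case queries a single location. The following is a minimal standalone sketch of that idea, not the code from this diff. It assumes `grouper` is the usual itertools chunking recipe (its definition is not shown here), uses the Python 3 name `zip_longest` in place of `izip_longest`, and replaces `default_lonlat` with a hypothetical `point_for` lookup.

```python
from itertools import zip_longest  # Python 3 name for izip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect items into fixed-length chunks, padding the last one.

    Assumed to match the itertools recipe; the diff does not show it.
    """
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def group_by_point(column_ids, point_for, batch_size=50):
    """Yield (point, batch) pairs so each batch shares one test location."""
    by_point = {}
    for column_id in column_ids:
        by_point.setdefault(point_for(column_id), []).append(column_id)
    for point, ids in by_point.items():
        for chunk in grouper(ids, batch_size):
            # Drop the None padding added to the final chunk.
            yield point, [c for c in chunk if c is not None]


def point_for(column_id):
    """Hypothetical stand-in for default_lonlat, with made-up column ids."""
    return (40.7, -73.9) if column_id.startswith('us.') else (48.86, 2.36)


batches = list(group_by_point(['us.a', 'fr.b', 'us.c'], point_for, batch_size=2))
```

Grouping before batching means a batch never mixes columns that need different geometries, which is what lets `_test_measures` take only `numer_ids` and a single `geom`.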

@@ -44,33 +44,7 @@ for q in (
-73.81885528564453,41.745696344339564, 4326),
'us.census.tiger.county_clipped')) foo
ORDER BY ST_NPoints(the_geom) DESC
LIMIT 50;''',
'DROP TABLE IF EXISTS obs_perftest_country_simple',
'''CREATE TABLE obs_perftest_country_simple (cartodb_id SERIAL PRIMARY KEY,
geom GEOMETRY,
name TEXT) ''',
'''INSERT INTO obs_perftest_country_simple (geom, name)
SELECT the_geom geom,
geom_refs AS name
FROM (SELECT * FROM {schema}OBS_GetBoundariesByGeometry(
st_makeenvelope(-179,-89, 179,89, 4326),
'whosonfirst.wof_country_geom')) foo
ORDER BY ST_NPoints(the_geom) ASC
LIMIT 50;''',
'DROP TABLE IF EXISTS obs_perftest_country_complex',
'''CREATE TABLE obs_perftest_country_complex (cartodb_id SERIAL PRIMARY KEY,
geom GEOMETRY,
name TEXT) ''',
'''INSERT INTO obs_perftest_country_complex (geom, name)
SELECT the_geom geom,
geom_refs AS name
FROM (SELECT * FROM {schema}OBS_GetBoundariesByGeometry(
st_makeenvelope(-179,-89, 179,89, 4326),
'whosonfirst.wof_country_geom')) foo
ORDER BY ST_NPoints(the_geom) DESC
LIMIT 50;''',
#'''SET statement_timeout = 5000;'''
):
LIMIT 50;'''):
q_formatted = q.format(
schema='cdb_observatory.' if USE_SCHEMA else '',
)
@@ -118,15 +92,7 @@ def record(params, results):
('complex', '_OBS_GetGeometryScores', 'NULL', 1),
('complex', '_OBS_GetGeometryScores', 'NULL', 500),
('complex', '_OBS_GetGeometryScores', 'NULL', 3000),
('country_simple', '_OBS_GetGeometryScores', 'NULL', 1),
('country_simple', '_OBS_GetGeometryScores', 'NULL', 500),
('country_simple', '_OBS_GetGeometryScores', 'NULL', 5000),
('country_complex', '_OBS_GetGeometryScores', 'NULL', 1),
('country_complex', '_OBS_GetGeometryScores', 'NULL', 500),
('country_complex', '_OBS_GetGeometryScores', 'NULL', 5000),
('complex', '_OBS_GetGeometryScores', 'NULL', 3000)
])
def test_getgeometryscores_performance(geom_complexity, api_method, filters, target_geoms):
print api_method, geom_complexity, filters, target_geoms