50 Commits
1.3.2 ... 1.3.4

Author SHA1 Message Date
John Krauss
f1bf4259bc release artifact 1.3.4 2017-03-10 20:17:22 +00:00
John Krauss
a2609d9d07 update NEWS for 1.3.4 2017-03-10 20:14:32 +00:00
John Krauss
01779991bb Remove erroneously commited NOTICE 2017-03-10 20:13:27 +00:00
John Krauss
ec53d354e9 release 1.3.3 artifact 2017-03-10 19:48:23 +00:00
John Krauss
c1aa91da5b update NEWS.md 2017-03-10 19:36:00 +00:00
John Krauss
93ebd9aa0f test getdata across multiple input columns; remove dead code from autotest 2017-03-10 19:27:06 +00:00
John Krauss
4a29c060ef fix unittest bug, easier to read use of unnest, static geomvals when one passed in 2017-03-10 19:18:06 +00:00
John Krauss
1639bea74a mark relevant functions STABLE 2017-03-10 18:36:51 +00:00
John Krauss
765cbfcccc only do polygon operations when polygons passed in 2017-03-10 16:32:31 +00:00
John Krauss
c4f3c5d534 selectively pass through obs geometries and area calcs 2017-03-10 16:23:27 +00:00
John Krauss
d5e7d95824 fix performance regression on getboundariesbygeometry, where pct overlap was being unnecessarily calculated 2017-03-09 21:26:53 +00:00
John Krauss
3ff1b36d7f remove erroneous NOTICE 2017-03-09 20:46:38 +00:00
John Krauss
c28cdeb767 Merge branch 'release-v-1.3.3' into separate-geom-from-data-calcs 2017-03-09 18:09:45 +00:00
John Krauss
b1d672bfe4 Merge branch 'release-v-1.3.3' into faster-autotest 2017-03-09 18:07:28 +00:00
John Krauss
524d477f7b Merge remote-tracking branch 'origin/release-v-1.3.3' into release-v-1.3.3 2017-03-09 17:59:49 +00:00
John Krauss
7ef035580f avoid geom calculation when points are passed in 2017-03-09 17:29:41 +00:00
John Krauss
20b347528c tests passing 2017-03-09 17:14:20 +00:00
John Krauss
d070802f53 resolving API bugs 2017-03-09 16:17:58 +00:00
John Krauss
751f470049 Merge branch 'faster-autotest' into separate-geom-from-data-calcs 2017-03-09 14:58:26 +00:00
John Krauss
a1b5f01d57 Merge remote-tracking branch 'origin/develop' into faster-autotest 2017-03-09 14:50:48 +00:00
John Krauss
f2d2b32bf1 Merge remote-tracking branch 'origin/develop' into separate-geom-from-data-calcs 2017-03-09 14:50:01 +00:00
csobier
02413eb974 line 412, bad tag 2017-03-09 08:15:43 -05:00
csobier
1a4a2edbc6 lien 207 2017-03-09 08:10:23 -05:00
csobier
47c6453bbc tags 2017-03-09 08:05:59 -05:00
csobier
5f2daad408 wrong quotes around context, breaking docs 2017-03-09 07:36:10 -05:00
csobier
764a1ce7cd highlight missing, breaking docs 2017-03-09 07:29:19 -05:00
csobier
12235c7138 missing tag on line 387, breaking docs. 2017-03-09 07:16:49 -05:00
John Krauss
3f817f8e9a bugfixes, most unit tests passing 2017-03-09 05:03:25 +00:00
John Krauss
5ca2664a17 first pass much faster multicolumn getdata via precalcs 2017-03-09 04:12:38 +00:00
John Krauss
1b913c77c4 fix last oustanding bug with autotest 2017-03-08 23:18:07 +00:00
John Krauss
22eb6349c2 fix issues with python autotest failing for nulls, try removing case statements around geometries in getdata 2017-03-08 21:17:45 +00:00
John Krauss
862db2c33a Merge remote-tracking branch 'origin/release-v-1.3.3' into faster-autotest
Conflicts:
	src/pg/sql/40_observatory_utility.sql
2017-03-08 20:52:31 +00:00
John Krauss
e2f92d78cf much faster autotest by grouping in getdata, fixes to getdata to prevent hangs 2017-03-08 20:51:41 +00:00
john krauss
3df1ffc3c8 Merge pull request #265 from CartoDB/check-intersection-errors
Resolve intersection errors
2017-03-08 15:38:39 -05:00
John Krauss
6a60cfc417 Merge branch 'develop' into faster-autotest 2017-03-08 15:57:14 +00:00
John Krauss
3b6b1b4843 limit safe_intersection to SRID 4326, DRY out ST_MakeValid 2017-03-08 15:52:19 +00:00
John Krauss
460059f2cf Merge branch 'develop' into check-intersection-errors 2017-03-07 20:45:05 +00:00
John Krauss
fc111dd1e2 Merge branch 'obs-getavailableX-docs' into develop 2017-03-07 20:39:40 +00:00
John Krauss
7cbef7e1b5 Merge branch 'obs-getdata-getmeta-docs' into develop 2017-03-07 20:39:25 +00:00
John Krauss
deede798e9 fix non-noded intersection between shoreline clipped and non-shoreline clipped geometries by using a safe_intersection function 2017-03-07 20:38:12 +00:00
John Krauss
fd3918b29c fix divide-by-zero errors 2017-03-07 16:45:15 +00:00
John Krauss
cdf7b17a4d tmp commit 2017-03-07 15:29:09 +00:00
John Krauss
63ae7c1392 add obs_getavailableX metadata API docs 2017-02-28 21:33:06 +00:00
John Krauss
af671931d4 integrate michelles comments 2017-02-23 20:12:27 +00:00
Michelle Ho
8120081d68 Typo fix
Typo fix of "measured" to "measure"
2017-02-06 16:37:27 -05:00
Michelle Ho
72ced1a7a7 Change 'raise' to 'raises'
Changes semantic meaning-- user does not raise the error, CARTO raises the error
2017-02-06 16:27:43 -05:00
Michelle Ho
d15b74a594 Change ``OBS_GetUSCensusMeasure`` 2017-02-06 16:18:56 -05:00
Michelle Ho
60ab773549 change point to polygon in GetUSCensusMeasure 2017-02-06 15:57:49 -05:00
Michelle Ho
01b70dd06e proof-reading changes 2017-02-06 14:58:07 -05:00
John Krauss
4b409cc9f4 first-pass docs for obs_getdata and obs_getmeta 2017-02-01 09:12:18 -05:00
19 changed files with 5807 additions and 356 deletions

28
NEWS.md

@@ -1,3 +1,31 @@
1.3.4 (2017-03-10)
__Bugfixes__
* Remove erroneously committed `RAISE NOTICE` in `OBS_GetData`
1.3.3 (2017-03-10)
__Bugfixes__
* Resolve divide-by-zero errors in cases where the intersection of an
Observatory geometry and user geometry has 0 area
([#265](https://github.com/CartoDB/observatory-extension/pull/265))
* Run MakeValid on geometries when intersecting, if necessary
([#268](https://github.com/CartoDB/observatory-extension/pull/268))
__Improvements__
* Add performance tests for multiple columns in `OBS_GetData`
* Major performance boost for `autotest.py` through the use of multi-column
`OBS_GetData` instead of separate `OBS_GetMeasure` calls for every single
measurement.
([#268](https://github.com/CartoDB/observatory-extension/pull/268))
* Major performance boost for `OBS_GetData` in cases where multiple columns are
requested. Previously, each additional column would result in a linear
slowdown, even if geometries could be reused.
([#267](https://github.com/CartoDB/observatory-extension/pull/267))
1.3.2 (2017-03-02)
__Bugfixes__


@@ -56,3 +56,306 @@ time_span | the timespan attached the boundary. this does not mean that the boun
```SQL
SELECT * FROM OBS_GetAvailableBoundaries(CDB_LatLng(40.7, -73.9))
```
## OBS_GetAvailableNumerators(bounds, filter_tags, denom_id, geom_id, timespan)
Return available numerators within a boundary and with the specified
`filter_tags`.
#### Arguments
Name | Type | Description
--- | --- | ---
bounds | Geometry(Geometry, 4326) | a geometry which some of the numerator's data must intersect with
filter_tags | Text[] | a list of filters. Only numerators for which all of these apply are returned. `NULL` to ignore (optional)
denom_id | Text | the ID of a denominator to check whether the numerator is valid against. Will not reduce length of returned table, but will change values for `valid_denom` (optional)
geom_id | Text | the ID of a geometry to check whether the numerator is valid against. Will not reduce length of returned table, but will change values for `valid_geom` (optional)
timespan | Text | the ID of a timespan to check whether the numerator is valid against. Will not reduce length of returned table, but will change values for `valid_timespan` (optional)
#### Returns
A TABLE containing the following properties
Key | Type | Description
--- | ---- | -----------
numer_id | Text | The ID of the numerator
numer_name | Text | A human readable name for the numerator
numer_description | Text | Description of the numerator. Is sometimes NULL
numer_weight | Numeric | Numeric "weight" of the numerator. Ignored.
numer_license | Text | ID of the license for the numerator
numer_source | Text | ID of the source for the numerator
numer_type | Text | Postgres type of the numerator
numer_aggregate | Text | Aggregate type of the numerator. If `'SUM'`, this can be normalized by area
numer_extra | JSONB | Extra information about the numerator column. Ignored.
numer_tags | Text[] | Array of all tags applying to this numerator
valid_denom | Boolean | True if the `denom_id` argument is a valid denominator for this numerator, False otherwise
valid_geom | Boolean | True if the `geom_id` argument is a valid geometry for this numerator, False otherwise
valid_timespan | Boolean | True if the `timespan` argument is a valid timespan for this numerator, False otherwise
#### Examples
Obtain all numerators that are available within a small rectangle.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326))
```
Obtain all numerators that are available within a small rectangle and are for
the United States only.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), '{section/tags.united_states}');
```
Obtain all numerators that are available within a small rectangle and are
employment related for the United States only.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), '{section/tags.united_states, subsection/tags.employment}');
```
Obtain all numerators that are available within a small rectangle and are
related to both employment and age & gender for the United States only.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), '{section/tags.united_states, subsection/tags.employment, subsection/tags.age_gender}');
```
Obtain all numerators that work with US population (`us.census.acs.B01003001`)
as a denominator.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, 'us.census.acs.B01003001')
WHERE valid_denom IS True;
```
Obtain all numerators that work with US states (`us.census.tiger.state`)
as a geometry.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, NULL, 'us.census.tiger.state')
WHERE valid_geom IS True;
```
Obtain all numerators available in the timespan `2011 - 2015`.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableNumerators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, NULL, NULL, '2011 - 2015')
WHERE valid_timespan IS True;
```
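The arguments above can also be combined in a single call. The following is a minimal sketch, assuming the tag and denominator IDs used in the earlier examples are present in your Data Observatory instance; it selects only a few of the columns listed under Returns.
```SQL
-- Sketch: employment-related US numerators that can be normalized by
-- total population (us.census.acs.B01003001). The tag and column IDs are
-- taken from the examples above and assumed to exist in the catalog.
SELECT numer_id, numer_name, numer_aggregate
FROM cdb_observatory.OBS_GetAvailableNumerators(
  ST_MakeEnvelope(-74, 40, -73, 41, 4326),
  '{section/tags.united_states, subsection/tags.employment}',
  'us.census.acs.B01003001')
WHERE valid_denom IS True
ORDER BY numer_name;
```
Because `denom_id` does not reduce the length of the returned table, the `WHERE valid_denom IS True` clause is what restricts the result to compatible numerators.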
## OBS_GetAvailableDenominators(bounds, filter_tags, numer_id, geom_id, timespan)
Return available denominators within a boundary and with the specified
`filter_tags`.
#### Arguments
Name | Type | Description
--- | --- | ---
bounds | Geometry(Geometry, 4326) | a geometry which some of the denominator's data must intersect with
filter_tags | Text[] | a list of filters. Only denominators for which all of these apply are returned. `NULL` to ignore (optional)
numer_id | Text | the ID of a numerator to check whether the denominator is valid against. Will not reduce length of returned table, but will change values for `valid_numer` (optional)
geom_id | Text | the ID of a geometry to check whether the denominator is valid against. Will not reduce length of returned table, but will change values for `valid_geom` (optional)
timespan | Text | the ID of a timespan to check whether the denominator is valid against. Will not reduce length of returned table, but will change values for `valid_timespan` (optional)
#### Returns
A TABLE containing the following properties
Key | Type | Description
--- | ---- | -----------
denom_id | Text | The ID of the denominator
denom_name | Text | A human readable name for the denominator
denom_description | Text | Description of the denominator. Is sometimes NULL
denom_weight | Numeric | Numeric "weight" of the denominator. Ignored.
denom_license | Text | ID of the license for the denominator
denom_source | Text | ID of the source for the denominator
denom_type | Text | Postgres type of the denominator
denom_aggregate | Text | Aggregate type of the denominator. If `'SUM'`, this can be normalized by area
denom_extra | JSONB | Extra information about the denominator column. Ignored.
denom_tags | Text[] | Array of all tags applying to this denominator
valid_numer | Boolean | True if the `numer_id` argument is a valid numerator for this denominator, False otherwise
valid_geom | Boolean | True if the `geom_id` argument is a valid geometry for this denominator, False otherwise
valid_timespan | Boolean | True if the `timespan` argument is a valid timespan for this denominator, False otherwise
#### Examples
Obtain all denominators that are available within a small rectangle.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326));
```
Obtain all denominators that are available within a small rectangle and are for
the United States only.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), '{section/tags.united_states}');
```
Obtain all denominators for male population (`us.census.acs.B01001002`).
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, 'us.census.acs.B01001002')
WHERE valid_numer IS True;
```
Obtain all denominators that work with US states (`us.census.tiger.state`)
as a geometry.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, NULL, 'us.census.tiger.state')
WHERE valid_geom IS True;
```
Obtain all denominators available in the timespan `2011 - 2015`.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableDenominators(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, NULL, NULL, '2011 - 2015')
WHERE valid_timespan IS True;
```
## OBS_GetAvailableGeometries(bounds, filter_tags, numer_id, denom_id, timespan)
Return available geometries within a boundary and with the specified
`filter_tags`.
#### Arguments
Name | Type | Description
--- | --- | ---
bounds | Geometry(Geometry, 4326) | a geometry which must intersect the geometry
filter_tags | Text[] | a list of filters. Only geometries for which all of these apply are returned. `NULL` to ignore (optional)
numer_id | Text | the ID of a numerator to check whether the geometry is valid against. Will not reduce length of returned table, but will change values for `valid_numer` (optional)
denom_id | Text | the ID of a denominator to check whether the geometry is valid against. Will not reduce length of returned table, but will change values for `valid_denom` (optional)
timespan | Text | the ID of a timespan to check whether the geometry is valid against. Will not reduce length of returned table, but will change values for `valid_timespan` (optional)
#### Returns
A TABLE containing the following properties
Key | Type | Description
--- | ---- | -----------
geom_id | Text | The ID of the geometry
geom_name | Text | A human readable name for the geometry
geom_description | Text | Description of the geometry. Is sometimes NULL
geom_weight | Numeric | Numeric "weight" of the geometry. Ignored.
geom_aggregate | Text | Aggregate type of the geometry. Ignored.
geom_license | Text | ID of the license for the geometry
geom_source | Text | ID of the source for the geometry
geom_type | Text | Postgres type of the geometry
geom_extra | JSONB | Extra information about the geometry column. Ignored.
geom_tags | Text[] | Array of all tags applying to this geometry
valid_numer | Boolean | True if the `numer_id` argument is a valid numerator for this geometry, False otherwise
valid_denom | Boolean | True if the `denom_id` argument is a valid denominator for this geometry, False otherwise
valid_timespan | Boolean | True if the `timespan` argument is a valid timespan for this geometry, False otherwise
score | Numeric | Score between 0 and 100 for this geometry, higher numbers mean that this geometry is a better choice for the passed extent
numtiles | Numeric | How many raster tiles were read for score, numgeoms, and percentfill estimates
numgeoms | Numeric | About how many of these geometries fit inside the passed extent
percentfill | Numeric | About what percentage of the passed extent is filled with these geometries
estnumgeoms | Numeric | Ignored
meanmediansize | Numeric | Ignored
#### Examples
Obtain all geometries that are available within a small rectangle.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableGeometries(
ST_MakeEnvelope(-74, 41, -73, 40, 4326));
```
Obtain all geometries that are available within a small rectangle and are for
the United States only.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableGeometries(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), '{section/tags.united_states}');
```
Obtain all geometries that work with total population (`us.census.acs.B01003001`).
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableGeometries(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, 'us.census.acs.B01003001')
WHERE valid_numer IS True;
```
Obtain all geometries with timespan `2015`.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableGeometries(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, NULL, NULL, '2015')
WHERE valid_timespan IS True;
```
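The `score` column described under Returns can be used to pick a boundary level for an extent. A minimal sketch, assuming higher scores are preferable as stated above:
```SQL
-- Sketch: the five best-scoring boundary levels for the passed extent.
-- Column names are those listed in the Returns table above.
SELECT geom_id, geom_name, score, numgeoms, percentfill
FROM cdb_observatory.OBS_GetAvailableGeometries(
  ST_MakeEnvelope(-74, 40, -73, 41, 4326))
ORDER BY score DESC
LIMIT 5;
```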
## OBS_GetAvailableTimespans(bounds, filter_tags, numer_id, denom_id, geom_id)
Return available timespans within a boundary and with the specified
`filter_tags`.
#### Arguments
Name | Type | Description
--- | --- | ---
bounds | Geometry(Geometry, 4326) | a geometry which some of the timespan's data must intersect with
filter_tags | Text[] | a list of filters. Ignored
numer_id | Text | the ID of a numerator to check whether the timespan is valid against. Will not reduce length of returned table, but will change values for `valid_numer` (optional)
denom_id | Text | the ID of a denominator to check whether the timespan is valid against. Will not reduce length of returned table, but will change values for `valid_denom` (optional)
geom_id | Text | the ID of a geometry to check whether the timespan is valid against. Will not reduce length of returned table, but will change values for `valid_geom` (optional)
#### Returns
A TABLE containing the following properties
Key | Type | Description
--- | ---- | -----------
timespan_id | Text | The ID of the timespan
timespan_name | Text | A human readable name for the timespan
timespan_description | Text | Ignored
timespan_weight | Numeric | Ignored
timespan_license | Text | Ignored
timespan_source | Text | Ignored
timespan_aggregate | Text | Ignored
valid_numer | Boolean | True if the `numer_id` argument is a valid numerator for this timespan, False otherwise
valid_denom | Boolean | True if the `denom_id` argument is a valid denominator for this timespan, False otherwise
valid_geom | Boolean | True if the `geom_id` argument is a valid geometry for this timespan, False otherwise
#### Examples
Obtain all timespans that are available within a small rectangle.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableTimespans(
ST_MakeEnvelope(-74, 41, -73, 40, 4326));
```
Obtain all timespans for total population (`us.census.acs.B01003001`).
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableTimespans(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, 'us.census.acs.B01003001')
WHERE valid_numer IS True;
```
Obtain all timespans that work with US states (`us.census.tiger.state`)
as a geometry.
```SQL
SELECT * FROM cdb_observatory.OBS_GetAvailableTimespans(
ST_MakeEnvelope(-74, 41, -73, 40, 4326), NULL, NULL, NULL, 'us.census.tiger.state')
WHERE valid_geom IS True;
```


@@ -8,15 +8,15 @@ You can [access](https://carto.com/docs/carto-engine/data/accessing) measures th
## OBS_GetUSCensusMeasure(point geometry, measure_name text)
The ```OBS_GetUSCensusMeasure(point, measure_name)``` function returns a measure based on a subset of the US Census variables at a point location. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory, to access the full list, use measure IDs with the ```OBS_GetMeasure``` function below.
The ```OBS_GetUSCensusMeasure(point, measure_name)``` function returns a measure based on a subset of the US Census variables at a point location. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory. To access the full list, use measure IDs with the ```OBS_GetMeasure``` function below.
#### Arguments
Name |Description
--- | ---
point | a WGS84 point geometry (the_geom)
measure_name | a human readable name of a US Census variable. The list of measure_names is [available in the Glossary](https://carto.com/docs/carto-engine/data/glossary/#obsgetuscensusmeasure-names-table).
normalize | for measures that are are **sums** (e.g. population) the default normalization is 'area' and response comes back as a rate per square kilometer. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
measure_name | a human-readable name of a US Census variable. The list of measure_names is [available in the Glossary](https://carto.com/docs/carto-engine/data/glossary/#obsgetuscensusmeasure-names-table).
normalize | for measures that are **sums** (e.g. population) the default normalization is 'area' and response comes back as a rate per square kilometer. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
time_span | time span of interest (e.g., 2010 - 2014)
@@ -39,7 +39,7 @@ SET total_population = OBS_GetUSCensusMeasure(the_geom, 'Total Population')
## OBS_GetUSCensusMeasure(polygon geometry, measure_name text)
The ```OBS_GetUSCensusMeasure(point, measure_name)``` function returns a measure based on a subset of the US Census variables within a given polygon. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory, to access the full list, use the ```OBS_GetUSCensusMeasure``` function below.
The ```OBS_GetUSCensusMeasure(polygon, measure_name)``` function returns a measure based on a subset of the US Census variables within a given polygon. The ```OBS_GetUSCensusMeasure``` function is limited to only a subset of all measures that are available in the Data Observatory. To access the full list, use the ```OBS_GetMeasure``` function below.
#### Arguments
@@ -78,7 +78,7 @@ Name |Description
--- | ---
point | a WGS84 point geometry (the_geom)
measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf)). It is important to note that these are different than 'measure_name' used in the Census based functions above.
normalize | for measures that are are **sums** (e.g. population) the default normalization is 'area' and response comes back as a rate per square kilometer. The other option is 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html). (optional)
normalize | for measures that are **sums** (e.g. population) the default normalization is 'area' and response comes back as a rate per square kilometer. The other option is 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html). (optional)
boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
time_span | time span of interest (e.g., 2010 - 2014)
@@ -109,7 +109,7 @@ Name |Description
--- | ---
polygon_geometry | a WGS84 polygon geometry (the_geom)
measure_id | a measure identifier from the Data Observatory ([see available measures](https://cartodb.github.io/bigmetadata/observatory.pdf))
normalize | for measures that are are **sums** (e.g. population) the default normalization is 'none' and response comes back as a raw value. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
normalize | for measures that are **sums** (e.g. population) the default normalization is 'none' and response comes back as a raw value. Other options are 'denominator', which will use the denominator specified in the [Data Catalog](https://cartodb.github.io/bigmetadata/index.html) (optional)
boundary_id | source of geometries to pull measure from (e.g., 'us.census.tiger.census_tract')
time_span | time span of interest (e.g., 2010 - 2014)
@@ -132,7 +132,7 @@ SET household_count = OBS_GetMeasure(the_geom, 'us.census.acs.B11001001')
#### Errors
* If an unrecognized normalization type is input, raise an error: `'Only valid inputs for "normalize" are "area" (default) and "denominator".`
* If an unrecognized normalization type is input, raises error: `'Only valid inputs for "normalize" are "area" (default) and "denominator".`
## OBS_GetMeasureById(geom_ref text, measure_id text, boundary_id text)
@@ -195,3 +195,285 @@ Add the Category to an empty column text column based on point locations in your
UPDATE tablename
SET segmentation = OBS_GetCategory(the_geom, 'us.census.spielman_singleton_segments.X55')
```
## OBS_GetMeta(extent geometry, metadata json, max_timespan_rank, max_boundary_score_rank, num_target_geoms)
The ```OBS_GetMeta(extent, metadata)``` function returns a completed Data
Observatory metadata JSON Object for use in ```OBS_GetData(geomvals,
metadata)``` or ```OBS_GetData(ids, metadata)```. It is not possible to pass
metadata to those functions if it is not processed by ```OBS_GetMeta(extent,
metadata)``` first.
`OBS_GetMeta` makes it possible to automatically select appropriate timespans
and boundaries for the measurement you want.
#### Arguments
Name | Description
---- | -----------
extent | A geometry of the extent of the input geometries
metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
max_timespan_rank | How many historical time periods to include. Defaults to 1
max_boundary_score_rank | How many alternative boundary levels to include. Defaults to 1
num_target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
The schema of the metadata input objects is as follows:
Metadata Input Key | Description
--- | -----------
numer_id | The identifier for the desired measurement. If left blank, but a `geom_id` is specified, the column will return a geometry instead of a measurement.
geom_id | Identifier for a desired geographic boundary level to use when calculating measures. Will be automatically assigned if undefined. If defined but `numer_id` is blank, then the column will return a geometry instead of a measurement.
normalization | The desired normalization. One of 'area', 'prenormalized', or 'denominated'. 'Area' will normalize the measure per square kilometer, 'prenormalized' will return the original value, and 'denominated' will normalize by a denominator. Ignored if this metadata object specifies a geometry.
denom_id | Identifier for a desired normalization column in case `normalization` is 'denominated'. Will be automatically assigned if necessary. Ignored if this metadata object specifies a geometry.
numer_timespan | The desired timespan for the measurement. Defaults to most recent timespan available if left unspecified.
geom_timespan | The desired timespan for the geometry. Defaults to timespan matching numer_timespan if left unspecified.
#### Returns
A JSON array composed of metadata output objects.
Key | Description
--- | -----------
meta | A JSON array with completed metadata for the requested data, including all keys below
The schema of the metadata output objects is as follows. You should pass this
array as-is to ```OBS_GetData```. If you modify any values the function will
fail.
Metadata Output Key | Description
--- | -----------
numer_id | Identifier for desired measurement
numer_timespan | Timespan that will be used for the desired measurement
numer_name | Human-readable name of desired measure
numer_type | PostgreSQL/PostGIS type of desired measure
numer_colname | Internal identifier for column name
numer_tablename | Internal identifier for table
numer_geomref_colname | Internal identifier for geomref column name
denom_id | Identifier for desired normalization
denom_timespan | Timespan that will be used for the desired normalization
denom_name | Human-readable name of desired measure's normalization
denom_type | PostgreSQL/PostGIS type of desired measure's normalization
denom_colname | Internal identifier for normalization column name
denom_tablename | Internal identifier for normalization table
denom_geomref_colname | Internal identifier for normalization geomref column name
geom_id | Identifier for desired boundary geometry
geom_timespan | Timespan that will be used for the desired boundary geometry
geom_name | Human-readable name of desired boundary geometry
geom_type | PostgreSQL/PostGIS type of desired boundary geometry
geom_colname | Internal identifier for boundary geometry column name
geom_tablename | Internal identifier for boundary geometry table
geom_geomref_colname | Internal identifier for boundary geometry ref column name
timespan_rank | Ranking of this measurement by time, most recent is 1, second most recent 2, etc.
score | The score of this measurement's boundary compared to the `extent` and `num_target_geoms` passed in. Between 0 and 100.
score_rank | The ranking of this measurement's boundary, highest ranked is 1, second is 2, etc.
numer_aggregate | The aggregate type of the numerator, either `sum`, `average`, `median`, or blank
denom_aggregate | The aggregate type of the denominator, either `sum`, `average`, `median`, or blank
normalization | The sort of normalization that will be used for this measure, either `area`, `predenominated`, or `denominated`
#### Examples
Obtain metadata that can augment with one additional column of US population
data, using a boundary relevant for the geometry provided and latest timespan.
Limit to only the most recent column most relevant to the extent & density of
input geometries in `tablename`.
```SQL
SELECT OBS_GetMeta(
ST_SetSRID(ST_Extent(the_geom), 4326),
'[{"numer_id": "us.census.acs.B01003001"}]',
1, 1,
COUNT(*)
) FROM tablename
```
Obtain metadata that can augment with one additional column of US population
data, using census tract boundaries.
```SQL
SELECT OBS_GetMeta(
ST_SetSRID(ST_Extent(the_geom), 4326),
'[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.census_tract"}]',
1, 1,
COUNT(*)
) FROM tablename
```
Obtain metadata that can augment with two additional columns, one for total
population and one for male population.
```SQL
SELECT OBS_GetMeta(
ST_SetSRID(ST_Extent(the_geom), 4326),
'[{"numer_id": "us.census.acs.B01003001"}, {"numer_id": "us.census.acs.B01001002"}]',
1, 1,
COUNT(*)
) FROM tablename
```
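The `normalization` and `denom_id` input keys described above can be set explicitly. The following is a sketch only: it assumes total population (`us.census.acs.B01003001`) is a valid denominator for male population (`us.census.acs.B01001002`), which can be checked beforehand with `OBS_GetAvailableDenominators`.
```SQL
-- Sketch: request male population as a share of total population.
-- The numerator/denominator pairing is an assumption for illustration.
SELECT OBS_GetMeta(
  ST_SetSRID(ST_Extent(the_geom), 4326),
  '[{"numer_id": "us.census.acs.B01001002",
     "denom_id": "us.census.acs.B01003001",
     "normalization": "denominated"}]',
  1, 1,
  COUNT(*)
) FROM tablename
```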
## OBS_GetData(geomvals array[geomval], metadata json)
The ```OBS_GetData(geomvals, metadata)``` function returns a measure and/or
geometry corresponding to the `metadata` JSON array for every Geometry of
the `geomval` element in the `geomvals` array. The metadata argument must be
obtained from ```OBS_GetMeta(extent, metadata)```.
#### Arguments
Name | Description
---- | -----------
geomvals | An array of `geomval` elements, which are obtained by casting together a `Geometry` and a `Numeric`. This should be obtained by using `ARRAY_AGG((the_geom, cartodb_id)::geomval)` from the CARTO table one wishes to obtain data for.
metadata | A JSON array composed of metadata output objects from ```OBS_GetMeta(extent, metadata)```. The schema of the elements of the `metadata` JSON array corresponds to that of the output of ```OBS_GetMeta(extent, metadata)```, and this argument must be obtained from that function in order for the call to be valid.
#### Returns
A TABLE with the following schema, where each element of the input `geomvals`
array corresponds to one row:
Column | Type | Description
------ | ---- | -----------
id | Numeric | ID corresponding to the `val` component of an element of the input `geomvals` array
data | JSON | A JSON array with elements corresponding to the input `metadata` JSON array
Each `data` object has the following keys:
Key | Description
--- | -----------
value | The value of the measurement or geometry for the geometry corresponding to this row and measurement corresponding to this position in the `metadata` JSON array
To determine the appropriate cast for `value`, one can use the `numer_type`
or `geom_type` key corresponding to that value in the input `metadata` JSON
array.
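As an illustration of the pairing described above, the following Python sketch
(client-side, not part of the Observatory API) walks a `data` array and its
`metadata` array in parallel and picks a converter from `numer_type`, falling
back to `geom_type`. The `CASTS` table, the helper name `cast_values`, and the
sample rows are assumptions made for the example; a real client would extend
the mapping to the types it expects.

```python
from decimal import Decimal

# Hypothetical mapping from type names found in the metadata to Python converters.
CASTS = {
    "numeric": Decimal,
    "integer": int,
    "text": str,
    "geometry": str,  # geometries arrive as text; left undecoded in this sketch
}


def cast_values(data, metadata):
    """Pair each element of `data` with the metadata element at the same
    position and convert its `value` using that element's declared type."""
    out = []
    for datum, meta in zip(data, metadata):
        type_name = (meta.get("numer_type") or meta.get("geom_type") or "text").lower()
        convert = CASTS.get(type_name, str)  # unknown types fall back to text
        value = datum.get("value")
        out.append(None if value is None else convert(value))
    return out


# Made-up sample rows, for illustration only.
metadata = [{"numer_type": "Numeric"}, {"geom_type": "Geometry"}]
data = [{"value": "1234.5"}, {"value": "placeholder-geometry-text"}]
print(cast_values(data, metadata))
```

Keeping the lookup keyed on position mirrors how `data` is defined: element `n`
of `data` always describes element `n` of `metadata`.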
#### Examples
Obtain population densities for every geometry in a table, keyed by cartodb_id:
```SQL
WITH meta AS (
SELECT OBS_GetMeta(
ST_SetSRID(ST_Extent(the_geom), 4326),
'[{"numer_id": "us.census.acs.B01003001"}]',
1, 1, COUNT(*)
) meta FROM tablename)
SELECT id AS cartodb_id, (data->0->>'value')::Numeric AS pop_density
FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablename),
(SELECT meta FROM meta))
```
Update a table that has a blank numeric column called `pop_density` with
population densities:
```SQL
WITH meta AS (
SELECT OBS_GetMeta(
ST_SetSRID(ST_Extent(the_geom), 4326),
'[{"numer_id": "us.census.acs.B01003001"}]',
1, 1, COUNT(*)
) meta FROM tablename),
data AS (
SELECT id AS cartodb_id, (data->0->>'value')::Numeric AS pop_density
FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablename),
(SELECT meta FROM meta)))
UPDATE tablename
SET pop_density = data.pop_density
FROM data
WHERE tablename.cartodb_id = data.cartodb_id
```
Update a table with two measurements at once, population density and household
density. The table should already have Numeric columns `pop_density` and
`household_density`.
```SQL
WITH meta AS (
SELECT OBS_GetMeta(
ST_SetSRID(ST_Extent(the_geom),4326),
'[{"numer_id": "us.census.acs.B01003001"},{"numer_id": "us.census.acs.B11001001"}]',
1, 1, COUNT(*)
) meta from tablename),
data AS (
SELECT id,
(data->0->>'value')::Numeric AS pop_density,
(data->1->>'value')::Numeric AS household_density
FROM OBS_GetData((SELECT ARRAY_AGG((the_geom, cartodb_id)::geomval) FROM tablename),
(SELECT meta FROM meta)))
UPDATE tablename
SET pop_density = data.pop_density,
household_density = data.household_density
FROM data
WHERE cartodb_id = data.id
```
## OBS_GetData(ids array[text], metadata json)
The ```OBS_GetData(ids, metadata)``` function returns a measure and/or
geometry corresponding to the `metadata` JSON array for every id in
the `ids` array. The `metadata` argument must be obtained from
`OBS_GetMeta(extent, metadata)`. When obtaining metadata, one must include
the `geom_id` corresponding to the boundary that the `ids` refer to.
#### Arguments
Name | Description
---- | -----------
ids | An array of `TEXT` elements. This should be obtained by using `ARRAY_AGG(col_of_geom_refs)` from the CARTO table one wishes to obtain data for.
metadata | A JSON array composed of metadata output objects from ```OBS_GetMeta(extent, metadata)```. The schema of the elements of the `metadata` JSON array corresponds to that of the output of ```OBS_GetMeta(extent, metadata)```, and this argument must be obtained from that function in order for the call to be valid.
For this function to work, the `metadata` argument must include a `geom_id`
that corresponds to the ids found in `col_of_geom_refs`.
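A minimal sketch of that precondition, written in Python and outside the
Observatory API: before using the ids variant, a client could confirm that
every metadata element carries a `geom_id`. The function name
`missing_geom_id` and the sample metadata strings are hypothetical.

```python
import json


def missing_geom_id(metadata_json):
    """Return the positions of metadata elements that lack a `geom_id`."""
    metadata = json.loads(metadata_json)
    return [i for i, meta in enumerate(metadata) if not meta.get("geom_id")]


# Sample metadata, for illustration only.
ok = '[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]'
bad = '[{"numer_id": "us.census.acs.B01003001"}]'
print(missing_geom_id(ok))   # []
print(missing_geom_id(bad))  # [0]
```

This only checks that a `geom_id` is present; whether it matches the boundary
the ids actually refer to still has to be decided by the caller.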
#### Returns
A TABLE with the following schema, where each element of the input `ids` array
corresponds to one row:
Column | Type | Description
------ | ---- | -----------
id | Text | ID corresponding to an element of the input `ids` array
data | JSON | A JSON array with elements corresponding to the input `metadata` JSON array
Each `data` object has the following keys:
Key | Description
--- | -----------
value | The value of the measurement or geometry for the geometry corresponding to this row and measurement corresponding to this position in the `metadata` JSON array
To determine the appropriate cast for `value`, one can use the `numer_type`
or `geom_type` key corresponding to that value in the input `metadata` JSON
array.
#### Examples
Obtain population densities for every row of a table with FIPS code county IDs
(USA).
```SQL
WITH meta AS (
SELECT OBS_GetMeta(
ST_SetSRID(ST_Extent(the_geom), 4326),
'[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]'
) meta FROM tablename)
SELECT id AS fips, (data->0->>'value')::Numeric AS pop_density
FROM OBS_GetData((SELECT ARRAY_AGG(fips) FROM tablename),
(SELECT meta FROM meta))
```
Update a table with population densities for every FIPS code county ID (USA).
This table has a blank column called `pop_density` and fips codes stored in a
column `fips`.
```SQL
WITH meta AS (
SELECT OBS_GetMeta(
ST_SetSRID(ST_Extent(the_geom), 4326),
'[{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.county"}]'
) meta FROM tablename),
data as (
SELECT id AS fips, (data->0->>'value')::Numeric AS pop_density
FROM OBS_GetData((SELECT ARRAY_AGG(fips) FROM tablename),
(SELECT meta FROM meta)))
UPDATE tablename
SET pop_density = data.pop_density
FROM data
WHERE tablename.fips = data.fips
```

@@ -1,5 +1,5 @@
comment = 'CartoDB Observatory backend extension'
default_version = '1.3.2'
default_version = '1.3.4'
requires = 'postgis'
superuser = true
schema = cdb_observatory

@@ -214,6 +214,7 @@ FIXTURES = [
('us.census.tiger.fullname', 'us.census.tiger.pointlm_geom', '2016'),
('us.census.tiger.fullname', 'us.census.tiger.prisecroads_geom', '2016'),
('us.census.tiger.name', 'us.census.tiger.county', '2015'),
('us.census.tiger.name', 'us.census.tiger.county_clipped', '2015'),
('us.census.tiger.name', 'us.census.tiger.block_group', '2015'),
]
@@ -358,7 +359,10 @@ def main():
dump('*', tablename, 'WHERE geom && ST_MakeEnvelope(-74,40.69,-73.9,40.72, 4326)')
continue
elif 'whosonfirst' in table_id:
where = '(\'85632785\',\'85633051\',\'85633111\',\'85633147\',\'85633253\',\'85633267\')'
where = "('85632785','85633051','85633111','85633147','85633253','85633267')"
compare = 'IN'
elif 'county' in table_id and 'tiger' in table_id:
where = "('48061', '36047')"
compare = 'IN'
else:
where = '\'36047%\''

@@ -1,5 +1,5 @@
comment = 'CartoDB Observatory backend extension'
default_version = '1.3.2'
default_version = '1.3.4'
requires = 'postgis'
superuser = true
schema = cdb_observatory

@@ -231,3 +231,40 @@ CREATE AGGREGATE cdb_observatory.FIRST (
basetype = anyelement,
stype = anyelement
);
CREATE OR REPLACE FUNCTION cdb_observatory.isnumeric (
typename varchar
)
RETURNS BOOLEAN LANGUAGE SQL IMMUTABLE STRICT AS $$
SELECT LOWER(typename) IN (
'smallint',
'integer',
'bigint',
'decimal',
'numeric',
'real',
'double precision'
)
$$;
-- Attempt to perform intersection, if there's an exception then buffer
-- https://gis.stackexchange.com/questions/50399/how-best-to-fix-a-non-noded-intersection-problem-in-postgis
CREATE OR REPLACE FUNCTION cdb_observatory.safe_intersection(
geom_a Geometry(Geometry, 4326),
geom_b Geometry(Geometry, 4326)
)
RETURNS Geometry(Geometry, 4326) AS
$$
BEGIN
RETURN ST_MakeValid(ST_Intersection(geom_a, geom_b));
EXCEPTION
WHEN OTHERS THEN
BEGIN
RETURN ST_MakeValid(ST_Intersection(ST_Buffer(geom_a, 0.0000001), ST_Buffer(geom_b, 0.0000001)));
EXCEPTION
WHEN OTHERS THEN
RETURN NULL;
END;
END
$$
LANGUAGE 'plpgsql' STABLE STRICT;

@@ -96,7 +96,7 @@ BEGIN
USING geom, meta
RETURN;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetMeta(
@@ -260,7 +260,7 @@ BEGIN
;
RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetMeasure(
@@ -339,7 +339,7 @@ BEGIN
RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetMeasureById(
geom_ref TEXT,
@@ -374,7 +374,7 @@ BEGIN
RETURN result;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
-- GetData that obtains data from array of geomrefs
@@ -434,10 +434,10 @@ BEGIN
'JSON_Build_Object(' || CASE
WHEN api_method IS NOT NULL THEN
'''value'', ' ||
'cdb_observatory.FIRST( ' ||
api_method || '.' || numer_colname || ')::' || numer_type
'ARRAY_AGG( ' ||
api_method || '.' || numer_colname || ')::' || numer_type || '[]'
-- numeric internal values
WHEN LOWER(numer_type) LIKE 'numeric' THEN
WHEN cdb_observatory.isnumeric(numer_type) THEN
'''value'', ' || CASE
-- denominated
WHEN LOWER(normalization) LIKE 'denom%' OR (normalization IS NULL AND denom_id IS NOT NULL)
@@ -485,10 +485,16 @@ BEGIN
) tablenames_inner
) tablenames_outer) tablenames,
String_Agg(numer_tablename || '.' || numer_geomref_colname || ' = ' ||
geom_tablename || '.' || geom_geomref_colname ||
Coalesce(' AND ' || numer_tablename || '.' || numer_geomref_colname || ' = ' ||
denom_tablename || '.' || denom_geomref_colname, ''),
String_Agg(DISTINCT array_to_string(ARRAY[
CASE WHEN numer_tablename != geom_tablename
THEN numer_tablename || '.' || numer_geomref_colname || ' = ' ||
geom_tablename || '.' || geom_geomref_colname
ELSE NULL END,
CASE WHEN numer_tablename != denom_tablename
THEN numer_tablename || '.' || numer_geomref_colname || ' = ' ||
denom_tablename || '.' || denom_geomref_colname
ELSE NULL END
], ' AND '),
' AND ') AS obs_wheres,
String_Agg(geom_tablename || '.' || geom_geomref_colname || ' = ' ||
@@ -508,11 +514,14 @@ BEGIN
GROUP BY _geomrefs.id
ORDER BY _geomrefs.id
$query$, colspecs, tables,
'WHERE ' || NULLIF(ARRAY_TO_STRING(ARRAY[obs_wheres, user_wheres], ' AND '), ''))
'WHERE ' || NULLIF(ARRAY_TO_STRING(ARRAY[
Nullif(obs_wheres, ''), Nullif(user_wheres, '')
], ' AND '), '')
)
USING geomrefs;
RETURN;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
-- GetData that obtains data from array of (geom, id) geomvals.
@@ -527,172 +536,168 @@ RETURNS TABLE (
)
AS $$
DECLARE
colspecs TEXT;
geomrefs TEXT;
tables TEXT;
geom_colspecs TEXT;
geom_tables TEXT;
geomrefs_alias TEXT;
geomrefs_noalias TEXT;
data_colspecs TEXT;
data_tables TEXT;
obs_wheres TEXT;
user_wheres TEXT;
geomtype TEXT;
BEGIN
IF params IS NULL OR JSON_ARRAY_LENGTH(params) = 0 THEN
IF params IS NULL OR JSON_ARRAY_LENGTH(params) = 0 OR ARRAY_LENGTH(geomvals, 1) IS NULL THEN
RETURN QUERY EXECUTE $query$ SELECT NULL::INT, NULL::JSON LIMIT 0 $query$;
RETURN;
END IF;
geomtype := ST_GeometryType(geomvals[1].geom);
EXECUTE
$query$
WITH _meta AS (SELECT
generate_series(1, array_length($1, 1)) colid,
(unnest($1))->>'id' id,
(unnest($1))->>'numer_id' numer_id,
(unnest($1))->>'numer_aggregate' numer_aggregate,
(unnest($1))->>'numer_colname' numer_colname,
(unnest($1))->>'numer_geomref_colname' numer_geomref_colname,
(unnest($1))->>'numer_tablename' numer_tablename,
(unnest($1))->>'numer_type' numer_type,
(unnest($1))->>'denom_id' denom_id,
(unnest($1))->>'denom_aggregate' denom_aggregate,
(unnest($1))->>'denom_colname' denom_colname,
(unnest($1))->>'denom_geomref_colname' denom_geomref_colname,
(unnest($1))->>'denom_tablename' denom_tablename,
(unnest($1))->>'denom_type' denom_type,
(unnest($1))->>'denom_reltype' denom_reltype,
(unnest($1))->>'geom_id' geom_id,
(unnest($1))->>'geom_colname' geom_colname,
(unnest($1))->>'geom_geomref_colname' geom_geomref_colname,
(unnest($1))->>'geom_tablename' geom_tablename,
(unnest($1))->>'geom_type' geom_type,
(unnest($1))->>'numer_timespan' numer_timespan,
(unnest($1))->>'geom_timespan' geom_timespan,
(unnest($1))->>'normalization' normalization,
(unnest($1))->>'api_method' api_method,
(unnest($1))->'api_args' api_args
row_number() over () colid,
meta->>'id' id,
meta->>'numer_id' numer_id,
meta->>'numer_aggregate' numer_aggregate,
meta->>'numer_colname' numer_colname,
meta->>'numer_geomref_colname' numer_geomref_colname,
meta->>'numer_tablename' numer_tablename,
meta->>'numer_type' numer_type,
meta->>'denom_id' denom_id,
meta->>'denom_aggregate' denom_aggregate,
meta->>'denom_colname' denom_colname,
meta->>'denom_geomref_colname' denom_geomref_colname,
meta->>'denom_tablename' denom_tablename,
meta->>'denom_type' denom_type,
meta->>'denom_reltype' denom_reltype,
meta->>'geom_id' geom_id,
meta->>'geom_colname' geom_colname,
meta->>'geom_geomref_colname' geom_geomref_colname,
meta->>'geom_tablename' geom_tablename,
meta->>'geom_type' geom_type,
meta->>'numer_timespan' numer_timespan,
meta->>'geom_timespan' geom_timespan,
meta->>'normalization' normalization,
meta->>'api_method' api_method,
meta->'api_args' api_args
FROM UNNEST($1) AS meta
)
SELECT String_Agg(
SELECT
String_Agg(DISTINCT
CASE
-- pass-through geom if user is requesting it only
WHEN numer_id IS NULL AND api_method IS NULL THEN
geom_tablename || '.' || geom_colname || ' AS geom_' || geom_tablename
WHEN cdb_observatory.isnumeric(numer_type) AND api_method IS NULL THEN
-- for numeric points with area normalization, include areas of underlying geoms
CASE
WHEN $2 = 'ST_Point' AND (LOWER(normalization) LIKE 'area%' OR
(normalization IS NULL AND numer_aggregate ILIKE 'sum')) THEN
' Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '::Geography), 0)/1000000 ' ||
' AS area_' || geom_tablename
-- for numeric areas, include more complex calcs
WHEN $2 != 'ST_Point' THEN
'CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') ' ||
' THEN ST_Area(_geoms.geom) / Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0)' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) ' ||
' THEN 1 ' ||
' ELSE ST_Area(cdb_observatory.safe_intersection(_geoms.geom, ' ||
geom_tablename || '.' || geom_colname || ')) / ' ||
'Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0) ' ||
'END pct_' || geom_tablename
ELSE NULL
END
ELSE NULL END
, ', ') AS geom_colspecs,
String_Agg(DISTINCT 'observatory.' || geom_tablename, ', ') AS geom_tables,
String_Agg(
'JSON_Build_Object(' || CASE
-- api-delivered values
WHEN api_method IS NOT NULL THEN
'''value'', ' ||
'cdb_observatory.FIRST( ' ||
api_method || '.' || numer_colname || ')::' || numer_type
'ARRAY_AGG( ' ||
api_method || '.' || numer_colname || ')::' || numer_type || '[]'
-- numeric internal values
WHEN LOWER(numer_type) LIKE 'numeric' THEN
WHEN cdb_observatory.isnumeric(numer_type) THEN
'''value'', ' || CASE
-- denominated
WHEN LOWER(normalization) LIKE 'denom%' OR
(normalization IS NULL AND LOWER(denom_reltype) LIKE 'denominator')
THEN ' CASE ' ||
-- denominated point-in-poly or user polygon is same as OBS polygon
' WHEN ST_GeometryType(cdb_observatory.FIRST(_geoms.geom)) = ''ST_Point'' ' ||
' OR cdb_observatory.FIRST(_geoms.geom = ' || geom_tablename || '.' || geom_colname || ')' ||
' THEN cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / NullIf(' || denom_tablename || '.' || denom_colname || ', 0))' ||
THEN CASE
-- denominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / NullIf(' || denom_tablename || '.' || denom_colname || ', 0))'
-- denominated polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / SUM (denom * (% OBS geom in user geom))
' ELSE ' ||
ELSE
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') ' ||
' THEN ST_Area(_geoms.geom) / ST_Area(' || geom_tablename || '.' || geom_colname || ') ' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) ' ||
' THEN 1 ' ||
' ELSE (ST_Area(ST_Intersection(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')) ' ||
' / ST_Area(' || geom_tablename || '.' || geom_colname || '))' ||
' END) / '
' NULLIF(SUM(' || denom_tablename || '.' || denom_colname || ' ' ||
' * CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') ' ||
' THEN ST_Area(_geoms.geom) / ST_Area(' || geom_tablename || '.' || geom_colname || ') ' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) ' ||
' THEN 1 ' ||
' ELSE (ST_Area(ST_Intersection(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')) ' ||
' / ST_Area(' || geom_tablename || '.' || geom_colname || '))' ||
' END), 0) ' ||
' / (COUNT(*) / COUNT(distinct ' || geom_tablename || '.' || geom_geomref_colname || ')) ' ||
' END '
' * pct_' || geom_tablename ||
' ) / NULLIF(SUM(' || denom_tablename || '.' || denom_colname || ' ' ||
' * pct_' || geom_tablename || '), 0) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- areaNormalized
WHEN LOWER(normalization) LIKE 'area%' OR
(normalization IS NULL AND numer_aggregate ILIKE 'sum')
THEN ' CASE ' ||
-- areaNormalized point-in-poly or user polygon is the same as OBS polygon
' WHEN ST_GeometryType(cdb_observatory.FIRST(_geoms.geom)) = ''ST_Point'' ' ||
' OR cdb_observatory.FIRST(_geoms.geom = ' || geom_tablename || '.' || geom_colname || ')' ||
' THEN cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / (ST_Area(' || geom_tablename || '.' || geom_colname || '::Geography)/1000000)) ' ||
THEN CASE
-- areaNormalized point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / area_' || geom_tablename || ')'
-- areaNormalized polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / area of big geom
' ELSE ' ||
ELSE
--' NULL END '
' SUM((' || numer_tablename || '.' || numer_colname || ') ' ||
' * CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') THEN 1 ' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) THEN ' ||
' ST_Area(' || geom_tablename || '.' || geom_colname || ') ' ||
' / ST_Area(_geoms.geom)' ||
' ELSE (ST_Area(ST_Intersection(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')) ' ||
' / ST_Area(_geoms.geom))' ||
' END / (ST_Area(' || geom_tablename || '.' || geom_colname || '::Geography) / 1000000)) ' ||
' / (COUNT(*) / COUNT(distinct ' || geom_tablename || '.' || geom_geomref_colname || ')) ' ||
' END '
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * pct_' || geom_tablename ||
' ) / (Nullif(ST_Area(cdb_observatory.FIRST(_procgeoms.geom)::Geography), 0) / 1000000) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- median/average measures with universe
WHEN LOWER(numer_aggregate) IN ('median', 'average') AND
denom_reltype ILIKE 'universe' AND
(normalization IS NULL OR LOWER(normalization) LIKE 'pre%')
THEN ' CASE ' ||
-- predenominated point-in-poly or user polygon is the same as OBS- polygon
' WHEN ST_GeometryType(cdb_observatory.FIRST(_geoms.geom)) = ''ST_Point'' ' ||
' OR cdb_observatory.FIRST(_geoms.geom = ' || geom_tablename || '.' || geom_colname || ')' ||
' THEN cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') ' ||
' ELSE ' ||
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation weighted by universe
-- SUM (numer * denom * (% user geom in OBS geom)) / SUM (denom * (% user geom in OBS geom))
-- (10 * 1000 * 1) / (1000 * 1) = 10
-- (10 * 1000 * 1 + 50 * 10 * 1) / (1000 + 10) = 10500 / 1010 ≈ 10.4
' SUM(' || numer_tablename || '.' || numer_colname ||
' * ' || denom_tablename || '.' || denom_colname ||
' * CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') ' ||
' THEN ST_Area(_geoms.geom) / ST_Area(' || geom_tablename || '.' || geom_colname || ') ' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) ' ||
' THEN 1 ' ||
' ELSE (ST_Area(ST_Intersection(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')) ' ||
' / ST_Area(' || geom_tablename || '.' || geom_colname || '))' ||
' END) ' ||
' / SUM(' || denom_tablename || '.' || denom_colname ||
' * CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') ' ||
' THEN ST_Area(_geoms.geom) / ST_Area(' || geom_tablename || '.' || geom_colname || ') ' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) ' ||
' THEN 1 ' ||
' ELSE (ST_Area(ST_Intersection(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')) ' ||
' / ST_Area(' || geom_tablename || '.' || geom_colname || '))' ||
' END) ' ||
' / (COUNT(*) / COUNT(distinct ' || geom_tablename || '.' || geom_geomref_colname || ')) ' ||
'END '
' * pct_' || geom_tablename ||
' ) / Nullif(SUM(' || denom_tablename || '.' || denom_colname ||
' * pct_' || geom_tablename || '), 0) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- prenormalized for summable measures. point or summable only!
WHEN numer_aggregate ILIKE 'sum' AND
(normalization IS NULL OR LOWER(normalization) LIKE 'pre%')
THEN ' CASE ' ||
-- predenominated point-in-poly or user polygon is the same as OBS- polygon
' WHEN ST_GeometryType(cdb_observatory.FIRST(_geoms.geom)) = ''ST_Point'' ' ||
' OR cdb_observatory.FIRST(_geoms.geom = ' || geom_tablename || '.' || geom_colname || ')' ||
' THEN cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') ' ||
' ELSE ' ||
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation
-- SUM (numer * (% user geom in OBS geom))
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') ' ||
' THEN ST_Area(_geoms.geom) / ST_Area(' || geom_tablename || '.' || geom_colname || ') ' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) ' ||
' THEN 1 ' ||
' ELSE (ST_Area(ST_Intersection(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')) ' ||
' / ST_Area(' || geom_tablename || '.' || geom_colname || '))' ||
' END) ' ||
' / (COUNT(*) / COUNT(distinct ' || geom_tablename || '.' || geom_geomref_colname || ')) ' ||
'END '
' * pct_' || geom_tablename ||
' ) / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- Everything else. Point only!
ELSE ' CASE ' ||
' WHEN ST_GeometryType(cdb_observatory.FIRST(_geoms.geom)) = ''ST_Point'' ' ||
' OR cdb_observatory.FIRST(_geoms.geom = ' || geom_tablename || '.' || geom_colname || ')' ||
' THEN cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') ' ||
' ELSE cdb_observatory._OBS_RaiseNotice(''Cannot perform calculation over polygon for ' ||
numer_id || '/' || coalesce(denom_id, '') || '/' || geom_id || '/' || numer_timespan || ''')::Numeric ' ||
' END '
END || ':: ' || numer_type
ELSE CASE
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
' cdb_observatory._OBS_RaiseNotice(''Cannot perform calculation over polygon for ' ||
numer_id || '/' || coalesce(denom_id, '') || '/' || geom_id || '/' || numer_timespan || ''')::Numeric '
END
END || '::' || numer_type
-- categorical/text
WHEN LOWER(numer_type) LIKE 'text' THEN
@@ -700,13 +705,13 @@ BEGIN
-- geometry
WHEN numer_id IS NULL THEN
'''geomref'', ' || geom_tablename || '.' || geom_geomref_colname || ', ' ||
'''value'', ' || 'cdb_observatory.FIRST(' || geom_tablename ||
'.' || geom_colname || ')::TEXT'
'''geomref'', geomref_' || geom_tablename || ', ' ||
'''value'', ' || 'cdb_observatory.FIRST(geom_' || geom_tablename ||
')::TEXT'
-- code below will return the intersection of the user's geom and the
-- OBS geom
--'"value": "'' || ' || 'cdb_observatory.FIRST(ST_Intersection(_geoms.geom, ' || geom_tablename ||
-- '.' || geom_colname || '))::TEXT || ''"'''
--'''value'', ' || 'ST_Union(cdb_observatory.safe_intersection(_geoms.geom, ' || geom_tablename ||
-- '.' || geom_colname || '))::TEXT'
ELSE ''
END || ')', ', ')
AS colspecs,
@@ -716,8 +721,11 @@ BEGIN
--
-- api_method and geom_tablename are interchangeable since when an
-- api_method is passed, geom_tablename is ignored
STRING_AGG(COALESCE(geom_tablename, api_method) ||
'.' || geom_geomref_colname, ', ') AS geomrefs,
String_Agg(DISTINCT COALESCE(geom_tablename, api_method) || '.' || geom_geomref_colname ||
' AS geomref_' || COALESCE(geom_tablename, api_method), ', ') AS geomrefs_alias,
String_Agg(DISTINCT 'geomref_' || COALESCE(geom_tablename, api_method)
, ', ') AS geomrefs_noalias,
(SELECT String_Agg(DISTINCT CASE
-- External API
@@ -730,51 +738,69 @@ BEGIN
SELECT DISTINCT UNNEST(tablenames_ary) tablename FROM (
SELECT ARRAY_AGG(numer_tablename) ||
ARRAY_AGG(denom_tablename) ||
ARRAY_AGG(geom_tablename) ||
ARRAY_AGG('cdb_observatory.' || api_method || '(_geoms.geom' || COALESCE(', ' ||
ARRAY_AGG('cdb_observatory.' || api_method || '(_procgeoms.geom' || COALESCE(', ' ||
(SELECT STRING_AGG(REPLACE(val::text, '"', ''''), ', ')
FROM (SELECT json_array_elements(api_args) as val) as vals),
'') || ')')
tablenames_ary
) tablenames_inner
) tablenames_outer) tablenames,
) tablenames_outer) data_tables,
String_Agg(DISTINCT numer_tablename || '.' || numer_geomref_colname || ' = ' ||
geom_tablename || '.' || geom_geomref_colname ||
Coalesce(' AND ' || numer_tablename || '.' || numer_geomref_colname || ' = ' ||
denom_tablename || '.' || denom_geomref_colname, ''),
' AND ') AS obs_wheres,
String_Agg(DISTINCT array_to_string(ARRAY[
CASE WHEN numer_tablename IS NOT NULL AND geom_tablename IS NOT NULL
THEN numer_tablename || '.' || numer_geomref_colname || ' = ' ||
'_procgeoms.geomref_' || geom_tablename
ELSE NULL END,
CASE WHEN numer_tablename != denom_tablename
THEN numer_tablename || '.' || numer_geomref_colname || ' = ' ||
denom_tablename || '.' || denom_geomref_colname
ELSE NULL END
], ' AND '),
' AND ') FILTER (WHERE numer_tablename != denom_tablename OR
(numer_tablename IS NOT NULL AND geom_tablename IS NOT NULL)) AS obs_wheres,
String_Agg('ST_Intersects(' || geom_tablename || '.' || geom_colname
String_Agg(DISTINCT 'ST_Intersects(' || geom_tablename || '.' || geom_colname
|| ', _geoms.geom)', ' AND ')
AS user_wheres
FROM _meta
;
$query$
INTO colspecs, geomrefs, tables, obs_wheres, user_wheres
USING (SELECT ARRAY(SELECT json_array_elements_text(params))::json[]);
INTO geom_colspecs, geom_tables, data_colspecs, geomrefs_alias,
geomrefs_noalias, data_tables, obs_wheres, user_wheres
USING (SELECT ARRAY(SELECT json_array_elements_text(params))::json[]), geomtype;
RETURN QUERY EXECUTE format($query$
WITH _raw_geoms AS (SELECT
(UNNEST($1)).val as id,
(UNNEST($1)).geom AS geom),
WITH _raw_geoms AS (%s),
_geoms AS (SELECT id,
CASE WHEN (ST_NPoints(geom) > 500)
THEN ST_CollectionExtract(ST_MakeValid(ST_SimplifyVW(geom, 0.0001)), 3)
ELSE geom END geom
FROM _raw_geoms)
SELECT _geoms.id::INT, Array_to_JSON(ARRAY[%s]::JSON[])
FROM _geoms, %s
FROM _raw_geoms),
_procgeoms AS (SELECT _geoms.id, _geoms.geom %s %s
FROM _geoms %s
%s
)
SELECT _procgeoms.id::INT, Array_to_JSON(ARRAY[%s]::JSON[])
FROM _procgeoms %s
%s
GROUP BY _geoms.id %s
ORDER BY _geoms.id
$query$, colspecs, tables,
'WHERE ' || NULLIF(ARRAY_TO_STRING(ARRAY[obs_wheres, user_wheres], ' AND '), ''),
CASE WHEN merge IS False THEN ', ' || geomrefs ELSE '' END)
GROUP BY _procgeoms.id %s
ORDER BY _procgeoms.id
$query$, CASE WHEN ARRAY_LENGTH(geomvals, 1) = 1 THEN
' SELECT $1[1].val as id, $1[1].geom as geom '
ELSE
' SELECT val as id, geom FROM UNNEST($1) '
END,
', ' || NullIf(geomrefs_alias, ''),
', ' || NullIf(geom_colspecs, ''),
', ' || NullIf(geom_tables, ''),
'WHERE ' || NullIf( user_wheres, ''),
data_colspecs, ', ' || NullIf(data_tables, ''),
'WHERE ' || NULLIF(obs_wheres, ''),
CASE WHEN merge IS False THEN ', ' || geomrefs_noalias ELSE '' END)
USING geomvals;
RETURN;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetCategory(
@@ -832,7 +858,7 @@ BEGIN
RETURN result;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetUSCensusMeasure(
@@ -866,7 +892,7 @@ BEGIN
USING geom, measure_id, normalize, boundary_id, time_span;
RETURN result;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetUSCensusCategory(
@@ -902,7 +928,7 @@ BEGIN
RETURN result;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetPopulation(
geom geometry(Geometry, 4326),
@@ -928,7 +954,7 @@ BEGIN
RETURN result;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetSegmentSnapshot(
@@ -1017,4 +1043,4 @@ BEGIN
RETURN result;
END;
$$ LANGUAGE plpgsql;
$$ LANGUAGE plpgsql STABLE;
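
To make the universe-weighted interpolation used for median/average measures in
`OBS_GetData` concrete, here is a small Python sketch of the arithmetic named
in its comments, SUM(numer * denom * pct) / SUM(denom * pct). The function name
`universe_weighted` and the input tuples are illustrative assumptions, not part
of the extension, and the sketch leaves out the
`COUNT(*) / COUNT(distinct ...)` correction that the SQL applies.

```python
def universe_weighted(rows):
    """rows: iterable of (numer, denom, pct) tuples, one per overlapping
    OBS geometry. Returns SUM(numer*denom*pct) / SUM(denom*pct), or None
    when the weighted universe is zero (similar in spirit to Nullif(..., 0))."""
    rows = list(rows)
    weighted = sum(n * d * p for n, d, p in rows)
    universe = sum(d * p for _, d, p in rows)
    return weighted / universe if universe else None


# One geometry fully covered: the measure passes through unchanged.
print(universe_weighted([(10, 1000, 1)]))               # 10.0
# Two geometries: the small universe barely moves the result (about 10.4).
print(universe_weighted([(10, 1000, 1), (50, 10, 1)]))
```

Weighting by the universe keeps a sparsely populated geometry with an extreme
value from dominating the interpolated median or average.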

@@ -21,3 +21,7 @@ t
obs_dumpversion_notnull
t
(1 row)
ERROR: Error performing intersection: TopologyException: found non-noded intersection between LINESTRING (-97.1968 25.9574, -97.1971 25.9576) and LINESTRING (-97.197 25.9575, -97.1972 25.9576) at -97.19699802694231 25.957551976080605
complex_safe_intersection_works
t
(1 row)

@@ -249,15 +249,15 @@ t|t
obs_getdata_api_geomvals_no_args
t
(1 row)
obs_getdata_api_geomvals_args_numer_return
t
ary_type|obs_getdata_api_geomvals_args_numer_return
t|t
(1 row)
obs_getdata_api_geomvals_args_string_return
t
ary_type|obs_getdata_api_geomvals_args_string_return
t|t
(1 row)
obs_getdata_api_geomrefs_args_numer_return
t
ary_type|obs_getdata_api_geomrefs_args_numer_return
t|t
(1 row)
obs_getdata_api_geomrefs_args_string_return
t
ary_type|obs_getdata_api_geomrefs_args_string_return
t|t
(1 row)

@@ -26,6 +26,7 @@ DROP TABLE IF EXISTS observatory.obs_6c1309a64d8f3e6986061f4d1ca7b57743e75e74;
DROP TABLE IF EXISTS observatory.obs_0310c639744a2014bb1af82709228f05b59e7d3d;
DROP TABLE IF EXISTS observatory.obs_87a814e485deabe3b12545a537f693d16ca702c2;
DROP TABLE IF EXISTS observatory.obs_e32f8e59c7c8861ee5ee4029b3ace2af9a5c9caf;
DROP TABLE IF EXISTS observatory.obs_23cb5063486bd7cf36f17e89e5e65cd31b331f6e;
DROP TABLE IF EXISTS observatory.obs_1ea93bbc109c87c676b3270789dacf7a1430db6c;
DROP TABLE IF EXISTS observatory.obs_b393b5b88c6adda634b2071a8005b03c551b609a;
DROP TABLE IF EXISTS observatory.obs_8e30e6b3792430b410ba5b9e49cdc6a0d404d48f;

@@ -47,3 +47,15 @@ SELECT cdb_observatory._OBS_StandardizeMeasureName('test 343 %% 2 qqq }}{{}}') =
SELECT cdb_observatory.OBS_DumpVersion()
IS NOT NULL AS OBS_DumpVersion_notnull;
-- Should fail to perform intersection
SELECT ST_IsValid(ST_Intersection(
cdb_observatory.OBS_GetBoundaryByID('48061', 'us.census.tiger.county'),
cdb_observatory.OBS_GetBoundaryByID('48061', 'us.census.tiger.county_clipped')
)) AS complex_intersection_fails;
-- Should succeed in intersecting
SELECT ST_IsValid(cdb_observatory.safe_intersection(
cdb_observatory.OBS_GetBoundaryByID('48061', 'us.census.tiger.county'),
cdb_observatory.OBS_GetBoundaryByID('48061', 'us.census.tiger.county_clipped')
)) AS complex_safe_intersection_works;

@@ -765,27 +765,36 @@ SELECT id = '36047048500' AS id,
FROM data;
-- OBS_GetData with an API + geomvals, no args
SELECT ARRAY['us.census.tiger.census_tract'] <@ array_agg(data->0->>'value') AS OBS_GetData_API_geomvals_no_args
SELECT (SELECT array_agg(json_array_elements::text) @> array['"us.census.tiger.census_tract"']
FROM json_array_elements(data->0->'value'))
AS OBS_GetData_API_geomvals_no_args
FROM cdb_observatory.obs_getdata(array[(cdb_observatory._testarea(), 1)::geomval],
'[{"numer_type": "text", "numer_colname": "boundary_id", "api_method": "obs_getavailableboundaries", "geom_geomref_colname": "boundary_id"}]',
false);
'[{"numer_type": "text", "numer_colname": "boundary_id", "api_method": "obs_getavailableboundaries"}]');
-- OBS_GetData with an API + geomvals, args, numeric
SELECT json_typeof(data->0->'value') = 'number' AS OBS_GetData_API_geomvals_args_numer_return
SELECT json_typeof(data->0->'value') = 'array' ary_type,
json_typeof(data->0->'value'->0) = 'number'
AS OBS_GetData_API_geomvals_args_numer_return
FROM cdb_observatory.obs_getdata(array[(cdb_observatory._testarea(), 1)::geomval],
'[{"numer_type": "numeric", "numer_colname": "obs_getmeasure", "api_method": "obs_getmeasure", "api_args": ["us.census.acs.B01003001"]}]', false);
'[{"numer_type": "numeric", "numer_colname": "obs_getmeasure", "api_method": "obs_getmeasure", "api_args": ["us.census.acs.B01003001"]}]');
-- OBS_GetData with an API + geomvals, args, text
SELECT json_typeof(data->0->'value') = 'string' AS OBS_GetData_API_geomvals_args_string_return
SELECT json_typeof(data->0->'value') = 'array' ary_type,
json_typeof(data->0->'value'->0) = 'string'
AS OBS_GetData_API_geomvals_args_string_return
FROM cdb_observatory.obs_getdata(array[(cdb_observatory._testarea(), 1)::geomval],
'[{"numer_type": "text", "numer_colname": "obs_getcategory", "api_method": "obs_getcategory", "api_args": ["us.census.spielman_singleton_segments.X55"]}]', false);
'[{"numer_type": "text", "numer_colname": "obs_getcategory", "api_method": "obs_getcategory", "api_args": ["us.census.spielman_singleton_segments.X55"]}]');
-- OBS_GetData with an API + geomrefs, args, numeric
SELECT json_typeof(data->0->'value') = 'number' AS OBS_GetData_API_geomrefs_args_numer_return
SELECT json_typeof(data->0->'value') = 'array' ary_type,
json_typeof(data->0->'value'->0) = 'number'
AS OBS_GetData_API_geomrefs_args_numer_return
FROM cdb_observatory.obs_getdata(array['36047076200'],
'[{"numer_type": "numeric", "numer_colname": "obs_getmeasurebyid", "api_method": "obs_getmeasurebyid", "api_args": ["us.census.acs.B01003001", "us.census.tiger.census_tract"]}]');
-- OBS_GetData with an API + geomrefs, args, text
SELECT json_typeof(data->0->'value') = 'string' AS OBS_GetData_API_geomrefs_args_string_return
SELECT json_typeof(data->0->'value') = 'array' ary_type,
json_typeof(data->0->'value'->0) = 'string'
AS OBS_GetData_API_geomrefs_args_string_return
FROM cdb_observatory.obs_getdata(array['36047'],
'[{"numer_type": "text", "numer_colname": "obs_getboundarybyid", "api_method": "obs_getboundarybyid", "api_args": ["us.census.tiger.county"]}]');
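Reviewer note: the hunk above changes the expected shape of `data->0->'value'` for API-backed columns from a scalar to an array whose first element carries the scalar type. A minimal Python 3 sketch of the same two-step check follows. The helper `json_typeof` is only a rough analogue of the PostgreSQL function of the same name, and the sample payload is hypothetical, not output captured from `obs_getdata`.

```python
import json


def json_typeof(value):
    """Rough Python analogue of PostgreSQL's json_typeof()."""
    if value is None:
        return "null"
    # bool must be tested before int, because bool is a subclass of int
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError("not a JSON value: {!r}".format(value))


# Hypothetical payload shaped like the new return value: "value" is an array.
data = json.loads('[{"value": [1234.0]}]')
value = data[0]["value"]

# Mirrors the two columns selected in the updated tests.
ary_type = json_typeof(value) == "array"
first_is_number = json_typeof(value[0]) == "number"
print(ary_type, first_is_number)
```

The old assertion (`json_typeof(...) = 'number'`) would fail against this shape, which is why each test now checks the container type first and the element type second.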


@@ -1,3 +1,4 @@
nose
nose-timer
nose_parameterized
psycopg2


@@ -2,39 +2,21 @@ from nose.tools import assert_equal, assert_is_not_none
from nose.plugins.skip import SkipTest
from nose_parameterized import parameterized
from itertools import izip_longest
from util import query
from collections import OrderedDict
import json
def grouper(iterable, n, fillvalue=None):
"Collect data into fixed-length chunks or blocks"
# grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
args = [iter(iterable)] * n
return izip_longest(fillvalue=fillvalue, *args)
USE_SCHEMA = True
MEASURE_COLUMNS = query('''
SELECT distinct numer_id, Coalesce(numer_aggregate, '') NOT ILIKE 'sum' as point_only
FROM observatory.obs_meta
WHERE numer_type ILIKE 'numeric'
AND numer_weight > 0
''').fetchall()
CATEGORY_COLUMNS = query('''
SELECT distinct numer_id
FROM observatory.obs_meta
WHERE numer_type ILIKE 'text'
AND numer_weight > 0
''').fetchall()
BOUNDARY_COLUMNS = query('''
SELECT id FROM observatory.obs_column
WHERE type ILIKE 'geometry'
AND weight > 0
''').fetchall()
US_CENSUS_MEASURE_COLUMNS = query('''
SELECT distinct numer_name
FROM observatory.obs_meta
WHERE numer_type ILIKE 'numeric'
AND 'us.census.acs.acs' = ANY (subsection_tags)
AND numer_weight > 0
''').fetchall()
SKIP_COLUMNS = set([
u'mx.inegi_columns.INDI18',
u'mx.inegi_columns.ECO40',
@@ -73,8 +55,61 @@ SKIP_COLUMNS = set([
u'us.census.tiger.mtfcc',
u'whosonfirst.wof_county_name',
u'whosonfirst.wof_region_name',
'fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
, 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
, 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
, 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
, 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
, 'fr.insee.P12_ACTOCC15P_ILT45D'
, 'fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
, 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
, 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
, 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
, 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
, 'fr.insee.P12_ACTOCC15P_ILT45D'
, 'uk.ons.LC3202WA0007'
, 'uk.ons.LC3202WA0010'
, 'uk.ons.LC3202WA0004'
, 'uk.ons.LC3204WA0004'
, 'uk.ons.LC3204WA0007'
, 'uk.ons.LC3204WA0010'
])
MEASURE_COLUMNS = query('''
SELECT ARRAY_AGG(DISTINCT numer_id) numer_ids,
numer_aggregate,
denom_reltype,
section_tags
FROM observatory.obs_meta
WHERE numer_weight > 0
AND numer_id NOT IN ('{skip}')
AND section_tags IS NOT NULL
AND subsection_tags IS NOT NULL
GROUP BY numer_aggregate, section_tags, denom_reltype
'''.format(skip="', '".join(SKIP_COLUMNS))).fetchall()
#CATEGORY_COLUMNS = query('''
#SELECT distinct numer_id
#FROM observatory.obs_meta
#WHERE numer_type ILIKE 'text'
#AND numer_weight > 0
#''').fetchall()
#
#BOUNDARY_COLUMNS = query('''
#SELECT id FROM observatory.obs_column
#WHERE type ILIKE 'geometry'
#AND weight > 0
#''').fetchall()
#
#US_CENSUS_MEASURE_COLUMNS = query('''
#SELECT distinct numer_name
#FROM observatory.obs_meta
#WHERE numer_type ILIKE 'numeric'
#AND 'us.census.acs' = ANY (subsection_tags)
#AND numer_weight > 0
#''').fetchall()
#def default_geometry_id(column_id):
# '''
# Returns default test point for the column_id.
@@ -125,37 +160,37 @@ def default_lonlat(column_id):
elif column_id.startswith('th.'):
return (13.725377712079784, 100.49263000488281)
# cols for French Guyana only
elif column_id in ('fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
, 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
, 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
, 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
, 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
, 'fr.insee.P12_ACTOCC15P_ILT45D'
, 'fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
, 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
, 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
, 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
, 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
, 'fr.insee.P12_ACTOCC15P_ILT45D'):
return (4.938408371206558, -52.32908248901367)
#elif column_id in ('fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
# , 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
# , 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
# , 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
# , 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
# , 'fr.insee.P12_ACTOCC15P_ILT45D'
# , 'fr.insee.P12_RP_CHOS', 'fr.insee.P12_RP_HABFOR'
# , 'fr.insee.P12_RP_EAUCH', 'fr.insee.P12_RP_BDWC'
# , 'fr.insee.P12_RP_MIDUR', 'fr.insee.P12_RP_CLIM'
# , 'fr.insee.P12_RP_MIBOIS', 'fr.insee.P12_RP_CASE'
# , 'fr.insee.P12_RP_TTEGOU', 'fr.insee.P12_RP_ELEC'
# , 'fr.insee.P12_ACTOCC15P_ILT45D'):
# return (4.938408371206558, -52.32908248901367)
elif column_id.startswith('fr.'):
return (48.860875144709475, 2.3613739013671875)
elif column_id.startswith('ca.'):
return (43.65594991256823, -79.37965393066406)
elif column_id.startswith('us.census.'):
return (40.7, -73.9)
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.dma.'):
return (40.7, -73.9)
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.ihme.'):
return (40.7, -73.9)
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.bls.'):
return (40.7, -73.9)
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.qcew.'):
return (40.7, -73.9)
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('whosonfirst.'):
return (40.7, -73.9)
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('us.epa.'):
return (40.7, -73.9)
return (28.3305906291771, -81.3544048197256)
elif column_id.startswith('eu.'):
raise SkipTest('No tests for Eurostat!')
elif column_id.startswith('br.'):
@@ -181,46 +216,74 @@ def default_area(column_id):
point=point)
return area
@parameterized(US_CENSUS_MEASURE_COLUMNS)
def test_get_us_census_measure_points(name):
resp = query('''
SELECT * FROM {schema}OBS_GetUSCensusMeasure({point}, '{name}')
'''.format(name=name.replace("'", "''"),
schema='cdb_observatory.' if USE_SCHEMA else '',
point=default_point('')))
rows = resp.fetchall()
assert_equal(1, len(rows))
assert_is_not_none(rows[0][0])
#@parameterized(US_CENSUS_MEASURE_COLUMNS)
#def test_get_us_census_measure_points(name):
# resp = query('''
#SELECT * FROM {schema}OBS_GetUSCensusMeasure({point}, '{name}')
# '''.format(name=name.replace("'", "''"),
# schema='cdb_observatory.' if USE_SCHEMA else '',
# point=default_point('')))
# rows = resp.fetchall()
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
@parameterized(MEASURE_COLUMNS)
def test_get_measure_areas(column_id, point_only):
if column_id in SKIP_COLUMNS:
raise SkipTest('Column {} should be skipped'.format(column_id))
if point_only:
def grouped_measure_columns():
for numer_ids, numer_aggregate, denom_reltype, section_tags in MEASURE_COLUMNS:
for colgroup in grouper(numer_ids, 50):
yield [c for c in colgroup if c], numer_aggregate, denom_reltype, section_tags
@parameterized(grouped_measure_columns())
def test_get_measure_points(numer_ids, numer_aggregate, denom_reltype, section_tags):
_test_measures(numer_ids, numer_aggregate, section_tags, denom_reltype, default_point(numer_ids[0]))
@parameterized(grouped_measure_columns())
def test_get_measure_areas(numer_ids, numer_aggregate, denom_reltype, section_tags):
if numer_aggregate is None or numer_aggregate.lower() not in ('sum', 'median', 'average'):
return
resp = query('''
SELECT * FROM {schema}OBS_GetMeasure({area}, '{column_id}')
'''.format(column_id=column_id,
schema='cdb_observatory.' if USE_SCHEMA else '',
area=default_area(column_id)))
rows = resp.fetchall()
assert_equal(1, len(rows))
assert_is_not_none(rows[0][0])
if numer_aggregate.lower() in ('median', 'average') \
and (denom_reltype is None \
or denom_reltype.lower() != 'universe'):
return
_test_measures(numer_ids, numer_aggregate, section_tags, denom_reltype, default_area(numer_ids[0]))
@parameterized(MEASURE_COLUMNS)
def test_get_measure_points(column_id, point_only):
if column_id in SKIP_COLUMNS:
raise SkipTest('Column {} should be skipped'.format(column_id))
resp = query('''
SELECT * FROM {schema}OBS_GetMeasure({point}, '{column_id}')
'''.format(column_id=column_id,
schema='cdb_observatory.' if USE_SCHEMA else '',
point=default_point(column_id)))
rows = resp.fetchall()
assert_equal(1, len(rows))
assert_is_not_none(rows[0][0])
def _test_measures(numer_ids, numer_aggregate, section_tags, denom_reltype, geom):
in_params = []
for numer_id in numer_ids:
in_params.append({
'numer_id': numer_id,
'normalization': 'predenominated'
})
params = query(u'''
SELECT {schema}OBS_GetMeta({geom}, '{in_params}')
'''.format(schema='cdb_observatory.' if USE_SCHEMA else '',
geom=geom,
in_params=json.dumps(in_params))).fetchone()[0]
# We can get duplicate IDs from multi-denominators, so for now we
# compress those measures into a single
params = OrderedDict([(p['id'], p) for p in params]).values()
assert_equal(len(params), len(in_params),
'Inconsistent out and in params for {}'.format(in_params))
q = u'''
SELECT * FROM {schema}OBS_GetData(ARRAY[({geom}, 1)::geomval], '{params}')
'''.format(schema='cdb_observatory.' if USE_SCHEMA else '',
geom=geom,
params=json.dumps(params).replace(u"'", "''"))
resp = query(q).fetchone()
assert_is_not_none(resp, 'NULL returned for {}'.format(in_params))
rawvals = resp[1]
vals = [v['value'] for v in rawvals]
assert_equal(len(vals), len(in_params))
for i, val in enumerate(vals):
assert_is_not_none(val, 'NULL for {}'.format(in_params[i]['numer_id']))
#@parameterized(CATEGORY_COLUMNS)
#def test_get_category_areas(column_id):
@@ -234,18 +297,18 @@ SELECT * FROM {schema}OBS_GetMeasure({point}, '{column_id}')
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
@parameterized(CATEGORY_COLUMNS)
def test_get_category_points(column_id):
if column_id in SKIP_COLUMNS:
raise SkipTest('Column {} should be skipped'.format(column_id))
resp = query('''
SELECT * FROM {schema}OBS_GetCategory({point}, '{column_id}')
'''.format(column_id=column_id,
schema='cdb_observatory.' if USE_SCHEMA else '',
point=default_point(column_id)))
rows = resp.fetchall()
assert_equal(1, len(rows))
assert_is_not_none(rows[0][0])
#@parameterized(CATEGORY_COLUMNS)
#def test_get_category_points(column_id):
# if column_id in SKIP_COLUMNS:
# raise SkipTest('Column {} should be skipped'.format(column_id))
# resp = query('''
#SELECT * FROM {schema}OBS_GetCategory({point}, '{column_id}')
# '''.format(column_id=column_id,
# schema='cdb_observatory.' if USE_SCHEMA else '',
# point=default_point(column_id)))
# rows = resp.fetchall()
# assert_equal(1, len(rows))
# assert_is_not_none(rows[0][0])
#@parameterized(BOUNDARY_COLUMNS)
#def test_get_boundaries_by_geometry(column_id):
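Reviewer note: the autotest changes above rest on two small techniques: splitting each list of `numer_ids` into fixed-size groups with `grouper`, and collapsing duplicate metadata entries by `id` with an `OrderedDict`. The sketch below shows both in isolation. It is written for Python 3 (`zip_longest` in place of the Python 2 `izip_longest` the diff imports); the names `chunked_ids` and `dedupe_by_id` and the sample IDs are hypothetical and do not appear in the commit.

```python
from collections import OrderedDict
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks, padding the last one."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def chunked_ids(numer_ids, size):
    """Yield lists of at most `size` ids, dropping the padding values."""
    for group in grouper(numer_ids, size):
        yield [c for c in group if c]


def dedupe_by_id(params):
    """Keep one entry per 'id'; a later duplicate overwrites the earlier
    entry but stays at the position where the id first appeared."""
    return list(OrderedDict((p["id"], p) for p in params).values())


# Hypothetical ids, chunked in pairs instead of the 50 used in the diff.
ids = ["a.1", "a.2", "a.3", "a.4", "a.5"]
chunks = list(chunked_ids(ids, 2))
print(chunks)

params = [{"id": "x", "denom": 1}, {"id": "y"}, {"id": "x", "denom": 2}]
unique = dedupe_by_id(params)
print(unique)
```

Note that the filter `if c` drops any falsy element, not only the `None` padding, which is harmless for non-empty string ids.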


@@ -74,7 +74,10 @@ for q in (
q_formatted = q.format(
schema='cdb_observatory.' if USE_SCHEMA else '',
)
start = time()
resp = query(q_formatted)
end = time()
print('{} for {}'.format(int(end - start), q_formatted))
if q.lower().startswith('insert'):
if resp.rowcount == 0:
raise Exception('''Performance fixture creation "{}" inserted 0 rows,
@@ -189,29 +192,21 @@ def test_getgeometryscores_performance(geom_complexity, api_method, filters, tar
('simple', 'OBS_GetCategory', None, 'geom', "'us.census.tiger.census_tract'"),
('simple', 'OBS_GetCategory', None, 'offset_geom', "'us.census.tiger.census_tract'"),
('complex', 'OBS_GetMeasure', 'predenominated', 'point', 'NULL'),
('complex', 'OBS_GetMeasure', 'predenominated', 'geom', 'NULL'),
('complex', 'OBS_GetMeasure', 'predenominated', 'offset_geom', 'NULL'),
('complex', 'OBS_GetMeasure', 'area', 'point', 'NULL'),
('complex', 'OBS_GetMeasure', 'area', 'geom', 'NULL'),
('complex', 'OBS_GetMeasure', 'area', 'offset_geom', 'NULL'),
('complex', 'OBS_GetMeasure', 'denominator', 'point', 'NULL'),
('complex', 'OBS_GetMeasure', 'denominator', 'geom', 'NULL'),
('complex', 'OBS_GetMeasure', 'denominator', 'offset_geom', 'NULL'),
('complex', 'OBS_GetCategory', None, 'point', 'NULL'),
('complex', 'OBS_GetCategory', None, 'geom', 'NULL'),
('complex', 'OBS_GetCategory', None, 'offset_geom', 'NULL'),
('complex', 'OBS_GetMeasure', 'predenominated', 'point', "'us.census.tiger.county'"),
('complex', 'OBS_GetMeasure', 'predenominated', 'geom', "'us.census.tiger.county'"),
('complex', 'OBS_GetMeasure', 'predenominated', 'offset_geom', "'us.census.tiger.county'"),
('complex', 'OBS_GetMeasure', 'area', 'point', "'us.census.tiger.county'"),
('complex', 'OBS_GetMeasure', 'area', 'geom', "'us.census.tiger.county'"),
('complex', 'OBS_GetMeasure', 'area', 'offset_geom', "'us.census.tiger.county'"),
('complex', 'OBS_GetMeasure', 'denominator', 'point', "'us.census.tiger.county'"),
('complex', 'OBS_GetMeasure', 'denominator', 'geom', "'us.census.tiger.county'"),
('complex', 'OBS_GetMeasure', 'denominator', 'offset_geom', "'us.census.tiger.county'"),
('complex', 'OBS_GetCategory', None, 'point', "'us.census.tiger.census_tract'"),
('complex', 'OBS_GetCategory', None, 'geom', "'us.census.tiger.census_tract'"),
('complex', 'OBS_GetCategory', None, 'offset_geom', "'us.census.tiger.census_tract'"),
])
@@ -273,78 +268,85 @@ def test_getmeasure_performance(geom_complexity, api_method, normalization, geom
('simple', 'denominator', 'geom', "'us.census.tiger.census_tract'"),
('simple', 'denominator', 'offset_geom', "'us.census.tiger.census_tract'"),
('complex', 'predenominated', 'point', 'null'),
('complex', 'predenominated', 'geom', 'null'),
('complex', 'predenominated', 'offset_geom', 'null'),
('complex', 'area', 'point', 'null'),
('complex', 'area', 'geom', 'null'),
('complex', 'area', 'offset_geom', 'null'),
('complex', 'denominator', 'point', 'null'),
('complex', 'denominator', 'geom', 'null'),
('complex', 'denominator', 'offset_geom', 'null'),
('complex', 'predenominated', 'point', "'us.census.tiger.county'"),
('complex', 'predenominated', 'geom', "'us.census.tiger.county'"),
('complex', 'predenominated', 'offset_geom', "'us.census.tiger.county'"),
('complex', 'area', 'point', "'us.census.tiger.county'"),
('complex', 'area', 'geom', "'us.census.tiger.county'"),
('complex', 'area', 'offset_geom', "'us.census.tiger.county'"),
('complex', 'denominator', 'point', "'us.census.tiger.county'"),
('complex', 'denominator', 'geom', "'us.census.tiger.county'"),
('complex', 'denominator', 'offset_geom', "'us.census.tiger.county'"),
])
def test_getmeasure_split_performance(geom_complexity, normalization, geom, boundary):
def test_getdata_performance(geom_complexity, normalization, geom, boundary):
print geom_complexity, normalization, geom, boundary
results = []
cols = ['us.census.acs.B01001002',
'us.census.acs.B01001003',
'us.census.acs.B01001004',
'us.census.acs.B01001005',
'us.census.acs.B01001006',
'us.census.acs.B01001007',
'us.census.acs.B01001008',
'us.census.acs.B01001009',
'us.census.acs.B01001010',
'us.census.acs.B01001011', ]
in_meta = [{"numer_id": col,
"normalization": normalization,
"geom_id": None if boundary.lower() == 'null' else boundary.replace("'", '')}
for col in cols]
rownums = (1, 5, 10, ) if geom_complexity == 'complex' else (10, 50, 100)
for rows in rownums:
stmt = '''
with data as (
SELECT id, data FROM {schema}OBS_GetData(
(SELECT array_agg(({geom}, cartodb_id)::geomval)
FROM obs_perftest_{complexity}
WHERE cartodb_id <= {n}),
(SELECT {schema}OBS_GetMeta(
(SELECT st_setsrid(st_extent({geom}), 4326)
FROM obs_perftest_{complexity}
WHERE cartodb_id <= {n}),
'[{{
"numer_id": "us.census.acs.B01001002",
"normalization": "{normalization}",
"geom_id": {boundary}
}}]'::JSON
))
))
UPDATE obs_perftest_{complexity}
SET measure = (data->0->>'value')::Numeric
FROM data
WHERE obs_perftest_{complexity}.cartodb_id = data.id
;
'''.format(
point_or_poly='point' if geom == 'point' else 'polygon',
complexity=geom_complexity,
schema='cdb_observatory.' if USE_SCHEMA else '',
normalization=normalization,
geom=geom,
boundary=boundary.replace("'", '"'),
n=rows)
start = time()
query(stmt)
end = time()
qps = (rows / (end - start))
results.append({
'rows': rows,
'qps': qps,
'stmt': stmt
})
print rows, ': ', qps, ' QPS'
if 'OBS_RECORD_TEST' in os.environ:
record({
'geom_complexity': geom_complexity,
'api_method': 'OBS_GetData',
'normalization': normalization,
'boundary': boundary,
'geom': geom
}, results)
for num_meta in (1, 10, ):
results = []
for rows in rownums:
stmt = '''
with data as (
SELECT id, data FROM {schema}OBS_GetData(
(SELECT array_agg(({geom}, cartodb_id)::geomval)
FROM obs_perftest_{complexity}
WHERE cartodb_id <= {n}),
(SELECT {schema}OBS_GetMeta(
(SELECT st_setsrid(st_extent({geom}), 4326)
FROM obs_perftest_{complexity}
WHERE cartodb_id <= {n}),
'{in_meta}'::JSON
))
))
UPDATE obs_perftest_{complexity}
SET measure = (data->0->>'value')::Numeric
FROM data
WHERE obs_perftest_{complexity}.cartodb_id = data.id
;
'''.format(
point_or_poly='point' if geom == 'point' else 'polygon',
complexity=geom_complexity,
schema='cdb_observatory.' if USE_SCHEMA else '',
geom=geom,
in_meta=json.dumps(in_meta[0:num_meta]),
n=rows)
start = time()
query(stmt)
end = time()
qps = (rows / (end - start))
results.append({
'rows': rows,
'qps': qps,
'stmt': stmt
})
print rows, ': ', qps, ' QPS'
if 'OBS_RECORD_TEST' in os.environ:
record({
'geom_complexity': geom_complexity,
'api_method': 'OBS_GetData',
'normalization': normalization,
'boundary': boundary,
'geom': geom,
'num_meta': str(num_meta)
}, results)
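Reviewer note: the performance test above times one statement per row count and reports `rows / (end - start)` as QPS. A minimal Python 3 sketch of that measurement loop, separated from the database, is given below. `measure_qps` and `fake_query` are hypothetical names used for illustration, and `perf_counter` is swapped in for the `time()` call in the diff because it is better suited to short intervals.

```python
from time import perf_counter


def measure_qps(run_query, row_counts):
    """Time run_query(rows) once per row count and report rows per second."""
    results = []
    for rows in row_counts:
        start = perf_counter()
        run_query(rows)
        elapsed = perf_counter() - start
        # Guard against a zero interval on very fast calls.
        qps = rows / elapsed if elapsed > 0 else float("inf")
        results.append({"rows": rows, "qps": qps})
    return results


def fake_query(rows):
    """Stand-in for query(stmt): does work proportional to `rows`."""
    sum(range(rows * 1000))


# Same row counts the diff uses for the 'complex' fixtures.
results = measure_qps(fake_query, (1, 5, 10))
for r in results:
    print(r["rows"], ":", r["qps"], "QPS")
```

The guard on `elapsed` matters mainly for stand-ins like `fake_query`; a real `OBS_GetData` call is unlikely to finish within the clock's resolution.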