20 Commits
1.3.5 ... 1.5.1

Author SHA1 Message Date
John Krauss
415a4ccc05 update NEWS for 1.5.1 2017-05-16 14:33:02 +00:00
John Krauss
ccb8092506 1.5.1 release artifact 2017-05-16 14:27:49 +00:00
John Krauss
6266262427 new code to handle mixed geometries more quickly 2017-05-10 20:24:21 +00:00
John Krauss
183c046289 release artifact 2017-04-26 20:08:44 +00:00
John Krauss
8df89f4a91 remove br subdistritos from testing 2017-04-25 18:57:12 +00:00
John Krauss
28694163a2 prefer geographpic precision over most recent timespan, handles issues emerging from inclusion of 1-year acs 2017-04-25 18:53:12 +00:00
John Krauss
60c7f54315 update NEWS for 1.5.0, fix error in link in 1.4.0 2017-04-24 18:22:31 +00:00
John Krauss
3ebb0b8662 Merge branch 'release-v-1.5.0' into obs-getavailablegeometries-return-tags 2017-04-24 18:10:43 +00:00
John Krauss
a2e84696dc fix tests to match fixture data 2017-04-24 18:01:38 +00:00
John Krauss
cd5cb38e8d Merge branch 'release-v-1.5.0' into obs-getavailablegeometries-return-tags 2017-04-24 17:50:57 +00:00
John Krauss
26e1a2f461 Add tags to obs_getavailablegeometries
Fixes #260

* Adds `geom_type`, `geom_extra`, and `geom_tags` to `OBS_GetAvailableGeometries`. This brings it up to spec with existing docs.
* Adds `timespan_type`, `timespan_extra`, and `timespan_tags` to `OBS_GetAvailableTimespans` for consistency.
2017-04-03 21:51:32 +00:00
John Krauss
090a1add43 add suggested_name output to OBS_GetMeta. fixes #279 2017-04-03 19:44:00 +00:00
John Krauss
536af5e4a2 release artifact 2017-03-22 15:17:19 +00:00
John Krauss
ebf23d2a23 Merge branch 'develop' into release-v-1.4.0 2017-03-22 15:16:35 +00:00
John Krauss
f1afcf0d8e update NEWS.md 2017-03-22 15:14:35 +00:00
John Krauss
3c0b40cf3f more consistent arguments in docs 2017-03-22 15:12:50 +00:00
John Krauss
8a87dc7e9a update NEWS.md 2017-03-21 21:24:50 +00:00
John Krauss
61552adba4 Allow for target_geoms and target_area override on column-by-column basis 2017-03-21 17:26:02 +00:00
csobier
36abbee64f Merge pull request #274 from CartoDB/273-docs-edit
clarification of docs for obs_getboundariesbygeometry function
2017-03-17 12:07:48 -04:00
csobier
5a76a7381e clarification of docs for obs_getboundariesbygeometry function 2017-03-17 11:45:49 -04:00
17 changed files with 8533 additions and 1223 deletions

NEWS.md

@@ -1,3 +1,45 @@
1.5.1 (2017-05-16)
__Improvements__
* Much improved performance for `OBS_GetData` when augmenting with several
different geometries simultaneously ([#285](https://github.com/CartoDB/observatory-extension/pull/285))
* Return the automatically assigned normalization type from `OBS_GetMeta`
([#285](https://github.com/CartoDB/observatory-extension/pull/285))
1.5.0 (2017-04-24)
__API Changes__
* Add `suggested_name` to `OBS_GetMeta` responses
([#281](https://github.com/CartoDB/observatory-extension/pull/281))
* Add `geom_type`, `geom_extra`, and `geom_tags` to
`OBS_GetAvailableGeometries`. This brings it up to spec with existing docs.
([#282](https://github.com/CartoDB/observatory-extension/pull/282))
* Add `timespan_type`, `timespan_extra`, and `timespan_tags` to
`OBS_GetAvailableTimespans` for consistency.
([#282](https://github.com/CartoDB/observatory-extension/pull/282))
1.4.0 (2017-03-21)
__API Changes__
* Allow for override of `target_area` and `target_geoms` in `OBS_GetMeta`
([#276](https://github.com/CartoDB/observatory-extension/pull/276)). This
allows the interface to work with points and sparse areas much better.
* Allow for override of `max_timespan_rank` and `max_score_rank` on an
item-by-item basis for metadata.
* `numer_description`, `geom_description`, `denom_description`,
`numer_t_description`, `denom_t_description` and `geom_t_description` now
returned as part of `OBS_GetMeta`.
__Improvements__
* Reduced amount of simplification done on input geometries (from 0.0001 above
500 points to 0.00001 above 1000 points).
* Added tests to confirm that accurate results are returned from automatic
boundary selection
1.3.5 (2017-03-15)
No changes. Artifact to allow for data update.
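The 1.5.1 entry above says `OBS_GetMeta` now returns the normalization it assigns automatically. The sketch below re-expresses the `Coalesce(normalization, ...)` rule from the `OBS_GetMeta` SQL hunk further down as a standalone PostgreSQL function, so the rule can be tried without the extension. It is an approximation. `pg_temp.auto_normalization` is a hypothetical name, and the type check is an assumed stand-in for `cdb_observatory.isnumeric`, whose definition is not part of this diff. The source's separate `median`/`average` plus `universe` branch is folded into the `ELSE`, because both branches return `'prenormalized'`.

```sql
-- Hypothetical re-expression of the normalization auto-assignment in OBS_GetMeta 1.5.1.
CREATE FUNCTION pg_temp.auto_normalization(
  normalization   TEXT,  -- value passed in the metadata object, if any
  numer_type      TEXT,  -- PostgreSQL type of the numerator column
  denom_reltype   TEXT,  -- relationship of the denominator ('denominator', 'universe', ...)
  numer_aggregate TEXT   -- 'sum', 'median', 'average' or NULL
) RETURNS TEXT AS $$
  SELECT COALESCE(normalization,
    CASE
      -- assumed stand-in for cdb_observatory.isnumeric(numer_type)
      WHEN LOWER(numer_type) IN ('numeric', 'integer', 'bigint', 'smallint',
                                 'real', 'double precision')
      THEN CASE
        WHEN denom_reltype ILIKE 'denominator' THEN 'denominated'
        WHEN numer_aggregate ILIKE 'sum' THEN 'area'
        -- median/average with a universe, and everything else
        ELSE 'prenormalized'
      END
      -- non-numeric measures (e.g. text categories) get no normalization
      ELSE NULL
    END)
$$ LANGUAGE sql IMMUTABLE;

SELECT pg_temp.auto_normalization(NULL, 'Numeric', 'denominator', 'sum');  -- denominated
SELECT pg_temp.auto_normalization(NULL, 'Numeric', NULL, 'sum');           -- area
SELECT pg_temp.auto_normalization(NULL, 'Text', NULL, NULL);               -- NULL
```

An explicit `normalization` in the metadata object always wins because it is the first argument to `COALESCE`.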

@@ -4,7 +4,7 @@ Use the following functions to retrieve [Boundary](https://carto.com/docs/carto-
You can [access](https://carto.com/docs/carto-engine/data/accessing) boundaries through CARTO Builder. The same methods will work if you are using the CARTO Engine to develop your application. We [encourage you](http://docs/carto-engine/data/accessing/#best-practices) to use table modifying methods (UPDATE and INSERT) over dynamic methods (SELECT).
## OBS_GetBoundariesByGeometry(polygon geometry, geometry_id text)
## OBS_GetBoundariesByGeometry(geom geometry, geometry_id text)
The ```OBS_GetBoundariesByGeometry(geometry, geometry_id)``` method returns a set of boundary geometries that intersect a supplied geometry. This can be used to find all boundaries that are within or overlap a bounding box. You have the ability to choose whether to retrieve all boundaries that intersect your supplied bounding box or only those that fall entirely inside of your bounding box.
@@ -12,7 +12,7 @@ The ```OBS_GetBoundariesByGeometry(geometry, geometry_id)``` method returns a se
Name |Description
--- | ---
polygon | a bounding box or other WGS84 geometry
geom | a WGS84 geometry
geometry_id | a string identifier for a boundary geometry
timespan (optional) | year(s) to request from ('NULL' (default) gives most recent)
overlap_type (optional) | one of '[intersects](http://postgis.net/docs/manual-2.2/ST_Intersects.html)' (default), '[contains](http://postgis.net/docs/manual-2.2/ST_Contains.html)', or '[within](http://postgis.net/docs/manual-2.2/ST_Within.html)'.
@@ -26,7 +26,7 @@ Column Name | Description
the_geom | a boundary geometry (e.g., US Census tract boundaries)
geom_refs | a string identifier for the geometry (e.g., geoids of US Census tracts)
If geometries are not found for the requested `polygon`, `geometry_id`, `timespan`, or `overlap_type`, then null values are returned.
If geometries are not found for the requested `geom`, `geometry_id`, `timespan`, or `overlap_type`, then null values are returned.
#### Example
@@ -44,7 +44,6 @@ FROM OBS_GetBoundariesByGeometry(
#### Errors
* If a geometry other than a point is passed as the first argument, an error is thrown: `Invalid geometry type (ST_Polygon), expecting 'ST_Point'`
* If an `overlap_type` other than the valid ones listed above is entered, then an error is thrown
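The `overlap_type` values in the parameter table above name PostGIS predicates. As a rough illustration of how they differ, the PostGIS-only sketch below compares a hypothetical query box against three hypothetical boundary squares. It assumes PostGIS is available. It does not show which argument order `OBS_GetBoundariesByGeometry` itself uses, because that code is not in this diff.

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

-- Hypothetical query box and boundaries (WGS84, SRID 4326).
CREATE TEMP TABLE overlap_demo AS
SELECT name, geom,
       ST_MakeEnvelope(0, 0, 10, 10, 4326) AS box
FROM (VALUES
  ('inside',   ST_MakeEnvelope(2, 2, 4, 4, 4326)),    -- fully inside the box
  ('straddle', ST_MakeEnvelope(8, 8, 12, 12, 4326)),  -- crosses the box edge
  ('outside',  ST_MakeEnvelope(20, 20, 22, 22, 4326)) -- no contact at all
) AS b(name, geom);

SELECT name,
       ST_Intersects(geom, box) AS intersects,   -- any shared point
       ST_Within(geom, box)     AS within,       -- boundary lies entirely in the box
       ST_Contains(box, geom)   AS box_contains  -- same test with arguments swapped
FROM overlap_demo
ORDER BY name;
```

Only the `straddle` square separates the predicates: it intersects the box but is neither within it nor contained by it.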
## OBS_GetPointsByGeometry(polygon geometry, geometry_id text)

@@ -327,9 +327,12 @@ timespan_id | Text | The ID of the timespan
timespan_name | Text | A human readable name for the timespan
timespan_description | Text | Ignored
timespan_weight | Numeric | Ignored
timespan_aggregate | Text | Ignored
timespan_license | Text | Ignored
timespan_source | Text | Ignored
timespan_aggregate | Text | Ignored
timespan_type | Text | Ignored
timespan_extra | JSONB | Ignored
timespan_tags | JSONB | Ignored
valid_numer | Boolean | True if the `numer_id` argument is a valid numerator for this timespan, False otherwise
valid_denom | Boolean | True if the `timespan` argument is a valid timespan for this timespan, False otherwise
valid_geom | Boolean | True if the `geom_id` argument is a valid geometry for this timespan, False otherwise

@@ -196,7 +196,7 @@ UPDATE tablename
SET segmentation = OBS_GetCategory(the_geom, 'us.census.spielman_singleton_segments.X55')
```
## OBS_GetMeta(extent geometry, metadata json, max_timespan_rank, max_boundary_score_rank, num_target_geoms)
## OBS_GetMeta(extent geometry, metadata json, max_timespan_rank, max_score_rank, target_geoms)
The ```OBS_GetMeta(extent, metadata)``` function returns a completed Data
Observatory metadata JSON Object for use in ```OBS_GetData(geomvals,
@@ -213,9 +213,9 @@ Name | Description
---- | -----------
extent | A geometry of the extent of the input geometries
metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
max_timespan_rank | How many historical time periods to include. Defaults to 1
max_boundary_score_rank | How many alternative boundary levels to include. Defaults to 1
num_target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
num_timespan_options | How many historical time periods to include. Defaults to 1
num_score_options | How many alternative boundary levels to include. Defaults to 1
target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
The schema of the metadata input objects are as follows:
@@ -227,6 +227,10 @@ normalization | The desired normalization. One of 'area', 'prenormalized', or '
denom_id | Identifier for a desired normalization column in case `normalization` is 'denominated'. Will be automatically assigned if necessary. Ignored if this metadata object specifies a geometry.
numer_timespan | The desired timespan for the measurement. Defaults to most recent timespan available if left unspecified.
geom_timespan | The desired timespan for the geometry. Defaults to timespan matching numer_timespan if left unspecified.
target_area | Instead of aiming to have `target_geoms` in the area of the geometry passed as `extent`, fill this area. Unit is square degrees WGS84. Set this to `0` if you want to use the smallest source geometry for this element of metadata, for example if you're passing in points.
target_geoms | Override global `target_geoms` for this element of metadata
max_timespan_rank | Only include timespans of this recency (for example, `1` is only the most recent timespan). No limit by default
max_score_rank | Only include boundaries of this relevance (for example, `1` is the most relevant boundary). Is `1` by default
#### Returns
@@ -242,9 +246,12 @@ fail.
Metadata Output Key | Description
--- | -----------
suggested_name | A suggested column name for adding this to an existing table
numer_id | Identifier for desired measurement
numer_timespan | Timespan that will be used of the desired measurement
numer_name | Human-readable name of desired measure
numer_description | Long human-readable description of the desired measure
numer_t_description | Further information about the source table
numer_type | PostgreSQL/PostGIS type of desired measure
numer_colname | Internal identifier for column name
numer_tablename | Internal identifier for table
@@ -252,6 +259,8 @@ numer_geomref_colname | Internal identifier for geomref column name
denom_id | Identifier for desired normalization
denom_timespan | Timespan that will be used of the desired normalization
denom_name | Human-readable name of desired measure's normalization
denom_description | Long human-readable description of the desired measure's normalization
denom_t_description | Further information about the source table
denom_type | PostgreSQL/PostGIS type of desired measure's normalization
denom_colname | Internal identifier for normalization column name
denom_tablename | Internal identifier for normalization table
@@ -259,12 +268,14 @@ denom_geomref_colname | Internal identifier for normalization geomref column nam
geom_id | Identifier for desired boundary geometry
geom_timespan | Timespan that will be used of the desired boundary geometry
geom_name | Human-readable name of desired boundary geometry
geom_description | Long human-readable description of the desired boundary geometry
geom_t_description | Further information about the source table
geom_type | PostgreSQL/PostGIS type of desired boundary geometry
geom_colname | Internal identifier for boundary geometry column name
geom_tablename | Internal identifier for boundary geometry table
geom_geomref_colname | Internal identifier for boundary geometry ref column name
timespan_rank | Ranking of this measurement by time, most recent is 1, second most recent 2, etc.
score | The score of this measurement's boundary compared to the `extent` and `num_target_geoms` passed in. Between 0 and 100.
score | The score of this measurement's boundary compared to the `extent` and `target_geoms` passed in. Between 0 and 100.
score_rank | The ranking of this measurement's boundary, highest ranked is 1, second is 2, etc.
numer_aggregate | The aggregate type of the numerator, either `sum`, `average`, `median`, or blank
denom_aggregate | The aggregate type of the denominator, either `sum`, `average`, `median`, or blank
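In 1.5.1, the metadata input keys documented above are read with `json_to_recordset` (see the `_filters` CTE in the SQL diff). These keys include the new per-element `target_area`, `target_geoms`, `max_timespan_rank` and `max_score_rank` overrides. The plain-PostgreSQL sketch below shows that step on its own. Each object in the array becomes one row, and any key an object omits comes back as NULL. A NULL override is what lets the global argument take effect. The `geom_id` value and the view name `meta_filters` are hypothetical. The `numer_id` is the one used in the `OBS_GetCategory` example above.

```sql
-- Hypothetical metadata array: a point-oriented measure (target_area 0)
-- and a boundary request that overrides target_geoms and max_score_rank.
CREATE TEMP VIEW meta_filters AS
SELECT row_number() OVER () AS id, *
FROM json_to_recordset('[
  {"numer_id": "us.census.spielman_singleton_segments.X55", "target_area": 0},
  {"geom_id": "hypothetical.geom_id", "target_geoms": 50, "max_score_rank": 2}
]'::json) AS x(
  numer_id TEXT, denom_id TEXT, geom_id TEXT, numer_timespan TEXT,
  geom_timespan TEXT, normalization TEXT, max_timespan_rank TEXT,
  max_score_rank TEXT, target_geoms INTEGER, target_area NUMERIC
);

-- Keys an object does not set (e.g. target_geoms in the first one) are NULL.
SELECT id, numer_id, geom_id, target_geoms, target_area, max_score_rank
FROM meta_filters
ORDER BY id;
```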

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -1,5 +1,5 @@
comment = 'CartoDB Observatory backend extension'
default_version = '1.3.5'
default_version = '1.5.1'
requires = 'postgis'
superuser = true
schema = cdb_observatory

@@ -1,5 +1,5 @@
comment = 'CartoDB Observatory backend extension'
default_version = '1.3.5'
default_version = '1.5.1'
requires = 'postgis'
superuser = true
schema = cdb_observatory
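In the 1.5.1 `OBS_GetMeta` below, the `scores` CTE handles elements whose `target_area` is `0` differently. For those elements it ranks candidate boundaries by `numgeoms` rather than by `score`, so point inputs get the most numerous, and therefore most granular, geometry level. The sketch below isolates that `CASE` using a hypothetical score table. The table, its rows and `pg_temp.top_geom` are illustrative only. In the extension, the real scores come from `_OBS_GetGeometryScores`.

```sql
-- Hypothetical candidate boundaries with made-up scores.
CREATE TEMP TABLE geom_scores (
  column_id TEXT,     -- boundary identifier
  score     NUMERIC,  -- area-based fit score (0 to 100)
  numgeoms  INTEGER   -- how many geometries of this level fall in the extent
);
INSERT INTO geom_scores VALUES
  ('hypothetical.county', 80, 3),
  ('hypothetical.tract',  40, 900);

-- Return the top-ranked boundary, ordered the way the 1.5.1 scores CTE orders them.
CREATE FUNCTION pg_temp.top_geom(target_area NUMERIC) RETURNS TEXT AS $$
  SELECT column_id
  FROM geom_scores
  ORDER BY CASE target_area
             WHEN 0 THEN numgeoms  -- points: prefer the most granular level
             ELSE score            -- areas, or NULL target_area: use the fit score
           END DESC
  LIMIT 1
$$ LANGUAGE sql STABLE;

SELECT pg_temp.top_geom(0);     -- hypothetical.tract
SELECT pg_temp.top_geom(NULL);  -- hypothetical.county
```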

@@ -102,8 +102,8 @@ $$ LANGUAGE plpgsql STABLE;
CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetMeta(
geom geometry(Geometry, 4326),
params JSON,
max_timespan_rank INTEGER DEFAULT NULL, -- cutoff for timespan ranks when there's ambiguity
max_score_rank INTEGER DEFAULT NULL, -- cutoff for geom ranks when there's ambiguity
num_timespan_options INTEGER DEFAULT NULL, -- how many timespan options to show
num_score_options INTEGER DEFAULT NULL, -- how many score options to show
target_geoms INTEGER DEFAULT NULL
)
RETURNS JSON
@@ -115,20 +115,34 @@ DECLARE
scores_clause TEXT;
result JSON;
BEGIN
IF max_timespan_rank IS NULL THEN
max_timespan_rank := 1;
IF num_timespan_options IS NULL THEN
num_timespan_options := 1;
END IF;
IF max_score_rank IS NULL THEN
max_score_rank := 1;
IF num_score_options IS NULL THEN
num_score_options := 1;
END IF;
numer_filters := (SELECT Array_Agg(val) FILTER (WHERE val IS NOT NULL) FROM (SELECT (JSON_Array_Elements(params))->>'numer_id' val) foo);
geom_filters := (SELECT Array_Agg(val) FILTER (WHERE val IS NOT NULL) FROM (SELECT (JSON_Array_Elements(params))->>'geom_id' val) bar);
meta_filter_clause := '(m.numer_id = ANY ($6) OR m.geom_id = ANY ($7))';
scores_clause := 'SELECT *
FROM cdb_observatory._OBS_GetGeometryScores($1,
(SELECT Array_Agg(geom_id) FROM meta), $2) scores ';
scores_clause := ' agg_geoms AS (
SELECT target_geoms, target_area, ARRAY_AGG(geom_id) geom_ids
FROM meta
GROUP BY target_geoms, target_area
), scores AS (
SELECT target_geoms, target_area,
CASE target_area
-- point-specific, just order by numgeoms instead of score
WHEN 0 THEN scores.numgeoms
-- has some area, use proper scoring
ELSE scores.score
END AS score,
scores.numgeoms, scores.table_id, scores.column_id
FROM agg_geoms,
LATERAL cdb_observatory._OBS_GetGeometryScores($1,
geom_ids, COALESCE(target_geoms, $2), target_area) scores
) ';
IF JSON_Array_Length(params) = 1 THEN
IF numer_filters IS NULL AND geom_filters IS NOT NULL THEN
@@ -142,21 +156,22 @@ BEGIN
END IF;
IF geom_filters IS NOT NULL AND numer_filters IS NOT NULL THEN
scores_clause := 'SELECT 1 score, null, geom_tid table_id, geom_id column_id,
null, null, null, null, null, null
FROM meta ';
scores_clause := 'scores AS (
SELECT NULL::INTEGER target_geoms, NULL::Numeric target_area,
1 score, null, geom_tid table_id, geom_id column_id,
NULL::Integer numgeoms
FROM meta) ';
END IF;
END IF;
EXECUTE format($string$
WITH _filters AS (SELECT
generate_series(1, array_length($3, 1)) id,
(unnest($3))->>'numer_id' numer_id,
(unnest($3))->>'denom_id' denom_id,
(unnest($3))->>'geom_id' geom_id,
(unnest($3))->>'numer_timespan' numer_timespan,
(unnest($3))->>'geom_timespan' geom_timespan,
(unnest($3))->>'normalization' normalization
row_number() over () id, *
FROM json_to_recordset($3)
AS x(numer_id TEXT, denom_id TEXT, geom_id TEXT, numer_timespan TEXT,
geom_timespan TEXT, normalization TEXT, max_timespan_rank TEXT,
max_score_rank TEXT, target_geoms INTEGER, target_area Numeric
)
), meta AS (SELECT
id,
f.numer_id,
@@ -166,6 +181,8 @@ BEGIN
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_tablename END numer_tablename,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_type END numer_type,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_name END numer_name,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_description END numer_description,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_t_description END numer_t_description,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE m.numer_timespan END numer_timespan,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE m.denom_id END denom_id,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_aggregate END denom_aggregate,
@@ -173,6 +190,8 @@ BEGIN
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_geomref_colname END denom_geomref_colname,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_tablename END denom_tablename,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_name END denom_name,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_description END denom_description,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_t_description END denom_t_description,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_type END denom_type,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_reltype END denom_reltype,
m.geom_id,
@@ -182,8 +201,24 @@ BEGIN
geom_geomref_colname,
geom_tablename,
geom_name,
geom_description,
geom_t_description,
geom_type,
normalization
Coalesce(normalization,
-- automatically assign normalization to numeric numerators
CASE WHEN cdb_observatory.isnumeric(numer_type) THEN
CASE WHEN denom_reltype ILIKE 'denominator' THEN 'denominated'
WHEN numer_aggregate ILIKE 'sum' THEN 'area'
WHEN numer_aggregate IN ('median', 'average') AND denom_reltype ILIKE 'universe'
THEN 'prenormalized'
ELSE 'prenormalized'
END ELSE NULL
END
) normalization,
max_timespan_rank,
max_score_rank,
target_geoms,
target_area
FROM observatory.obs_meta m JOIN _filters f
ON CASE WHEN f.numer_id IS NULL THEN m.geom_id ELSE m.numer_id END =
CASE WHEN f.numer_id IS NULL THEN f.geom_id ELSE f.numer_id END
@@ -194,9 +229,8 @@ BEGIN
AND (m.geom_id = f.geom_id OR COALESCE(f.geom_id, '') = '')
AND (m.geom_timespan = f.geom_timespan OR COALESCE(f.geom_timespan, '') = '')
AND (m.numer_timespan = f.numer_timespan OR COALESCE(f.numer_timespan, '') = '')
), scores AS (
%s
), groups AS (SELECT
), %s
, groups AS (SELECT
id,
scores.score,
numer_timespan,
@@ -207,45 +241,68 @@ BEGIN
'numer_id', numer_id,
'timespan_rank', dense_rank() OVER (PARTITION BY id ORDER BY numer_timespan DESC),
'score_rank', dense_rank() OVER (PARTITION BY id ORDER BY score DESC),
'timespan_rownum', row_number() over
(PARTITION BY id, score ORDER BY numer_timespan DESC, Coalesce(denom_id, '')),
'score_rownum', row_number() over
(PARTITION BY id, numer_timespan ORDER BY score DESC, Coalesce(denom_id, '')),
'score', scores.score,
'suggested_name', cdb_observatory.FIRST(
LOWER(TRIM(BOTH '_' FROM regexp_replace(CASE WHEN numer_id IS NOT NULL
THEN CASE
WHEN normalization ILIKE 'area%%' THEN numer_colname || ' per sq km'
WHEN normalization ILIKE 'denom%%' THEN numer_colname || ' rate'
ELSE numer_colname
END || ' ' || numer_timespan
ELSE geom_name || ' ' || geom_timespan
END, '[^a-zA-Z0-9]+', '_', 'g')))
),
'numer_aggregate', cdb_observatory.FIRST(meta.numer_aggregate),
'numer_colname', cdb_observatory.FIRST(meta.numer_colname),
'numer_geomref_colname', cdb_observatory.FIRST(meta.numer_geomref_colname),
'numer_tablename', cdb_observatory.FIRST(meta.numer_tablename),
'numer_type', cdb_observatory.FIRST(meta.numer_type),
--'numer_description', cdb_observatory.FIRST(meta.numer_description),
--'numer_t_description', cdb_observatory.FIRST(meta.numer_t_description),
'numer_description', cdb_observatory.FIRST(meta.numer_description),
'numer_t_description', cdb_observatory.FIRST(meta.numer_t_description),
'denom_aggregate', cdb_observatory.FIRST(meta.denom_aggregate),
'denom_colname', cdb_observatory.FIRST(denom_colname),
'denom_geomref_colname', cdb_observatory.FIRST(denom_geomref_colname),
'denom_tablename', cdb_observatory.FIRST(denom_tablename),
'denom_type', cdb_observatory.FIRST(meta.denom_type),
'denom_reltype', cdb_observatory.FIRST(meta.denom_reltype),
--'denom_description', cdb_observatory.FIRST(meta.denom_description),
--'denom_t_description', cdb_observatory.FIRST(meta.denom_t_description),
'denom_description', cdb_observatory.FIRST(meta.denom_description),
'denom_t_description', cdb_observatory.FIRST(meta.denom_t_description),
'geom_colname', cdb_observatory.FIRST(geom_colname),
'geom_geomref_colname', cdb_observatory.FIRST(geom_geomref_colname),
'geom_tablename', cdb_observatory.FIRST(geom_tablename),
'geom_type', cdb_observatory.FIRST(meta.geom_type),
'geom_timespan', cdb_observatory.FIRST(meta.geom_timespan),
--'geom_description', cdb_observatory.FIRST(meta.geom_description),
--'geom_t_description', cdb_observatory.FIRST(meta.geom_t_description),
'geom_description', cdb_observatory.FIRST(meta.geom_description),
'geom_t_description', cdb_observatory.FIRST(meta.geom_t_description),
'numer_timespan', cdb_observatory.FIRST(numer_timespan),
'numer_name', cdb_observatory.FIRST(numer_name),
'denom_name', cdb_observatory.FIRST(denom_name),
'geom_name', cdb_observatory.FIRST(geom_name),
'normalization', cdb_observatory.FIRST(normalization),
'max_timespan_rank', cdb_observatory.FIRST(max_timespan_rank),
'max_score_rank', cdb_observatory.FIRST(max_score_rank),
'target_geoms', cdb_observatory.FIRST(scores.target_geoms),
'target_area', cdb_observatory.FIRST(scores.target_area),
'num_geoms', cdb_observatory.FIRST(scores.numgeoms),
'denom_id', denom_id,
'geom_id', meta.geom_id
) metadata
FROM meta, scores
WHERE meta.geom_id = scores.column_id
AND meta.geom_tid = scores.table_id
AND COALESCE(meta.target_geoms, 0) = COALESCE(scores.target_geoms, 0)
AND COALESCE(meta.target_area, 0) = COALESCE(scores.target_area, 0)
GROUP BY id, score, numer_id, denom_id, geom_id, numer_timespan
) SELECT JSON_AGG(metadata ORDER BY id)
FROM groups
WHERE timespan_rank <= $4
AND score_rank <= $5
WHERE timespan_rank <= Coalesce((metadata->>'max_timespan_rank')::INTEGER, 'infinity'::FLOAT)
AND score_rank <= Coalesce((metadata->>'max_score_rank')::INTEGER, 1)
AND (metadata->>'timespan_rownum')::INTEGER <= $4
AND (metadata->>'score_rownum')::INTEGER <= $5
$string$, meta_filter_clause, scores_clause)
INTO result
USING
@@ -254,9 +311,9 @@ BEGIN
ELSE geom
END,
target_geoms,
(SELECT ARRAY(SELECT json_array_elements_text(params))::json[]),
max_timespan_rank,
max_score_rank, numer_filters, geom_filters
params,
num_timespan_options,
num_score_options, numer_filters, geom_filters
;
RETURN result;
END;
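The `suggested_name` key added in 1.5.0 is built by the `regexp_replace` expression in the `groups` CTE above. Here that expression is lifted into a standalone SQL function, with the hypothetical name `pg_temp.suggested_name`, so the naming rule can be checked directly. The `%%` in the original is `format()` escaping and becomes a single `%` here. The argument values in the calls are made up.

```sql
CREATE FUNCTION pg_temp.suggested_name(
  numer_id TEXT, numer_colname TEXT, normalization TEXT,
  numer_timespan TEXT, geom_name TEXT, geom_timespan TEXT
) RETURNS TEXT AS $$
  SELECT LOWER(TRIM(BOTH '_' FROM regexp_replace(
    CASE WHEN numer_id IS NOT NULL
      THEN CASE
        WHEN normalization ILIKE 'area%' THEN numer_colname || ' per sq km'
        WHEN normalization ILIKE 'denom%' THEN numer_colname || ' rate'
        ELSE numer_colname
      END || ' ' || numer_timespan
      -- geometry-only requests are named after the boundary
      ELSE geom_name || ' ' || geom_timespan
    END,
    '[^a-zA-Z0-9]+', '_', 'g')))  -- collapse every non-alphanumeric run to '_'
$$ LANGUAGE sql IMMUTABLE;

SELECT pg_temp.suggested_name('hypothetical.measure', 'total_pop', 'area', '2010 - 2014', NULL, NULL);
-- total_pop_per_sq_km_2010_2014
SELECT pg_temp.suggested_name(NULL, NULL, NULL, NULL, 'US Census Tracts', '2015');
-- us_census_tracts_2015
```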
@@ -536,14 +593,9 @@ RETURNS TABLE (
)
AS $$
DECLARE
geom_colspecs TEXT;
geom_tables TEXT;
geomrefs_alias TEXT;
geomrefs_noalias TEXT;
data_colspecs TEXT;
data_tables TEXT;
obs_wheres TEXT;
user_wheres TEXT;
procgeom_clauses TEXT;
val_clauses TEXT;
json_clause TEXT;
geomtype TEXT;
BEGIN
IF params IS NULL OR JSON_ARRAY_LENGTH(params) = 0 OR ARRAY_LENGTH(geomvals, 1) IS NULL THEN
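The hunks below build the value expressions used when augmenting user geometries. For median/average measures over user polygons, each Observatory value is weighted by two factors. The first is its universe count. The second is `pct`, the fraction of the Observatory geometry's area that lies inside the user geometry. The plain-SQL sketch below works through the two-geometry case from the code comment, using hypothetical numbers. It leaves out the `COUNT(*) / COUNT(distinct geomref_...)` divisor in the real expression, which appears to offset rows duplicated by joins.

```sql
-- Two hypothetical Observatory polygons, both fully inside one user polygon.
CREATE TEMP TABLE interp_sample (
  numer NUMERIC,  -- median/average measure of the Observatory polygon
  denom NUMERIC,  -- universe count for that polygon, used as the weight
  pct   NUMERIC   -- share of the Observatory polygon's area inside the user polygon
);
INSERT INTO interp_sample VALUES (10, 1000, 1), (50, 10, 1);

-- SUM(numer * denom * pct) / SUM(denom * pct)
--   = (10 * 1000 + 50 * 10) / (1000 + 10) = 10500 / 1010, about 10.4
SELECT SUM(numer * denom * pct) / NULLIF(SUM(denom * pct), 0) AS weighted_value
FROM interp_sample;
```

The large universe behind the first polygon keeps the result close to 10, rather than at the unweighted mean of 30.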
@@ -553,250 +605,233 @@ BEGIN
geomtype := ST_GeometryType(geomvals[1].geom);
EXECUTE
$query$
WITH _meta AS (SELECT
row_number() over () colid,
meta->>'id' id,
meta->>'numer_id' numer_id,
meta->>'numer_aggregate' numer_aggregate,
meta->>'numer_colname' numer_colname,
meta->>'numer_geomref_colname' numer_geomref_colname,
meta->>'numer_tablename' numer_tablename,
meta->>'numer_type' numer_type,
meta->>'denom_id' denom_id,
meta->>'denom_aggregate' denom_aggregate,
meta->>'denom_colname' denom_colname,
meta->>'denom_geomref_colname' denom_geomref_colname,
meta->>'denom_tablename' denom_tablename,
meta->>'denom_type' denom_type,
meta->>'denom_reltype' denom_reltype,
meta->>'geom_id' geom_id,
meta->>'geom_colname' geom_colname,
meta->>'geom_geomref_colname' geom_geomref_colname,
meta->>'geom_tablename' geom_tablename,
meta->>'geom_type' geom_type,
meta->>'numer_timespan' numer_timespan,
meta->>'geom_timespan' geom_timespan,
meta->>'normalization' normalization,
meta->>'api_method' api_method,
meta->'api_args' api_args
FROM UNNEST($1) AS meta
)
/* Read metadata to generate clauses for query */
EXECUTE $query$
WITH _meta AS (SELECT
row_number() over () colid, *
FROM json_to_recordset($1)
AS x(id TEXT, numer_id TEXT, numer_aggregate TEXT, numer_colname TEXT,
numer_geomref_colname TEXT, numer_tablename TEXT, numer_type TEXT,
denom_id TEXT, denom_aggregate TEXT, denom_colname TEXT,
denom_geomref_colname TEXT, denom_tablename TEXT, denom_type TEXT,
denom_reltype TEXT, geom_id TEXT, geom_colname TEXT,
geom_geomref_colname TEXT, geom_tablename TEXT, geom_type TEXT,
numer_timespan TEXT, geom_timespan TEXT, normalization TEXT,
api_method TEXT, api_args JSON)
),
-- Generate procgeom clauses.
-- These join the users' geoms to the relevant geometries for the
-- asked-for measures in the Observatory.
_procgeom_clauses AS (
SELECT
String_Agg(DISTINCT
CASE
-- pass-through geom if user is requesting it only
WHEN numer_id IS NULL AND api_method IS NULL THEN
geom_tablename || '.' || geom_colname || ' AS geom_' || geom_tablename
WHEN cdb_observatory.isnumeric(numer_type) AND api_method IS NULL THEN
-- for numeric points with area normalization, include areas of underlying geoms
CASE
WHEN $2 = 'ST_Point' AND (LOWER(normalization) LIKE 'area%' OR
(normalization IS NULL AND numer_aggregate ILIKE 'sum')) THEN
' Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '::Geography), 0)/1000000 ' ||
' AS area_' || geom_tablename
-- for numeric areas, include more complex calcs
WHEN $2 != 'ST_Point' THEN
'CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ') ' ||
' THEN ST_Area(_geoms.geom) / Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0)' ||
' WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom) ' ||
' THEN 1 ' ||
' ELSE ST_Area(cdb_observatory.safe_intersection(_geoms.geom, ' ||
geom_tablename || '.' || geom_colname || ')) / ' ||
'Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0) ' ||
'END pct_' || geom_tablename
ELSE NULL
END
ELSE NULL END
, ', ') AS geom_colspecs,
String_Agg(DISTINCT 'observatory.' || geom_tablename, ', ') AS geom_tables,
String_Agg(
'JSON_Build_Object(' || CASE
-- api-delivered values
WHEN api_method IS NOT NULL THEN
'''value'', ' ||
'ARRAY_AGG( ' ||
api_method || '.' || numer_colname || ')::' || numer_type || '[]'
-- numeric internal values
WHEN cdb_observatory.isnumeric(numer_type) THEN
'''value'', ' || CASE
-- denominated
WHEN LOWER(normalization) LIKE 'denom%' OR
(normalization IS NULL AND LOWER(denom_reltype) LIKE 'denominator')
THEN CASE
-- denominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / NullIf(' || denom_tablename || '.' || denom_colname || ', 0))'
-- denominated polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / SUM (denom * (% OBS geom in user geom))
ELSE
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * pct_' || geom_tablename ||
' ) / NULLIF(SUM(' || denom_tablename || '.' || denom_colname || ' ' ||
' * pct_' || geom_tablename || '), 0) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- areaNormalized
WHEN LOWER(normalization) LIKE 'area%' OR
(normalization IS NULL AND numer_aggregate ILIKE 'sum')
THEN CASE
-- areaNormalized point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / area_' || geom_tablename || ')'
-- areaNormalized polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / area of big geom
ELSE
--' NULL END '
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * pct_' || geom_tablename ||
' ) / (Nullif(ST_Area(cdb_observatory.FIRST(_procgeoms.geom)::Geography), 0) / 1000000) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- median/average measures with universe
WHEN LOWER(numer_aggregate) IN ('median', 'average') AND
denom_reltype ILIKE 'universe' AND
(normalization IS NULL OR LOWER(normalization) LIKE 'pre%')
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation weighted by universe
-- SUM (numer * denom * (% user geom in OBS geom)) / SUM (denom * (% user geom in OBS geom))
-- (10 * 1000 * 1) / (1000 * 1) = 10
-- (10 * 1000 * 1 + 50 * 10 * 1) / (1000 + 10) = 10500 / 1010 ~= 10.4
' SUM(' || numer_tablename || '.' || numer_colname ||
' * ' || denom_tablename || '.' || denom_colname ||
' * pct_' || geom_tablename ||
' ) / Nullif(SUM(' || denom_tablename || '.' || denom_colname ||
' * pct_' || geom_tablename || '), 0) ' ||
' / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- prenormalized for summable measures. point or summable only!
WHEN numer_aggregate ILIKE 'sum' AND
(normalization IS NULL OR LOWER(normalization) LIKE 'pre%')
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation
-- SUM (numer * (% user geom in OBS geom))
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * pct_' || geom_tablename ||
' ) / (COUNT(*) / COUNT(distinct geomref_' || geom_tablename || ')) '
END
-- Everything else. Point only!
ELSE CASE
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
' cdb_observatory._OBS_RaiseNotice(''Cannot perform calculation over polygon for ' ||
numer_id || '/' || coalesce(denom_id, '') || '/' || geom_id || '/' || numer_timespan || ''')::Numeric '
END
END || '::' || numer_type
-- categorical/text
WHEN LOWER(numer_type) LIKE 'text' THEN
'''value'', ' || 'MODE() WITHIN GROUP (ORDER BY ' || numer_tablename || '.' || numer_colname || ') '
-- geometry
WHEN numer_id IS NULL THEN
'''geomref'', geomref_' || geom_tablename || ', ' ||
'''value'', ' || 'cdb_observatory.FIRST(geom_' || geom_tablename ||
')::TEXT'
-- code below will return the intersection of the user's geom and the
-- OBS geom
--'''value'', ' || 'ST_Union(cdb_observatory.safe_intersection(_geoms.geom, ' || geom_tablename ||
-- '.' || geom_colname || '))::TEXT'
ELSE ''
END || ')', ', ')
AS colspecs,
-- geomrefs, used to separate out rows in case we don't want to merge
-- results by user input IDs
--
-- api_method and geom_tablename are interchangeable since when an
-- api_method is passed, geom_tablename is ignored
String_Agg(DISTINCT COALESCE(geom_tablename, api_method) || '.' || geom_geomref_colname ||
' AS geomref_' || COALESCE(geom_tablename, api_method), ', ') AS geomrefs_alias,
String_Agg(DISTINCT 'geomref_' || COALESCE(geom_tablename, api_method)
, ', ') AS geomrefs_noalias,
(SELECT String_Agg(DISTINCT CASE
-- External API
WHEN tablename LIKE 'cdb_observatory.%' THEN
'LATERAL (SELECT * FROM ' || tablename || ') ' ||
REPLACE(split_part(tablename, '(', 1), 'cdb_observatory.', '')
-- Internal obs_ table
ELSE 'observatory.' || tablename
END, ', ') FROM (
SELECT DISTINCT UNNEST(tablenames_ary) tablename FROM (
SELECT ARRAY_AGG(numer_tablename) ||
ARRAY_AGG(denom_tablename) ||
ARRAY_AGG('cdb_observatory.' || api_method || '(_procgeoms.geom' || COALESCE(', ' ||
(SELECT STRING_AGG(REPLACE(val::text, '"', ''''), ', ')
FROM (SELECT json_array_elements(api_args) as val) as vals),
'') || ')')
tablenames_ary
) tablenames_inner
) tablenames_outer) data_tables,
String_Agg(DISTINCT array_to_string(ARRAY[
CASE WHEN numer_tablename IS NOT NULL AND geom_tablename IS NOT NULL
THEN numer_tablename || '.' || numer_geomref_colname || ' = ' ||
'_procgeoms.geomref_' || geom_tablename
ELSE NULL END,
CASE WHEN numer_tablename != denom_tablename
THEN numer_tablename || '.' || numer_geomref_colname || ' = ' ||
denom_tablename || '.' || denom_geomref_colname
ELSE NULL END
], ' AND '),
' AND ') FILTER (WHERE numer_tablename != denom_tablename OR
(numer_tablename IS NOT NULL AND geom_tablename IS NOT NULL)) AS obs_wheres,
String_Agg(DISTINCT 'ST_Intersects(' || geom_tablename || '.' || geom_colname
|| ', _geoms.geom)', ' AND ')
AS user_wheres
'_procgeoms_' || Coalesce(geom_tablename || '_' || geom_geomref_colname, api_method) || ' AS (' ||
CASE WHEN api_method IS NULL THEN
'SELECT _geoms.id, ' ||
CASE $3 WHEN True THEN '_geoms.geom'
ELSE geom_tablename || '.' || geom_colname
END || ' AS geom, ' ||
geom_tablename || '.' || geom_geomref_colname || ' AS geomref, ' ||
CASE
WHEN $2 = 'ST_Point' THEN
' Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '::Geography), 0)/1000000 ' ||
' AS area'
-- for polygons, weight by the share of each OBS geom covered by the user geom
ELSE
'CASE WHEN ST_Within(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')
THEN ST_Area(_geoms.geom) / Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0)
WHEN ST_Within(' || geom_tablename || '.' || geom_colname || ', _geoms.geom)
THEN 1
ELSE ST_Area(cdb_observatory.safe_intersection(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')) /
Nullif(ST_Area(' || geom_tablename || '.' || geom_colname || '), 0)
END pct_obs'
END || '
FROM _geoms, observatory.' || geom_tablename || '
WHERE ST_Intersects(_geoms.geom, ' || geom_tablename || '.' || geom_colname || ')'
-- pass through input geometries for api_method
ELSE 'SELECT _geoms.id, _geoms.geom FROM _geoms'
END ||
') '
AS procgeom_clause
FROM _meta
;
$query$
INTO geom_colspecs, geom_tables, data_colspecs, geomrefs_alias,
geomrefs_noalias, data_tables, obs_wheres, user_wheres
USING (SELECT ARRAY(SELECT json_array_elements_text(params))::json[]), geomtype;
GROUP BY api_method, geom_tablename, geom_geomref_colname, geom_colname
),
-- Generate val clauses.
-- These perform interpolations or other necessary calculations to
-- provide values for the user's geometries.
_val_clauses AS (
SELECT
'_vals_' || Coalesce(geom_tablename || '_' || geom_geomref_colname, api_method) || ' AS (
SELECT _procgeoms.id, ' ||
String_Agg('json_build_object(' || CASE
-- api-delivered values
WHEN api_method IS NOT NULL THEN
'''value'', ' ||
'ARRAY_AGG( ' ||
api_method || '.' || numer_colname || ')::' || numer_type || '[]'
-- numeric internal values
WHEN cdb_observatory.isnumeric(numer_type) THEN
'''value'', ' || CASE
-- denominated
WHEN LOWER(normalization) LIKE 'denom%'
THEN CASE
WHEN denom_tablename IS NULL THEN ' NULL '
-- denominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / NullIf(' || denom_tablename || '.' || denom_colname || ', 0))'
-- denominated polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / SUM (denom * (% OBS geom in user geom))
ELSE
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * _procgeoms.pct_obs ' ||
' ) / NULLIF(SUM(' || denom_tablename || '.' || denom_colname || ' ' ||
' * _procgeoms.pct_obs), 0) '
END
-- areaNormalized
WHEN LOWER(normalization) LIKE 'area%'
THEN CASE
-- areaNormalized point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname ||
' / _procgeoms.area)'
-- areaNormalized polygon interpolation
-- SUM (numer * (% OBS geom in user geom)) / area of big geom
ELSE
--' NULL END '
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * _procgeoms.pct_obs' ||
' ) / (Nullif(ST_Area(cdb_observatory.FIRST(_procgeoms.geom)::Geography), 0) / 1000000) '
END
-- median/average measures with universe
WHEN LOWER(numer_aggregate) IN ('median', 'average') AND
denom_reltype ILIKE 'universe' AND LOWER(normalization) LIKE 'pre%'
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation weighted by universe
-- SUM (numer * denom * (% OBS geom in user geom)) / SUM (denom * (% OBS geom in user geom))
-- (10 * 1000 * 1) / (1000 * 1) = 10
-- (10 * 1000 * 1 + 50 * 10 * 1) / (1000 * 1 + 10 * 1) = 10500 / 1010 = ~10.4
' SUM(' || numer_tablename || '.' || numer_colname ||
' * ' || denom_tablename || '.' || denom_colname ||
' * _procgeoms.pct_obs ' ||
' ) / Nullif(SUM(' || denom_tablename || '.' || denom_colname ||
' * _procgeoms.pct_obs ' || '), 0) '
END
-- prenormalized for summable measures. point or summable only!
WHEN numer_aggregate ILIKE 'sum' AND LOWER(normalization) LIKE 'pre%'
THEN CASE
-- predenominated point-in-poly
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
-- predenominated polygon interpolation
-- SUM (numer * (% OBS geom in user geom))
' SUM(' || numer_tablename || '.' || numer_colname || ' ' ||
' * _procgeoms.pct_obs) '
END
-- Everything else. Point only!
ELSE CASE
WHEN $2 = 'ST_Point' THEN
' cdb_observatory.FIRST(' || numer_tablename || '.' || numer_colname || ') '
ELSE
' cdb_observatory._OBS_RaiseNotice(''Cannot perform calculation over polygon for ' ||
numer_id || '/' || coalesce(denom_id, '') || '/' || geom_id || '/' || numer_timespan || ''')::Numeric '
END
END || '::' || numer_type
-- categorical/text
WHEN LOWER(numer_type) LIKE 'text' THEN
'''value'', ' || 'MODE() WITHIN GROUP (ORDER BY ' || numer_tablename || '.' || numer_colname || ') '
-- geometry
WHEN numer_id IS NULL THEN
'''geomref'', _procgeoms.geomref, ' ||
'''value'', ' || 'cdb_observatory.FIRST(_procgeoms.geom)::TEXT'
-- code below will return the intersection of the user's geom and the
-- OBS geom
--'''value'', ' || 'ST_Union(cdb_observatory.safe_intersection(_geoms.geom, ' || geom_tablename ||
-- '.' || geom_colname || '))::TEXT'
ELSE ''
END
|| ') val_' || colid, ', ')
|| '
FROM _procgeoms_' || Coalesce(geom_tablename || '_' || geom_geomref_colname, api_method) || ' _procgeoms ' ||
Coalesce(', ' || String_Agg(DISTINCT
Coalesce('observatory.' || numer_tablename,
'LATERAL (SELECT * FROM cdb_observatory.' || api_method || '(_procgeoms.geom' || Coalesce(', ' ||
(SELECT STRING_AGG(REPLACE(val::text, '"', ''''), ', ')
FROM (SELECT JSON_Array_Elements(api_args) as val) as vals),
'') || ')) AS ' || api_method)
, ', '), '') ||
Coalesce(' WHERE ' || String_Agg(DISTINCT
'_procgeoms.geomref = ' || numer_tablename || '.' || numer_geomref_colname, ' AND '
), '') ||
CASE $3 WHEN True THEN E'\n GROUP BY _procgeoms.id ORDER BY _procgeoms.id '
ELSE E'\n GROUP BY _procgeoms.id, _procgeoms.geomref
ORDER BY _procgeoms.id, _procgeoms.geomref' END
|| ')'
AS val_clause,
'_vals_' || Coalesce(geom_tablename || '_' || geom_geomref_colname, api_method) AS cte_name
FROM _meta
GROUP BY geom_tablename, geom_geomref_colname, geom_colname, api_method
),
-- Generate clauses necessary to join together val_clauses
_val_joins AS (
SELECT String_Agg(a.cte_name || '.id = ' || b.cte_name || '.id ', ' AND ') val_joins
FROM _val_clauses a, _val_clauses b
WHERE a.cte_name != b.cte_name
AND a.cte_name < b.cte_name
),
-- Generate JSON clause. This puts together vals from val_clauses
_json_clause AS (SELECT
'SELECT ' || cdb_observatory.FIRST(cte_name) || '.id::INT,
Array_to_JSON(ARRAY[' || (SELECT String_Agg('val_' || colid, ', ') FROM _meta) || '])
FROM ' || String_Agg(cte_name, ', ') ||
Coalesce(' WHERE ' || val_joins, '')
AS json_clause
FROM _val_clauses, _val_joins
GROUP BY val_joins
)
SELECT (SELECT String_Agg(procgeom_clause, E',\n ') FROM _procgeom_clauses),
(SELECT String_Agg(val_clause, E',\n ') FROM _val_clauses),
json_clause
FROM _json_clause
$query$ INTO
procgeom_clauses,
val_clauses,
json_clause
USING params, geomtype, merge;
/* Execute query */
RETURN QUERY EXECUTE format($query$
WITH _raw_geoms AS (%s),
_geoms AS (SELECT id,
CASE WHEN (ST_NPoints(geom) > 500)
THEN ST_CollectionExtract(ST_MakeValid(ST_SimplifyVW(geom, 0.0001)), 3)
CASE WHEN (ST_NPoints(geom) > 1000)
THEN ST_CollectionExtract(ST_MakeValid(ST_SimplifyVW(geom, 0.00001)), 3)
ELSE geom END geom
FROM _raw_geoms),
_procgeoms AS (SELECT _geoms.id, _geoms.geom %s %s
FROM _geoms %s
%s
)
SELECT _procgeoms.id::INT, Array_to_JSON(ARRAY[%s]::JSON[])
FROM _procgeoms %s
%s
GROUP BY _procgeoms.id %s
ORDER BY _procgeoms.id
$query$, CASE WHEN ARRAY_LENGTH(geomvals, 1) = 1 THEN
' SELECT $1[1].val as id, $1[1].geom as geom '
ELSE
' SELECT val as id, geom FROM UNNEST($1) '
-- procgeom_clauses
%s,
-- val_clauses
%s
-- json_clause
%s
$query$, CASE WHEN ARRAY_LENGTH(geomvals, 1) = 1
THEN ' SELECT $1[1].val as id, $1[1].geom as geom '
ELSE ' SELECT val as id, geom FROM UNNEST($1) '
END,
', ' || NullIf(geomrefs_alias, ''),
', ' || NullIf(geom_colspecs, ''),
', ' || NullIf(geom_tables, ''),
'WHERE ' || NullIf( user_wheres, ''),
data_colspecs, ', ' || NullIf(data_tables, ''),
'WHERE ' || NULLIF(obs_wheres, ''),
CASE WHEN merge IS False THEN ', ' || geomrefs_noalias ELSE '' END)
String_Agg(procgeom_clauses, E',\n '),
String_Agg(val_clauses, E',\n '),
json_clause)
USING geomvals;
RETURN;
END;
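The polygon branches above generate SQL that weights each intersecting OBS geometry by pct_obs, the fraction of that geometry's area lying inside the user geometry. The Python sketch below reproduces only that arithmetic and is not the extension's code. It assumes axis-aligned rectangles in place of PostGIS polygons, and every name in it (Rect, Piece, pct_obs, denominated, summed, area_normalized, universe_weighted) is hypothetical. The last function covers the universe-weighted median/average case and can be used to check the worked example in the inline comment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical stand-in for a polygon: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def rect_area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def pct_obs(user: Rect, obs: Rect) -> Optional[float]:
    """Share of the OBS rectangle lying inside the user rectangle.

    Mirrors area(intersection) / Nullif(area(obs), 0). For non-degenerate
    shapes, the SQL's two ST_Within shortcuts produce the same value.
    """
    obs_area = rect_area(obs)
    if obs_area == 0:
        return None  # Nullif(..., 0) makes the SQL value NULL
    inter = (max(user[0], obs[0]), max(user[1], obs[1]),
             min(user[2], obs[2]), min(user[3], obs[3]))
    return rect_area(inter) / obs_area


@dataclass
class Piece:
    """One OBS geometry intersecting a user geometry (hypothetical)."""
    numer: float
    denom: float
    pct_obs: float


def _ratio(num: float, den: float) -> Optional[float]:
    # NULLIF(den, 0): return None (SQL NULL) instead of dividing by zero.
    return None if den == 0 else num / den


def denominated(pieces: Sequence[Piece]) -> Optional[float]:
    # SUM(numer * pct_obs) / NULLIF(SUM(denom * pct_obs), 0)
    return _ratio(sum(p.numer * p.pct_obs for p in pieces),
                  sum(p.denom * p.pct_obs for p in pieces))


def summed(pieces: Sequence[Piece]) -> float:
    # Predenominated summable measure: SUM(numer * pct_obs)
    return sum(p.numer * p.pct_obs for p in pieces)


def area_normalized(pieces: Sequence[Piece], area_sq_km: float) -> Optional[float]:
    # SUM(numer * pct_obs) / NULLIF(area in square kilometres, 0)
    return _ratio(summed(pieces), area_sq_km)


def universe_weighted(pieces: Sequence[Piece]) -> Optional[float]:
    # Median/average measure with a universe denominator:
    # SUM(numer * denom * pct_obs) / NULLIF(SUM(denom * pct_obs), 0)
    return _ratio(sum(p.numer * p.denom * p.pct_obs for p in pieces),
                  sum(p.denom * p.pct_obs for p in pieces))
```

For example, universe_weighted([Piece(10, 1000, 1), Piece(50, 10, 1)]) evaluates to 10500 / 1010, which is about 10.4. The small piece with a median of 50 pulls the result only slightly above 10 because its universe is tiny.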

@@ -252,6 +252,9 @@ CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetAvailableGeometries(
geom_aggregate TEXT,
geom_license TEXT,
geom_source TEXT,
geom_type TEXT,
geom_extra JSONB,
geom_tags JSONB,
valid_numer BOOLEAN,
valid_denom BOOLEAN,
valid_timespan BOOLEAN,
@@ -286,6 +289,9 @@ BEGIN
NULL::TEXT geom_aggregate,
NULL::TEXT license,
NULL::TEXT source,
geom_type::TEXT,
geom_extra::JSONB,
geom_tags::JSONB,
$1 = ANY(numers) valid_numer,
$2 = ANY(denoms) valid_denom,
$3 = ANY(timespans) valid_timespan
@@ -319,6 +325,9 @@ CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetAvailableTimespans(
timespan_aggregate TEXT,
timespan_license TEXT,
timespan_source TEXT,
timespan_type TEXT,
timespan_extra JSONB,
timespan_tags JSONB,
valid_numer BOOLEAN,
valid_denom BOOLEAN,
valid_geom BOOLEAN
@@ -343,8 +352,11 @@ BEGIN
timespan_description::TEXT,
timespan_weight::NUMERIC,
NULL::TEXT timespan_aggregate,
NULL::TEXT license,
NULL::TEXT source,
NULL::TEXT timespan_license,
NULL::TEXT timespan_source,
NULL::TEXT timespan_type,
NULL::JSONB timespan_extra,
NULL::JSONB timespan_tags,
$1 = ANY(numers) valid_numer,
$2 = ANY(denoms) valid_denom,
$3 = ANY(geoms) valid_geom_id
@@ -418,7 +430,8 @@ $$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION cdb_observatory._OBS_GetGeometryScores(
bounds Geometry(Geometry, 4326) DEFAULT NULL,
filter_geom_ids TEXT[] DEFAULT NULL,
desired_num_geoms INTEGER DEFAULT NULL
desired_num_geoms INTEGER DEFAULT NULL,
desired_area NUMERIC DEFAULT NULL
) RETURNS TABLE (
score NUMERIC,
numtiles BIGINT,
@@ -430,6 +443,8 @@ CREATE OR REPLACE FUNCTION cdb_observatory._OBS_GetGeometryScores(
estnumgeoms NUMERIC,
meanmediansize NUMERIC
) AS $$
DECLARE
num_geoms_multiplier Numeric;
BEGIN
IF desired_num_geoms IS NULL THEN
desired_num_geoms := 3000;
@@ -440,6 +455,18 @@ BEGIN
IF ST_Npoints(bounds) > 10000 THEN
bounds := ST_Envelope(bounds);
END IF;
IF desired_area IS NULL THEN
desired_area := ST_Area(bounds);
END IF;
-- In case of points, desired_area will be 0. We still want an accurate
-- estimate of numgeoms in that case.
IF desired_area = 0 THEN
num_geoms_multiplier := 1;
ELSE
num_geoms_multiplier := Coalesce(desired_area / Nullif(ST_Area(bounds), 0), 1);
END IF;
RETURN QUERY
EXECUTE $string$
WITH clipped_geom AS (
@@ -453,13 +480,11 @@ BEGIN
), clipped_geom_countagg AS (
SELECT column_id, table_id
, BOOL_AND(ST_BandIsNoData(clipped_tile, 1)) nodata
, ST_CountAgg(clipped_tile, 1, False)::Numeric pixels -- -10
FROM clipped_geom
GROUP BY column_id, table_id
), clipped_geom_reagg AS (
SELECT COUNT(*)::BIGINT cnt, a.column_id, a.table_id,
cdb_observatory.FIRST(nodata) first_nodata,
cdb_observatory.FIRST(pixels) first_pixel,
cdb_observatory.FIRST(tile) first_tile,
(ST_SummaryStatsAgg(clipped_tile, 1, False)).sum::Numeric sum_geoms, -- ND
(ST_SummaryStatsAgg(clipped_tile, 2, False)).mean::Numeric / 255 mean_fill --ND
@@ -474,9 +499,8 @@ BEGIN
, (CASE WHEN first_nodata IS FALSE
THEN sum_geoms
ELSE COALESCE(ST_Value(first_tile, 1, ST_PointOnSurface($1)), 0)
* (ST_Area($1) / ST_Area(ST_PixelAsPolygon(first_tile, 0, 0))
* first_pixel) -- -20
END)::Numeric
* (ST_Area($1) / ST_Area(ST_PixelAsPolygon(first_tile, 0, 0)))
END)::Numeric * $4
AS numgeoms
, (CASE WHEN first_nodata IS FALSE
THEN mean_fill
@@ -490,7 +514,7 @@ BEGIN
((100.0 / (1+abs(log(0.0001 + $3) - log(0.0001 + numgeoms::Numeric)))) * percentfill)::Numeric
AS score, *
FROM final
$string$ USING bounds, filter_geom_ids, desired_num_geoms;
$string$ USING bounds, filter_geom_ids, desired_num_geoms, num_geoms_multiplier;
RETURN;
END
$$ LANGUAGE plpgsql IMMUTABLE;
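_OBS_GetGeometryScores now scales its raster-based estimate of the number of geometries by desired_area / area(bounds). That factor falls back to 1 when desired_area is NULL, is 0 (points), or the bounds have no area. The score then rewards geometry levels whose estimate is close to desired_num_geoms (3000 by default) on a base-10 log scale, multiplied by percentfill. The minimal Python sketch below assumes the raw count and percentfill have already been derived from the rasters. The function names are hypothetical.

```python
import math
from typing import Optional


def num_geoms_multiplier(bounds_area: float, desired_area: Optional[float]) -> float:
    """Factor applied to the raster-based geometry count (hypothetical port)."""
    if desired_area is None:
        desired_area = bounds_area   # default: the whole bounds
    if desired_area == 0:
        return 1.0                   # points: keep the unscaled estimate
    if bounds_area == 0:
        return 1.0                   # Coalesce(x / Nullif(0, 0), 1)
    return desired_area / bounds_area


def geometry_score(numgeoms: float, percentfill: float,
                   desired_num_geoms: float = 3000) -> float:
    """100 / (1 + |log10 gap between desired and estimated counts|), times fill."""
    # PostgreSQL's one-argument log() is base 10, hence math.log10 here.
    gap = abs(math.log10(0.0001 + desired_num_geoms)
              - math.log10(0.0001 + numgeoms))
    return (100.0 / (1 + gap)) * percentfill
```

For example, if desired_area is a quarter of the bounds' area, a raw estimate of 1000 geometries becomes 1000 * num_geoms_multiplier(4.0, 1.0) = 250 before scoring. An estimate ten times larger or smaller than desired_num_geoms receives half the score of an exact match.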

@@ -150,6 +150,18 @@ t|t|t|t|t|t|t|t|t|t|t|t|t|t|t
obs_getmeta_conflicting_metadata
t
(1 row)
obs_getmeta_suggested_name
t
(1 row)
obs_getmeta_suggested_name_implicit_area
t
(1 row)
obs_getmeta_suggested_name_area
t
(1 row)
obs_getmeta_suggested_name_denom
t
(1 row)
obs_getdata_geomval_empty_null
t
(1 row)
@@ -261,3 +273,31 @@ t|t
ary_type|obs_getdata_api_geomrefs_args_string_return
t|t
(1 row)
setseed
(1 row)
bg_sample|bg_max_error|bg_avg_error|bg_min_error
1|t|t|t
2|t|t|t
3|t|t|t
5|t|t|t
10|t|t|t
25|t|t|t
50|t|t|t
100|t|t|t
2085|t|t|t
(9 rows)
tract_sample|tract_max_error|tract_avg_error|tract_min_error
1|t|t|t
2|t|t|t
3|t|t|t
5|t|t|t
10|t|t|t
25|t|t|t
50|t|t|t
100|t|t|t
761|t|t|t
(9 rows)
no_bg_point_error
t
(1 row)

@@ -120,6 +120,9 @@ t
_obs_getavailablegeometries_bg_not_1996
t
(1 row)
_obs_getavailablegeometries_has_boundary_tag
t
(1 row)
_obs_getavailabletimespans_2010_2014_in_all
t
(1 row)
@@ -159,21 +162,36 @@ t
_obs_geometryscores_2500km_buffer
t
(1 row)
_obs_geometryscores_numgeoms_500m_buffer
t
(1 row)
_obs_geometryscores_numgeoms_5km_buffer
t
(1 row)
_obs_geometryscores_numgeoms_50km_buffer
t
(1 row)
_obs_geometryscores_numgeoms_500km_buffer
t
(1 row)
_obs_geometryscores_numgeoms_2500km_buffer
t
(1 row)
column_id|_obs_geometryscores_numgeoms_500m_buffer
us.census.tiger.block_group|2
us.census.tiger.census_tract|1
us.census.tiger.zcta5|0
us.census.tiger.county|0
(4 rows)
column_id|_obs_geometryscores_numgeoms_5km_buffer
us.census.tiger.block_group|244
us.census.tiger.census_tract|78
us.census.tiger.zcta5|9
us.census.tiger.county|0
(4 rows)
column_id|_obs_geometryscores_numgeoms_50km_buffer
us.census.tiger.block_group|10817
us.census.tiger.census_tract|3396
us.census.tiger.zcta5|484
us.census.tiger.county|11
(4 rows)
column_id|_obs_geometryscores_numgeoms_500km_buffer
us.census.tiger.block_group|48567
us.census.tiger.census_tract|15823
us.census.tiger.zcta5|6466
us.census.tiger.county|295
(4 rows)
column_id|_obs_geometryscores_numgeoms_2500km_buffer
us.census.tiger.block_group|165852
us.census.tiger.census_tract|55283
us.census.tiger.zcta5|27046
us.census.tiger.county|2551
(4 rows)
_obs_geometryscores_500km_buffer_50_geoms
t
(1 row)
@@ -186,6 +204,12 @@ t
_obs_geometryscores_500km_buffer_25000_geoms
t
(1 row)
testarea_uses_tract
t
(1 row)
points_use_bg
t
(1 row)
_total_pop_in_legacy_builder_metadata
t
(1 row)

File diff suppressed because one or more lines are too long

@@ -268,7 +268,7 @@ SELECT
(meta->0->>'numer_name') = 'Total Population' numer_name,
(meta->0->>'denom_id') IS NULL denom_id,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'area' normalization
FROM meta;
-- OBS_GetMeta for point completes one partial measure with "best" metadata
@@ -290,7 +290,7 @@ SELECT
(meta->0->>'denom_type') = 'Numeric' denom_type,
(meta->0->>'denom_name') = 'Total Population' denom_name,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'denominated' normalization
FROM meta;
-- OBS_GetMeta for polygon completes one partial measure with "best" metadata
@@ -308,7 +308,7 @@ SELECT
(meta->0->>'numer_name') = 'Total Population' numer_name,
(meta->0->>'denom_id') IS NULL denom_id,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'area' normalization
FROM meta;
-- OBS_GetMeta for polygon completes one partial measure with "best" metadata
@@ -330,13 +330,13 @@ SELECT
(meta->0->>'denom_type') = 'Numeric' denom_type,
(meta->0->>'denom_name') = 'Total Population' denom_name,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'denominated' normalization
FROM meta;
-- OBS_GetMeta for point completes several partial measures with "best"
-- metadata, includes geom alternatives if asked
WITH meta AS (SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01001002"}]', null, 2) meta)
'[{"numer_id": "us.census.acs.B01001002", "max_score_rank": 2}]', null, 2) meta)
SELECT
(meta->0->>'id')::integer = 1 id,
(meta->0->>'numer_id') = 'us.census.acs.B01001002' numer_id,
@@ -352,7 +352,7 @@ SELECT
(meta->0->>'denom_type') = 'Numeric' denom_type,
(meta->0->>'denom_name') = 'Total Population' denom_name,
(meta->0->>'geom_id') = 'us.census.tiger.block_group' geom_id,
(meta->0->>'normalization') IS NULL normalization,
(meta->0->>'normalization') = 'denominated' normalization,
(meta->1->>'id')::integer = 1 id,
(meta->1->>'numer_id') = 'us.census.acs.B01001002' numer_id,
(meta->1->>'timespan_rank')::integer = 1 timespan_rank,
@@ -367,7 +367,7 @@ SELECT
(meta->1->>'denom_type') = 'Numeric' denom_type,
(meta->1->>'denom_name') = 'Total Population' denom_name,
(meta->1->>'geom_id') = 'us.census.tiger.census_tract' geom_id,
(meta->1->>'normalization') IS NULL normalization
(meta->1->>'normalization') = 'denominated' normalization
FROM meta;
-- OBS_GetMeta for point completes several partial measures with "best" metadata
@@ -389,7 +389,7 @@ SELECT
(meta->0->>'denom_type') = 'Numeric' denom_type,
(meta->0->>'denom_name') = 'Total Population' denom_name,
(meta->0->>'geom_id') = 'us.census.tiger.census_tract' geom_id,
(meta->0->>'normalization') IS NULL normalization
(meta->0->>'normalization') = 'denominated' normalization
FROM meta;
-- OBS_GetMeta for point completes several partial measures with conflicting
@@ -398,6 +398,26 @@ SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01001002", "denom_id": "us.census.acs.B01001002", "geom_id": "us.census.tiger.census_tract"}]') IS NULL
AS obs_getmeta_conflicting_metadata;
-- OBS_GetMeta provides suggested name for simple meta request
SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom"}]'
)->0->>'suggested_name' = 'total_pop_2010_2014' obs_getmeta_suggested_name;
-- OBS_GetMeta provides suggested name for simple meta request with implicit area norm
SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01003001"}]'
)->0->>'suggested_name' = 'total_pop_per_sq_km_2010_2014' obs_getmeta_suggested_name_implicit_area;
-- OBS_GetMeta provides suggested name for simple meta request with area norm
SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01003001", "normalization": "area"}]'
)->0->>'suggested_name' = 'total_pop_per_sq_km_2010_2014' obs_getmeta_suggested_name_area;
-- OBS_GetMeta provides suggested name for simple meta request with denom
SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
'[{"numer_id": "us.census.acs.B01001002", "normalization": "denom"}]'
)->0->>'suggested_name' = 'male_pop_rate_2010_2014' obs_getmeta_suggested_name_denom;
-- OBS_GetData/OBS_GetMeta by id with empty list/null
WITH data AS (SELECT * FROM cdb_observatory.OBS_GetData(ARRAY[]::TEXT[], null))
SELECT ARRAY_AGG(data) IS NULL AS obs_getdata_geomval_empty_null FROM data;
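The four suggested_name tests above imply a naming pattern. Each name starts with a column stem, adds a suffix that depends on the resolved normalization (none for predenom, _per_sq_km for area, _rate for denominated), and ends with the timespan. The Python sketch below was inferred from those expected values only and is not the OBS_GetMeta implementation. The stems total_pop and male_pop and the timespan string "2010 - 2014" are assumed inputs, and the function names are hypothetical.

```python
import re


def slugify(text: str) -> str:
    # Lowercase, then collapse each run of non-alphanumerics into one underscore.
    return re.sub(r"[^0-9a-z]+", "_", text.lower()).strip("_")


def suggested_name(colname: str, normalization: str, timespan: str) -> str:
    """Hypothetical reconstruction of the names the OBS_GetMeta tests expect.

    normalization must already be resolved; in these tests an omitted
    normalization resolves to 'area' for total population.
    """
    norm = normalization.lower()
    if norm.startswith("area"):
        stem = colname + "_per_sq_km"
    elif norm.startswith("denom"):
        stem = colname + "_rate"
    else:  # predenominated values are used as-is
        stem = colname
    return slugify(stem + " " + timespan)
```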
@@ -662,25 +682,25 @@ FROM data;
-- OBS_GetData/OBS_GetMeta by geom with polygons inside a polygon + one measure
WITH
meta AS (SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestArea(),
'[{"geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group"}]') meta),
'[{"geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "geom_id": "us.census.tiger.block_group"}]') meta),
data AS (SELECT * FROM cdb_observatory.OBS_GetData(
ARRAY[(cdb_observatory._TestArea(), 1)::geomval],
(SELECT meta FROM meta), false))
SELECT every(id = 1) is TRUE id,
count(distinct (data->0->>'value')::geometry) = 16 correct_num_geoms,
abs(sum((data->1->>'value')::numeric) - 15787) / 15787 < 0.001 correct_pop
abs(sum((data->1->>'value')::numeric) - 12327) / 12327 < 0.001 correct_pop
FROM data;
-- OBS_GetData/OBS_GetMeta by geom with polygons inside a polygon + one measure + one text
WITH
meta AS (SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestArea(),
'[{"geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.tiger.name", "geom_id": "us.census.tiger.block_group"}]') meta),
'[{"geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "geom_id": "us.census.tiger.block_group"}, {"numer_id": "us.census.tiger.name", "geom_id": "us.census.tiger.block_group"}]') meta),
data AS (SELECT * FROM cdb_observatory.OBS_GetData(
ARRAY[(cdb_observatory._TestArea(), 1)::geomval],
(SELECT meta FROM meta), false))
SELECT every(id = 1) is TRUE id,
count(distinct (data->0->>'value')::geometry) = 16 correct_num_geoms,
abs(sum((data->1->>'value')::numeric) - 15787) / 15787 < 0.001 correct_pop,
abs(sum((data->1->>'value')::numeric) - 12327) / 12327 < 0.001 correct_pop,
array_agg(distinct data->2->>'value') = '{"Block Group 1","Block Group 2","Block Group 3","Block Group 4","Block Group 5"}' correct_bg_names
FROM data;
@@ -798,3 +818,146 @@ SELECT json_typeof(data->0->'value') = 'array' ary_type,
AS OBS_GetData_API_geomrefs_args_string_return
FROM cdb_observatory.obs_getdata(array['36047'],
'[{"numer_type": "text", "numer_colname": "obs_getboundarybyid", "api_method": "obs_getboundarybyid", "api_args": ["us.census.tiger.county"]}]');
-- Ensure consistent results below.
select setseed(0);
-- Check that a random assortment of block groups in Brooklyn returns accurate data
WITH _geoms AS (
SELECT
(data->0->>'value')::geometry the_geom,
data->0->>'geomref' geom_ref,
(data->1->>'value')::numeric total_pop
FROM cdb_observatory.OBS_GetData(
array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
(SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
'[{"geom_id": "us.census.tiger.block_group"},
{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group", "normalization": "predenom"}]')),
FALSE
)
WHERE data->0->>'geomref' LIKE '36047%'
ORDER BY RANDOM()
), geoms AS (
SELECT *, row_number() OVER () cartodb_id FROM _geoms
), samples AS (
SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
), filtered AS (
SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
), summary AS (
SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
COUNT(*)::INT cnt,
ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
SUM(ST_Area(the_geom))::Numeric sumarea
FROM filtered
GROUP BY sample
), meta AS (
SELECT sample, cdb_observatory.OBS_GetMeta(extent,
('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
1, 1, cnt) meta
FROM summary
GROUP BY sample, extent, cnt, sumarea
), results AS (
SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
WHERE summary.sample = meta.sample
) SELECT sample bg_sample
, MAX(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_max_error
, AVG(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_avg_error
, MIN(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_min_error
FROM geoms, results
WHERE cartodb_id = id
GROUP BY sample
ORDER BY sample
;
-- Check that a random assortment of tracts in Brooklyn returns accurate data
WITH _geoms AS (
SELECT
(data->0->>'value')::geometry the_geom,
data->0->>'geomref' geom_ref,
(data->1->>'value')::numeric total_pop
FROM cdb_observatory.OBS_GetData(
array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
(SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
'[{"geom_id": "us.census.tiger.census_tract"},
{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.census_tract", "normalization": "predenom"}]')),
FALSE
)
WHERE data->0->>'geomref' LIKE '36047%'
ORDER BY RANDOM()
), geoms AS (
SELECT *, row_number() OVER () cartodb_id FROM _geoms
), samples AS (
SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
), filtered AS (
SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
), summary AS (
SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
COUNT(*)::INT cnt,
ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
SUM(ST_Area(the_geom))::Numeric sumarea
FROM filtered
GROUP BY sample
), meta AS (
SELECT sample, cdb_observatory.OBS_GetMeta(extent,
('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
1, 1, cnt) meta
FROM summary
GROUP BY sample, extent, cnt, sumarea
), results AS (
SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
WHERE summary.sample = meta.sample
) SELECT sample tract_sample
, MAX(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_max_error
, AVG(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_avg_error
, MIN(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_min_error
FROM geoms, results
WHERE cartodb_id = id
GROUP BY sample
ORDER BY sample
;
-- Check that a random assortment of block group points in Brooklyn returns accurate data
WITH _geoms AS (
SELECT
ST_PointOnSurface((data->0->>'value')::geometry) the_geom,
data->0->>'geomref' geom_ref,
(data->1->>'value')::numeric total_pop
FROM cdb_observatory.OBS_GetData(
array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
(SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
'[{"geom_id": "us.census.tiger.block_group"},
{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group", "normalization": "predenom"}]')),
FALSE
)
WHERE data->0->>'geomref' LIKE '36047%'
), geoms AS (
SELECT *, row_number() OVER () cartodb_id FROM _geoms
), samples AS (
SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
), filtered AS (
SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
), summary AS (
SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
COUNT(*)::INT cnt,
ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
SUM(ST_Area(the_geom))::Numeric sumarea
FROM filtered
GROUP BY sample
), meta AS (
SELECT sample, cdb_observatory.OBS_GetMeta(extent,
('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
1, 1, cnt) meta
FROM summary
GROUP BY sample, extent, cnt, sumarea
), results AS (
SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
WHERE summary.sample = meta.sample
) SELECT
BOOL_AND(abs((geoms.total_pop - val) /
Coalesce(NullIf(total_pop, 0), 1)) = 0) is True no_bg_point_error
FROM geoms, results
WHERE cartodb_id = id
;
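The block-group and tract tests draw nested stride samples. Rows are numbered in random order (setseed(0) keeps that order repeatable), and for each sample size the tests keep rows whose cartodb_id is a multiple of cnt / sample. Because SQL integer division truncates, each sample contains at least the requested number of rows and sometimes a few more. The tests then require the percentage error between each geometry's own total_pop and the interpolated value to stay below 10. The point test instead requires zero error. The Python sketch below illustrates the sampling and the error metric. Its names are hypothetical, and it omits the SQL's rounding to Numeric(10, 2) before the comparison.

```python
from typing import List, Optional, Sequence


def sample_sizes(cnt: int) -> List[int]:
    # Same sizes as ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]
    return [1, 2, 3, 5, 10, 25, 50, 100, cnt]


def stride_sample(ids: Sequence[int], sample: int) -> List[int]:
    """Keep ids where id % (cnt // sample) == 0, as the filtered CTE does."""
    stride = len(ids) // sample
    # stride is 0 when sample > cnt; i % 0 then raises, as SQL would.
    return [i for i in ids if i % stride == 0]


def percent_error(expected: float, actual: float) -> Optional[float]:
    # 100 * abs((expected - actual) / NullIf(expected, 0)); None stands in
    # for SQL NULL, which MAX/AVG/MIN ignore.
    if expected == 0:
        return None
    return 100 * abs((expected - actual) / expected)
```

With the 2085 Brooklyn block groups from the expected output, a sample size of 100 gives a stride of 20 and therefore keeps 104 rows.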

@@ -289,6 +289,11 @@ FROM cdb_observatory.OBS_GetAvailableGeometries(
) WHERE valid_timespan = True)
AS _obs_getavailablegeometries_bg_not_1996;
SELECT 'subsection/tags.boundary' IN (SELECT (Jsonb_Each(geom_tags)).key
FROM cdb_observatory.OBS_GetAvailableGeometries(
ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)
)) AS _obs_getavailablegeometries_has_boundary_tag;
--
-- OBS_GetAvailableTimespans tests
--
@@ -360,9 +365,9 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
'us.census.tiger.county', 'us.census.tiger.zcta5'])
WHERE table_id LIKE '%2015%';
SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county']
SELECT ARRAY_AGG(column_id ORDER BY score DESC)
= ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.county', 'us.census.tiger.zcta5']
AS _obs_geometryscores_5km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 5000)::Geometry(Geometry, 4326),
@@ -390,60 +395,55 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
ARRAY['us.census.tiger.county', 'us.census.tiger.zcta5',
'us.census.tiger.census_tract', 'us.census.tiger.block_group']
SELECT ARRAY_AGG(column_id ORDER BY score DESC)
= ARRAY['us.census.tiger.county', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.block_group']
AS _obs_geometryscores_2500km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 2500000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
ARRAY['us.census.tiger.county', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.block_group'])
WHERE table_id LIKE '%2015%';
SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text
= '{ "us.census.tiger.block_group" : 9, "us.census.tiger.census_tract" : 3, "us.census.tiger.zcta5" : 0, "us.census.tiger.county" : 0 }'
AS _obs_geometryscores_numgeoms_500m_buffer
SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_500m_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 500)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
WHERE table_id LIKE '%2015%'
ORDER BY numgeoms DESC;
SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
'{ "us.census.tiger.block_group" : 880, "us.census.tiger.census_tract" : 310, "us.census.tiger.zcta5" : 45, "us.census.tiger.county" : 1 }'
AS _obs_geometryscores_numgeoms_5km_buffer
SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_5km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 5000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
WHERE table_id LIKE '%2015%'
ORDER BY numgeoms DESC;
SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
'{ "us.census.tiger.block_group" : 11531, "us.census.tiger.census_tract" : 3601, "us.census.tiger.zcta5" : 550, "us.census.tiger.county" : 14 }'
AS _obs_geometryscores_numgeoms_50km_buffer
SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_50km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 50000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
-WHERE table_id LIKE '%2015%';
+WHERE table_id LIKE '%2015%'
+ORDER BY numgeoms DESC;
-SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
-'{ "us.census.tiger.block_group" : 48917, "us.census.tiger.census_tract" : 15969, "us.census.tiger.zcta5" : 6534, "us.census.tiger.county" : 314 }'
-AS _obs_geometryscores_numgeoms_500km_buffer
+SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_500km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 500000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
-WHERE table_id LIKE '%2015%';
+WHERE table_id LIKE '%2015%'
+ORDER BY numgeoms DESC;
-SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
-'{ "us.census.tiger.block_group" : 169191, "us.census.tiger.census_tract" : 56469, "us.census.tiger.zcta5" : 26525, "us.census.tiger.county" : 2753 }'
-AS _obs_geometryscores_numgeoms_2500km_buffer
+SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_2500km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 2500000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
-WHERE table_id LIKE '%2015%';
+WHERE table_id LIKE '%2015%'
+ORDER BY numgeoms DESC;
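The `JSON_Object_Agg(...)::Text = '...'` form of these checks compares the aggregate's text against a literal. That makes the check depend on key order as well as on the counts. In the 500 m case, `us.census.tiger.zcta5` and `us.census.tiger.county` both have 0 geometries, and SQL leaves the relative order of rows that tie on the `ORDER BY` key unspecified.

The Python sketch below is not part of the test suite. It shows that the same counts can produce two different strings:

- `json_object_agg_text` is a hypothetical helper that imitates the `{ "key" : value, ... }` text layout of `json_object_agg`.
- Permuting the input rows stands in for an arbitrary tie order.

This is one plausible fragility of text comparison. It is not a statement of why the tests were changed.

```python
import json
from itertools import permutations
from typing import List, Set, Tuple

Row = Tuple[str, int]

# Counts copied from the 500 m buffer expectation in the test above.
ROWS: List[Row] = [
    ("us.census.tiger.block_group", 9),
    ("us.census.tiger.census_tract", 3),
    ("us.census.tiger.zcta5", 0),
    ("us.census.tiger.county", 0),
]

EXPECTED = ('{ "us.census.tiger.block_group" : 9, '
            '"us.census.tiger.census_tract" : 3, '
            '"us.census.tiger.zcta5" : 0, "us.census.tiger.county" : 0 }')


def json_object_agg_text(rows: List[Row]) -> str:
    """Hypothetical helper: render (key, value) pairs in the given order
    using the `{ "key" : value, ... }` layout of json_object_agg."""
    body = ", ".join(f'"{key}" : {value}' for key, value in rows)
    return "{ " + body + " }"


def order_by_numgeoms_desc(rows: List[Row]) -> List[Row]:
    """ORDER BY numgeoms DESC. Python's sort is stable, so tied rows keep
    their input order; varying the input order stands in for the
    unspecified tie order of the database."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


def text_outcomes(rows: List[Row]) -> Set[str]:
    """Every aggregate text reachable by feeding the rows in any order."""
    return {json_object_agg_text(order_by_numgeoms_desc(list(p)))
            for p in permutations(rows)}


def same_counts(text_a: str, text_b: str) -> bool:
    """Order-insensitive comparison: parse both texts into dicts."""
    return json.loads(text_a) == json.loads(text_b)


if __name__ == "__main__":
    for text in sorted(text_outcomes(ROWS)):
        print(text == EXPECTED, same_counts(text, EXPECTED))
```

Running the script prints `False True` for the county-first text and `True True` for the zcta5-first text. Only one ordering matches the literal, although both parse to the same counts. Returning rows, as the `ORDER BY numgeoms DESC` form does, or comparing parsed JSON avoids that dependence.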
SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
ARRAY['us.census.tiger.county', 'us.census.tiger.zcta5',
@@ -475,9 +475,9 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
'us.census.tiger.zcta5', 'us.census.tiger.county'], 2500)
WHERE table_id LIKE '%2015%';
-SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
-ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
-'us.census.tiger.zcta5', 'us.census.tiger.county']
+SELECT ARRAY_AGG(column_id ORDER BY score DESC)
+= ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
+'us.census.tiger.county', 'us.census.tiger.zcta5']
AS _obs_geometryscores_500km_buffer_25000_geoms
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 50000)::Geometry(Geometry, 4326),
@@ -485,6 +485,44 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
'us.census.tiger.zcta5', 'us.census.tiger.county'], 25000)
WHERE table_id LIKE '%2015%';
+-- Check that one small geom approximates tract data
+WITH geoms AS (SELECT cdb_observatory._testarea() the_geom),
+summary AS (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) extent,
+COUNT(*)::INT cnt,
+SUM(ST_Area(the_geom))::Numeric sumarea
+FROM geoms)
+SELECT column_id = 'us.census.tiger.census_tract' testarea_uses_tract
+FROM summary, LATERAL (
+SELECT *
+FROM cdb_observatory._OBS_GetGeometryScores(extent,
+ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
+'us.census.tiger.zcta5', 'us.census.tiger.county'],
+cnt, sumarea)) foo
+ORDER BY score DESC LIMIT 1;
+-- Check that randomly distributed points always use smallest geometry if we
+-- order by numgeoms desc
+WITH geoms as (SELECT UNNEST(ARRAY[
+cdb_observatory._testpoint(),
+st_translate(cdb_observatory._testpoint(), -0.003, 0),
+st_translate(cdb_observatory._testpoint(), -0.006, 0)
+]) the_geom),
+summary as (SELECT
+ST_SetSRID(ST_Extent(the_geom), 4326) extent,
+SUM(ST_Area(the_geom))::Numeric area,
+COUNT(*)::INTEGER cnt
+FROM geoms
+)
+SELECT column_id = 'us.census.tiger.block_group' points_use_bg
+FROM summary, LATERAL (
+SELECT * FROM cdb_observatory._OBS_GetGeometryScores(
+extent,
+ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
+'us.census.tiger.zcta5', 'us.census.tiger.county'],
+cnt, area)) foo
+WHERE table_id LIKE '%2015%'
+ORDER BY numgeoms DESC LIMIT 1;
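Both of these checks first collapse the input geometries into three values. Those values are passed to `_OBS_GetGeometryScores` along with the candidate geometry levels:

- an extent (`ST_Extent`)
- a count
- a summed `ST_Area`

The Python sketch below mirrors that `summary` CTE outside the database. It works on geometries given as plain coordinate lists in a planar x/y system; for SRID 4326, `ST_Area` on a geometry is likewise planar, in square degrees.

Some names are assumptions, not code from the repository:

- `summarize`, `ring_area` and `translate` are hypothetical helpers.
- `BASE` is an assumed coordinate, since `_testpoint()` is not shown here.

The sketch shows what the points check feeds in. Points contribute no area, so the summed area is 0, while the extent still spans the 0.006 degree westward offset.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


@dataclass
class Summary:
    extent: Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
    cnt: int
    area: float


def ring_area(ring: Sequence[Coord]) -> float:
    """Planar shoelace area of one polygon ring; fewer than three
    vertices (e.g. a single point) gives 0."""
    if len(ring) < 3:
        return 0.0
    closed = list(ring) + [ring[0]]
    twice = 0.0
    for (x1, y1), (x2, y2) in zip(closed, closed[1:]):
        twice += x1 * y2 - x2 * y1
    return abs(twice) / 2.0


def summarize(geoms: List[Sequence[Coord]]) -> Summary:
    """Mirror the summary CTE: bounding box, geometry count, summed area."""
    xs = [x for geom in geoms for x, _ in geom]
    ys = [y for geom in geoms for _, y in geom]
    return Summary(
        extent=(min(xs), min(ys), max(xs), max(ys)),
        cnt=len(geoms),
        area=sum(ring_area(geom) for geom in geoms),
    )


def translate(point: Coord, dx: float, dy: float) -> Coord:
    """Counterpart of ST_Translate for a single coordinate."""
    return (point[0] + dx, point[1] + dy)


# Hypothetical stand-in for cdb_observatory._testpoint().
BASE: Coord = (-73.936, 40.704)

# Three points, shifted west like the points check above.
POINTS = [[BASE],
          [translate(BASE, -0.003, 0)],
          [translate(BASE, -0.006, 0)]]

if __name__ == "__main__":
    print(summarize(POINTS))
```

Running it prints a `Summary` with `cnt=3` and `area=0.0`. A polygon input such as `_testarea()` would instead contribute its shoelace area to the total.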
--
-- OBS_LegacyBuilderMetadata tests
--

@@ -73,6 +73,7 @@ SKIP_COLUMNS = set([
, 'uk.ons.LC3204WA0004'
, 'uk.ons.LC3204WA0007'
, 'uk.ons.LC3204WA0010'
+, 'br.geo.subdistritos_name'
])
MEASURE_COLUMNS = query('''