9 Commits
1.3.4 ... 1.4.0

Author | SHA1 | Message | Date
--- | --- | --- | ---
John Krauss | 536af5e4a2 | release artifact | 2017-03-22 15:17:19 +00:00
John Krauss | ebf23d2a23 | Merge branch 'develop' into release-v-1.4.0 | 2017-03-22 15:16:35 +00:00
John Krauss | f1afcf0d8e | update NEWS.md | 2017-03-22 15:14:35 +00:00
John Krauss | 3c0b40cf3f | more consistent arguments in docs | 2017-03-22 15:12:50 +00:00
John Krauss | 8a87dc7e9a | update NEWS.md | 2017-03-21 21:24:50 +00:00
John Krauss | 61552adba4 | Allow for target_geoms and target_area override on column-by-column basis | 2017-03-21 17:26:02 +00:00
csobier | 36abbee64f | Merge pull request #274 from CartoDB/273-docs-edit: clarification of docs for obs_getboundariesbygeometry function | 2017-03-17 12:07:48 -04:00
csobier | 5a76a7381e | clarification of docs for obs_getboundariesbygeometry function | 2017-03-17 11:45:49 -04:00
John Krauss | 217ca2d84d | release 1.3.5 artifact | 2017-03-15 20:12:06 +00:00
14 changed files with 5800 additions and 950 deletions

24
NEWS.md
@@ -1,3 +1,27 @@
1.4.0 (2017-03-21)
__API Changes__
* Allow for override of `target_area` and `target_geoms` in `OBS_GetMeta`
([#276](https://github.com/CartoDB/observatory-extension/pull/265)). This
allows the interface to work with points and sparse areas much better.
* Allow for override of `max_timespan_rank` and `max_score_rank` on an
item-by-item basis for metadata.
* `numer_description`, `geom_description`, `denom_description`,
`numer_t_description`, `denom_t_description` and `geom_t_description` now
returned as part of `OBS_GetMeta`.
__Improvements__
* Reduced amount of simplification done on input geometries (from 0.0001 above
500 points to 0.00001 above 1000 points).
* Added tests to confirm that accurate results are returned from automatic
boundary selection
1.3.5 (2017-03-15)
No changes. Artifact to allow for data update.
1.3.4 (2017-03-10)
__Bugfixes__

@@ -4,7 +4,7 @@ Use the following functions to retrieve [Boundary](https://carto.com/docs/carto-
You can [access](https://carto.com/docs/carto-engine/data/accessing) boundaries through CARTO Builder. The same methods will work if you are using the CARTO Engine to develop your application. We [encourage you](http://docs/carto-engine/data/accessing/#best-practices) to use table modifying methods (UPDATE and INSERT) over dynamic methods (SELECT).
## OBS_GetBoundariesByGeometry(polygon geometry, geometry_id text)
## OBS_GetBoundariesByGeometry(geom geometry, geometry_id text)
The ```OBS_GetBoundariesByGeometry(geometry, geometry_id)``` method returns a set of boundary geometries that intersect a supplied geometry. This can be used to find all boundaries that are within or overlap a bounding box. You have the ability to choose whether to retrieve all boundaries that intersect your supplied bounding box or only those that fall entirely inside of your bounding box.
@@ -12,7 +12,7 @@ The ```OBS_GetBoundariesByGeometry(geometry, geometry_id)``` method returns a se
Name |Description
--- | ---
polygon | a bounding box or other WGS84 geometry
geom | a WGS84 geometry
geometry_id | a string identifier for a boundary geometry
timespan (optional) | year(s) to request from ('NULL' (default) gives most recent)
overlap_type (optional) | one of '[intersects](http://postgis.net/docs/manual-2.2/ST_Intersects.html)' (default), '[contains](http://postgis.net/docs/manual-2.2/ST_Contains.html)', or '[within](http://postgis.net/docs/manual-2.2/ST_Within.html)'.
@@ -26,7 +26,7 @@ Column Name | Description
the_geom | a boundary geometry (e.g., US Census tract boundaries)
geom_refs | a string identifier for the geometry (e.g., geoids of US Census tracts)
If geometries are not found for the requested `polygon`, `geometry_id`, `timespan`, or `overlap_type`, then null values are returned.
If geometries are not found for the requested `geom`, `geometry_id`, `timespan`, or `overlap_type`, then null values are returned.
#### Example
@@ -44,7 +44,6 @@ FROM OBS_GetBoundariesByGeometry(
#### Errors
* If a geometry other than a point is passed as the first argument, an error is thrown: `Invalid geometry type (ST_Polygon), expecting 'ST_Point'`
* If an `overlap_type` other than the valid ones listed above is entered, then an error is thrown
## OBS_GetPointsByGeometry(polygon geometry, geometry_id text)

@@ -196,7 +196,7 @@ UPDATE tablename
SET segmentation = OBS_GetCategory(the_geom, 'us.census.spielman_singleton_segments.X55')
```
## OBS_GetMeta(extent geometry, metadata json, max_timespan_rank, max_boundary_score_rank, num_target_geoms)
## OBS_GetMeta(extent geometry, metadata json, max_timespan_rank, max_score_rank, target_geoms)
The ```OBS_GetMeta(extent, metadata)``` function returns a completed Data
Observatory metadata JSON Object for use in ```OBS_GetData(geomvals,
@@ -215,7 +215,7 @@ extent | A geometry of the extent of the input geometries
metadata | A JSON array composed of metadata input objects. Each indicates one desired measure for an output column, and optionally additional parameters about that column
max_timespan_rank | How many historical time periods to include. Defaults to 1
max_boundary_score_rank | How many alternative boundary levels to include. Defaults to 1
num_target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
target_geoms | Target number of geometries. Boundaries with close to this many objects within `extent` will be ranked highest.
The schema of the metadata input objects is as follows:
@@ -227,6 +227,10 @@ normalization | The desired normalization. One of 'area', 'prenormalized', or '
denom_id | Identifier for a desired normalization column in case `normalization` is 'denominated'. Will be automatically assigned if necessary. Ignored if this metadata object specifies a geometry.
numer_timespan | The desired timespan for the measurement. Defaults to most recent timespan available if left unspecified.
geom_timespan | The desired timespan for the geometry. Defaults to timespan matching numer_timespan if left unspecified.
target_area | Instead of aiming to have `target_geoms` in the area of the geometry passed as `extent`, fill this area. Unit is square degrees WGS84. Set this to `0` if you want to use the smallest source geometry for this element of metadata, for example if you're passing in points.
target_geoms | Override global `target_geoms` for this element of metadata
max_timespan_rank | Override global `max_timespan_rank` for this element of metadata
max_score_rank | Override global `max_score_rank` for this element of metadata
#### Returns
@@ -245,6 +249,8 @@ Metadata Output Key | Description
numer_id | Identifier for desired measurement
numer_timespan | Timespan that will be used for the desired measurement
numer_name | Human-readable name of desired measure
numer_description | Long human-readable description of the desired measure
numer_t_description | Further information about the source table
numer_type | PostgreSQL/PostGIS type of desired measure
numer_colname | Internal identifier for column name
numer_tablename | Internal identifier for table
@@ -252,6 +258,8 @@ numer_geomref_colname | Internal identifier for geomref column name
denom_id | Identifier for desired normalization
denom_timespan | Timespan that will be used for the desired normalization
denom_name | Human-readable name of desired measure's normalization
denom_description | Long human-readable description of the desired measure's normalization
denom_t_description | Further information about the source table
denom_type | PostgreSQL/PostGIS type of desired measure's normalization
denom_colname | Internal identifier for normalization column name
denom_tablename | Internal identifier for normalization table
@@ -259,12 +267,14 @@ denom_geomref_colname | Internal identifier for normalization geomref column nam
geom_id | Identifier for desired boundary geometry
geom_timespan | Timespan that will be used for the desired boundary geometry
geom_name | Human-readable name of desired boundary geometry
geom_description | Long human-readable description of the desired boundary geometry
geom_t_description | Further information about the source table
geom_type | PostgreSQL/PostGIS type of desired boundary geometry
geom_colname | Internal identifier for boundary geometry column name
geom_tablename | Internal identifier for boundary geometry table
geom_geomref_colname | Internal identifier for boundary geometry ref column name
timespan_rank | Ranking of this measurement by time, most recent is 1, second most recent 2, etc.
score | The score of this measurement's boundary compared to the `extent` and `num_target_geoms` passed in. Between 0 and 100.
score | The score of this measurement's boundary compared to the `extent` and `target_geoms` passed in. Between 0 and 100.
score_rank | The ranking of this measurement's boundary, highest ranked is 1, second is 2, etc.
numer_aggregate | The aggregate type of the numerator, either `sum`, `average`, `median`, or blank
denom_aggregate | The aggregate type of the denominator, either `sum`, `average`, `median`, or blank
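
As a rough illustration of the per-item overrides documented above, the following Python sketch builds a metadata input array in which one element sets `target_area` and another overrides `max_score_rank`. The measure ID `us.census.acs.B01003001` and the `predenom` normalization are taken from the tests in this diff; the helper name `build_meta_params`, the envelope coordinates, and the global argument values are hypothetical and chosen only for illustration.

```python
import json


def build_meta_params(items):
    """Serialize metadata input objects, dropping keys whose value is None.

    Hypothetical helper: a key that is left out falls back to the global
    argument passed to OBS_GetMeta.
    """
    cleaned = [{k: v for k, v in item.items() if v is not None} for item in items]
    return json.dumps(cleaned)


params = build_meta_params([
    # target_area of 0 asks for the smallest source geometry (e.g. for points).
    {"numer_id": "us.census.acs.B01003001", "normalization": "predenom",
     "target_area": 0},
    # Per-item override of max_score_rank; target_geoms stays global.
    {"numer_id": "us.census.acs.B01003001", "max_score_rank": 2,
     "target_geoms": None},
])

# Assumed call shape: (extent, metadata, max_timespan_rank, max_score_rank, target_geoms).
sql = (
    "SELECT cdb_observatory.OBS_GetMeta("
    "ST_MakeEnvelope(-74.1, 40.5, -73.7, 40.9, 4326), "
    f"'{params}'::json, 1, 1, 500)"
)
print(sql)
```

Omitting a key rather than sending `null` keeps the payload small; either way the diff suggests the global value is used whenever the per-item value is absent.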

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@@ -1,5 +1,5 @@
comment = 'CartoDB Observatory backend extension'
default_version = '1.3.4'
default_version = '1.4.0'
requires = 'postgis'
superuser = true
schema = cdb_observatory

@@ -1,5 +1,5 @@
comment = 'CartoDB Observatory backend extension'
default_version = '1.3.4'
default_version = '1.4.0'
requires = 'postgis'
superuser = true
schema = cdb_observatory

@@ -126,9 +126,23 @@ BEGIN
geom_filters := (SELECT Array_Agg(val) FILTER (WHERE val IS NOT NULL) FROM (SELECT (JSON_Array_Elements(params))->>'geom_id' val) bar);
meta_filter_clause := '(m.numer_id = ANY ($6) OR m.geom_id = ANY ($7))';
scores_clause := 'SELECT *
FROM cdb_observatory._OBS_GetGeometryScores($1,
(SELECT Array_Agg(geom_id) FROM meta), $2) scores ';
scores_clause := ' agg_geoms AS (
SELECT target_geoms, target_area, ARRAY_AGG(geom_id) geom_ids
FROM meta
GROUP BY target_geoms, target_area
), scores AS (
SELECT target_geoms, target_area,
CASE target_area
-- point-specific, just order by numgeoms instead of score
WHEN 0 THEN scores.numgeoms
-- has some area, use proper scoring
ELSE scores.score
END AS score,
scores.numgeoms, scores.table_id, scores.column_id
FROM agg_geoms,
LATERAL cdb_observatory._OBS_GetGeometryScores($1,
geom_ids, COALESCE(target_geoms, $2), target_area) scores
) ';
IF JSON_Array_Length(params) = 1 THEN
IF numer_filters IS NULL AND geom_filters IS NOT NULL THEN
@@ -142,9 +156,11 @@ BEGIN
END IF;
IF geom_filters IS NOT NULL AND numer_filters IS NOT NULL THEN
scores_clause := 'SELECT 1 score, null, geom_tid table_id, geom_id column_id,
null, null, null, null, null, null
FROM meta ';
scores_clause := 'scores AS (
SELECT NULL::INTEGER target_geoms, NULL::Numeric target_area,
1 score, null, geom_tid table_id, geom_id column_id,
NULL::Integer numgeoms
FROM meta) ';
END IF;
END IF;
@@ -156,7 +172,11 @@ BEGIN
(unnest($3))->>'geom_id' geom_id,
(unnest($3))->>'numer_timespan' numer_timespan,
(unnest($3))->>'geom_timespan' geom_timespan,
(unnest($3))->>'normalization' normalization
(unnest($3))->>'normalization' normalization,
(unnest($3))->>'max_timespan_rank' max_timespan_rank,
(unnest($3))->>'max_score_rank' max_score_rank,
((unnest($3))->>'target_geoms')::INTEGER target_geoms,
((unnest($3))->>'target_area')::Numeric target_area
), meta AS (SELECT
id,
f.numer_id,
@@ -166,6 +186,8 @@ BEGIN
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_tablename END numer_tablename,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_type END numer_type,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_name END numer_name,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_description END numer_description,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_t_description END numer_t_description,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE m.numer_timespan END numer_timespan,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE m.denom_id END denom_id,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_aggregate END denom_aggregate,
@@ -173,6 +195,8 @@ BEGIN
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_geomref_colname END denom_geomref_colname,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_tablename END denom_tablename,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_name END denom_name,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_description END denom_description,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_t_description END denom_t_description,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_type END denom_type,
CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_reltype END denom_reltype,
m.geom_id,
@@ -182,8 +206,14 @@ BEGIN
geom_geomref_colname,
geom_tablename,
geom_name,
geom_description,
geom_t_description,
geom_type,
normalization
normalization,
max_timespan_rank,
max_score_rank,
target_geoms,
target_area
FROM observatory.obs_meta m JOIN _filters f
ON CASE WHEN f.numer_id IS NULL THEN m.geom_id ELSE m.numer_id END =
CASE WHEN f.numer_id IS NULL THEN f.geom_id ELSE f.numer_id END
@@ -194,9 +224,8 @@ BEGIN
AND (m.geom_id = f.geom_id OR COALESCE(f.geom_id, '') = '')
AND (m.geom_timespan = f.geom_timespan OR COALESCE(f.geom_timespan, '') = '')
AND (m.numer_timespan = f.numer_timespan OR COALESCE(f.numer_timespan, '') = '')
), scores AS (
%s
), groups AS (SELECT
), %s
, groups AS (SELECT
id,
scores.score,
numer_timespan,
@@ -213,39 +242,46 @@ BEGIN
'numer_geomref_colname', cdb_observatory.FIRST(meta.numer_geomref_colname),
'numer_tablename', cdb_observatory.FIRST(meta.numer_tablename),
'numer_type', cdb_observatory.FIRST(meta.numer_type),
--'numer_description', cdb_observatory.FIRST(meta.numer_description),
--'numer_t_description', cdb_observatory.FIRST(meta.numer_t_description),
'numer_description', cdb_observatory.FIRST(meta.numer_description),
'numer_t_description', cdb_observatory.FIRST(meta.numer_t_description),
'denom_aggregate', cdb_observatory.FIRST(meta.denom_aggregate),
'denom_colname', cdb_observatory.FIRST(denom_colname),
'denom_geomref_colname', cdb_observatory.FIRST(denom_geomref_colname),
'denom_tablename', cdb_observatory.FIRST(denom_tablename),
'denom_type', cdb_observatory.FIRST(meta.denom_type),
'denom_reltype', cdb_observatory.FIRST(meta.denom_reltype),
--'denom_description', cdb_observatory.FIRST(meta.denom_description),
--'denom_t_description', cdb_observatory.FIRST(meta.denom_t_description),
'denom_description', cdb_observatory.FIRST(meta.denom_description),
'denom_t_description', cdb_observatory.FIRST(meta.denom_t_description),
'geom_colname', cdb_observatory.FIRST(geom_colname),
'geom_geomref_colname', cdb_observatory.FIRST(geom_geomref_colname),
'geom_tablename', cdb_observatory.FIRST(geom_tablename),
'geom_type', cdb_observatory.FIRST(meta.geom_type),
'geom_timespan', cdb_observatory.FIRST(meta.geom_timespan),
--'geom_description', cdb_observatory.FIRST(meta.geom_description),
--'geom_t_description', cdb_observatory.FIRST(meta.geom_t_description),
'geom_description', cdb_observatory.FIRST(meta.geom_description),
'geom_t_description', cdb_observatory.FIRST(meta.geom_t_description),
'numer_timespan', cdb_observatory.FIRST(numer_timespan),
'numer_name', cdb_observatory.FIRST(numer_name),
'denom_name', cdb_observatory.FIRST(denom_name),
'geom_name', cdb_observatory.FIRST(geom_name),
'normalization', cdb_observatory.FIRST(normalization),
'max_timespan_rank', cdb_observatory.FIRST(max_timespan_rank),
'max_score_rank', cdb_observatory.FIRST(max_score_rank),
'target_geoms', cdb_observatory.FIRST(scores.target_geoms),
'target_area', cdb_observatory.FIRST(scores.target_area),
'num_geoms', cdb_observatory.FIRST(scores.numgeoms),
'denom_id', denom_id,
'geom_id', meta.geom_id
) metadata
FROM meta, scores
WHERE meta.geom_id = scores.column_id
AND meta.geom_tid = scores.table_id
AND COALESCE(meta.target_geoms, 0) = COALESCE(scores.target_geoms, 0)
AND COALESCE(meta.target_area, 0) = COALESCE(scores.target_area, 0)
GROUP BY id, score, numer_id, denom_id, geom_id, numer_timespan
) SELECT JSON_AGG(metadata ORDER BY id)
FROM groups
WHERE timespan_rank <= $4
AND score_rank <= $5
WHERE timespan_rank <= Coalesce((metadata->>'max_timespan_rank')::INTEGER, $4)
AND score_rank <= Coalesce((metadata->>'max_score_rank')::INTEGER, $5)
$string$, meta_filter_clause, scores_clause)
INTO result
USING
@@ -772,8 +808,8 @@ BEGIN
RETURN QUERY EXECUTE format($query$
WITH _raw_geoms AS (%s),
_geoms AS (SELECT id,
CASE WHEN (ST_NPoints(geom) > 500)
THEN ST_CollectionExtract(ST_MakeValid(ST_SimplifyVW(geom, 0.0001)), 3)
CASE WHEN (ST_NPoints(geom) > 1000)
THEN ST_CollectionExtract(ST_MakeValid(ST_SimplifyVW(geom, 0.00001)), 3)
ELSE geom END geom
FROM _raw_geoms),
_procgeoms AS (SELECT _geoms.id, _geoms.geom %s %s
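
The final filter of the `OBS_GetMeta` query in this file resolves each rank limit with `Coalesce((metadata->>'max_..._rank')::INTEGER, $n)`, so a per-item value wins and the global argument is only a fallback. A minimal Python sketch of that resolution follows; the function names and the sample rows are hypothetical, and the row layout is simplified compared with the real query.

```python
def effective_limit(item_override, global_limit):
    """Mirror Coalesce(override::INTEGER, global): use the override if present."""
    return global_limit if item_override is None else int(item_override)


def keep(row, global_timespan_rank, global_score_rank):
    """Return True if the row passes both rank filters."""
    meta = row["metadata"]
    return (
        row["timespan_rank"] <= effective_limit(meta.get("max_timespan_rank"),
                                                global_timespan_rank)
        and row["score_rank"] <= effective_limit(meta.get("max_score_rank"),
                                                 global_score_rank)
    )


rows = [
    # No override: the global limit of 1 applies, so score_rank 2 is dropped.
    {"timespan_rank": 1, "score_rank": 2, "metadata": {}},
    # Override arrives as text (extracted with ->>), then is cast to an integer.
    {"timespan_rank": 1, "score_rank": 2, "metadata": {"max_score_rank": "2"}},
]
kept = [keep(r, 1, 1) for r in rows]
print(kept)
```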

@@ -418,7 +418,8 @@ $$ LANGUAGE plpgsql;
CREATE OR REPLACE FUNCTION cdb_observatory._OBS_GetGeometryScores(
bounds Geometry(Geometry, 4326) DEFAULT NULL,
filter_geom_ids TEXT[] DEFAULT NULL,
desired_num_geoms INTEGER DEFAULT NULL
desired_num_geoms INTEGER DEFAULT NULL,
desired_area NUMERIC DEFAULT NULL
) RETURNS TABLE (
score NUMERIC,
numtiles BIGINT,
@@ -430,6 +431,8 @@ CREATE OR REPLACE FUNCTION cdb_observatory._OBS_GetGeometryScores(
estnumgeoms NUMERIC,
meanmediansize NUMERIC
) AS $$
DECLARE
num_geoms_multiplier Numeric;
BEGIN
IF desired_num_geoms IS NULL THEN
desired_num_geoms := 3000;
@@ -440,6 +443,18 @@ BEGIN
IF ST_Npoints(bounds) > 10000 THEN
bounds := ST_Envelope(bounds);
END IF;
IF desired_area IS NULL THEN
desired_area := ST_Area(bounds);
END IF;
-- In case of points, desired_area will be 0. We still want an accurate
-- estimate of numgeoms in that case.
IF desired_area = 0 THEN
num_geoms_multiplier := 1;
ELSE
num_geoms_multiplier := Coalesce(desired_area / Nullif(ST_Area(bounds), 0), 1);
END IF;
RETURN QUERY
EXECUTE $string$
WITH clipped_geom AS (
@@ -453,13 +468,11 @@ BEGIN
), clipped_geom_countagg AS (
SELECT column_id, table_id
, BOOL_AND(ST_BandIsNoData(clipped_tile, 1)) nodata
, ST_CountAgg(clipped_tile, 1, False)::Numeric pixels -- -10
FROM clipped_geom
GROUP BY column_id, table_id
), clipped_geom_reagg AS (
SELECT COUNT(*)::BIGINT cnt, a.column_id, a.table_id,
cdb_observatory.FIRST(nodata) first_nodata,
cdb_observatory.FIRST(pixels) first_pixel,
cdb_observatory.FIRST(tile) first_tile,
(ST_SummaryStatsAgg(clipped_tile, 1, False)).sum::Numeric sum_geoms, -- ND
(ST_SummaryStatsAgg(clipped_tile, 2, False)).mean::Numeric / 255 mean_fill --ND
@@ -474,9 +487,8 @@ BEGIN
, (CASE WHEN first_nodata IS FALSE
THEN sum_geoms
ELSE COALESCE(ST_Value(first_tile, 1, ST_PointOnSurface($1)), 0)
* (ST_Area($1) / ST_Area(ST_PixelAsPolygon(first_tile, 0, 0))
* first_pixel) -- -20
END)::Numeric
* (ST_Area($1) / ST_Area(ST_PixelAsPolygon(first_tile, 0, 0)))
END)::Numeric * $4
AS numgeoms
, (CASE WHEN first_nodata IS FALSE
THEN mean_fill
@@ -490,7 +502,7 @@ BEGIN
((100.0 / (1+abs(log(0.0001 + $3) - log(0.0001 + numgeoms::Numeric)))) * percentfill)::Numeric
AS score, *
FROM final
$string$ USING bounds, filter_geom_ids, desired_num_geoms;
$string$ USING bounds, filter_geom_ids, desired_num_geoms, num_geoms_multiplier;
RETURN;
END
$$ LANGUAGE plpgsql IMMUTABLE;
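
To make the arithmetic in `_OBS_GetGeometryScores` easier to follow, here is a small Python model of the two formulas visible in this diff: the `num_geoms_multiplier` computed from `desired_area`, and the score expression. It is a sketch under assumptions: areas are passed in as plain numbers instead of being computed with `ST_Area`, `percentfill` is assumed to be a fraction, and the function names are hypothetical. PostgreSQL's single-argument `log` is base 10, hence `math.log10`.

```python
import math


def num_geoms_multiplier(desired_area, bounds_area):
    """Scale factor applied to the estimated number of geometries."""
    if desired_area is None:
        # Default: aim to fill the whole extent.
        desired_area = bounds_area
    if desired_area == 0 or bounds_area == 0:
        # Points (zero area): keep the raw estimate unchanged.
        return 1.0
    return desired_area / bounds_area


def score(desired_num_geoms, numgeoms, percentfill):
    """100 when numgeoms matches the target, decaying with the log10 distance."""
    distance = abs(math.log10(0.0001 + desired_num_geoms)
                   - math.log10(0.0001 + numgeoms))
    return 100.0 / (1 + distance) * percentfill


print(num_geoms_multiplier(0.5, 2.0))
print(score(3000, 3000, 1.0))
print(score(3000, 300, 1.0))
```

A boundary whose estimated geometry count is off by a factor of ten from the target scores roughly half of a perfect match, before the fill percentage is applied.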

@@ -261,3 +261,31 @@ t|t
ary_type|obs_getdata_api_geomrefs_args_string_return
t|t
(1 row)
setseed
(1 row)
bg_sample|bg_max_error|bg_avg_error|bg_min_error
1|t|t|t
2|t|t|t
3|t|t|t
5|t|t|t
10|t|t|t
25|t|t|t
50|t|t|t
100|t|t|t
2085|t|t|t
(9 rows)
tract_sample|tract_max_error|tract_avg_error|tract_min_error
1|t|t|t
2|t|t|t
3|t|t|t
5|t|t|t
10|t|t|t
25|t|t|t
50|t|t|t
100|t|t|t
761|t|t|t
(9 rows)
no_bg_point_error
t
(1 row)

@@ -159,21 +159,36 @@ t
_obs_geometryscores_2500km_buffer
t
(1 row)
_obs_geometryscores_numgeoms_500m_buffer
t
(1 row)
_obs_geometryscores_numgeoms_5km_buffer
t
(1 row)
_obs_geometryscores_numgeoms_50km_buffer
t
(1 row)
_obs_geometryscores_numgeoms_500km_buffer
t
(1 row)
_obs_geometryscores_numgeoms_2500km_buffer
t
(1 row)
column_id|_obs_geometryscores_numgeoms_500m_buffer
us.census.tiger.block_group|2
us.census.tiger.census_tract|1
us.census.tiger.zcta5|0
us.census.tiger.county|0
(4 rows)
column_id|_obs_geometryscores_numgeoms_5km_buffer
us.census.tiger.block_group|244
us.census.tiger.census_tract|78
us.census.tiger.zcta5|9
us.census.tiger.county|0
(4 rows)
column_id|_obs_geometryscores_numgeoms_50km_buffer
us.census.tiger.block_group|10817
us.census.tiger.census_tract|3396
us.census.tiger.zcta5|484
us.census.tiger.county|11
(4 rows)
column_id|_obs_geometryscores_numgeoms_500km_buffer
us.census.tiger.block_group|48567
us.census.tiger.census_tract|15823
us.census.tiger.zcta5|6466
us.census.tiger.county|295
(4 rows)
column_id|_obs_geometryscores_numgeoms_2500km_buffer
us.census.tiger.block_group|165852
us.census.tiger.census_tract|55283
us.census.tiger.zcta5|27046
us.census.tiger.county|2551
(4 rows)
_obs_geometryscores_500km_buffer_50_geoms
t
(1 row)
@@ -186,6 +201,12 @@ t
_obs_geometryscores_500km_buffer_25000_geoms
t
(1 row)
testarea_uses_tract
t
(1 row)
points_use_bg
t
(1 row)
_total_pop_in_legacy_builder_metadata
t
(1 row)

File diff suppressed because one or more lines are too long

@@ -798,3 +798,146 @@ SELECT json_typeof(data->0->'value') = 'array' ary_type,
AS OBS_GetData_API_geomrefs_args_string_return
FROM cdb_observatory.obs_getdata(array['36047'],
'[{"numer_type": "text", "numer_colname": "obs_getboundarybyid", "api_method": "obs_getboundarybyid", "api_args": ["us.census.tiger.county"]}]');
-- Ensure consistent results below.
select setseed(0);
-- Check that random assortment of block groups in Brooklyn return accurate data
WITH _geoms AS (
SELECT
(data->0->>'value')::geometry the_geom,
data->0->>'geomref' geom_ref,
(data->1->>'value')::numeric total_pop
FROM cdb_observatory.OBS_GetData(
array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
(SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
'[{"geom_id": "us.census.tiger.block_group"},
{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group", "normalization": "predenom"}]')),
FALSE
)
WHERE data->0->>'geomref' LIKE '36047%'
ORDER BY RANDOM()
), geoms AS (
SELECT *, row_number() OVER () cartodb_id FROM _geoms
), samples AS (
SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
), filtered AS (
SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
), summary AS (
SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
COUNT(*)::INT cnt,
ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
SUM(ST_Area(the_geom))::Numeric sumarea
FROM filtered
GROUP BY sample
), meta AS (
SELECT sample, cdb_observatory.OBS_GetMeta(extent,
('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
1, 1, cnt) meta
FROM summary
GROUP BY sample, extent, cnt, sumarea
), results AS (
SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
WHERE summary.sample = meta.sample
) SELECT sample bg_sample
, MAX(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_max_error
, AVG(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_avg_error
, MIN(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_min_error
FROM geoms, results
WHERE cartodb_id = id
GROUP BY sample
ORDER BY sample
;
-- Check that random assortment of tracts in Brooklyn return accurate data
WITH _geoms AS (
SELECT
(data->0->>'value')::geometry the_geom,
data->0->>'geomref' geom_ref,
(data->1->>'value')::numeric total_pop
FROM cdb_observatory.OBS_GetData(
array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
(SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
'[{"geom_id": "us.census.tiger.census_tract"},
{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.census_tract", "normalization": "predenom"}]')),
FALSE
)
WHERE data->0->>'geomref' LIKE '36047%'
ORDER BY RANDOM()
), geoms AS (
SELECT *, row_number() OVER () cartodb_id FROM _geoms
), samples AS (
SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
), filtered AS (
SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
), summary AS (
SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
COUNT(*)::INT cnt,
ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
SUM(ST_Area(the_geom))::Numeric sumarea
FROM filtered
GROUP BY sample
), meta AS (
SELECT sample, cdb_observatory.OBS_GetMeta(extent,
('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
1, 1, cnt) meta
FROM summary
GROUP BY sample, extent, cnt, sumarea
), results AS (
SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
WHERE summary.sample = meta.sample
) SELECT sample tract_sample
, MAX(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_max_error
, AVG(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_avg_error
, MIN(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_min_error
FROM geoms, results
WHERE cartodb_id = id
GROUP BY sample
ORDER BY sample
;
-- Check that random assortment of block group points in Brooklyn return accurate data
WITH _geoms AS (
SELECT
ST_PointOnSurface((data->0->>'value')::geometry) the_geom,
data->0->>'geomref' geom_ref,
(data->1->>'value')::numeric total_pop
FROM cdb_observatory.OBS_GetData(
array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
(SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
'[{"geom_id": "us.census.tiger.block_group"},
{"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group", "normalization": "predenom"}]')),
FALSE
)
WHERE data->0->>'geomref' LIKE '36047%'
), geoms AS (
SELECT *, row_number() OVER () cartodb_id FROM _geoms
), samples AS (
SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
), filtered AS (
SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
), summary AS (
SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
COUNT(*)::INT cnt,
ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
SUM(ST_Area(the_geom))::Numeric sumarea
FROM filtered
GROUP BY sample
), meta AS (
SELECT sample, cdb_observatory.OBS_GetMeta(extent,
('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
1, 1, cnt) meta
FROM summary
GROUP BY sample, extent, cnt, sumarea
), results AS (
SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
WHERE summary.sample = meta.sample
) SELECT
BOOL_AND(abs((geoms.total_pop - val) /
Coalesce(NullIf(total_pop, 0), 1)) = 0) is True no_bg_point_error
FROM geoms, results
WHERE cartodb_id = id
;
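
The three tests above pick their subsets with `cartodb_id % (cnt / sample) = 0`. Because both operands are integers, the division truncates, so the number of rows kept is only approximately `sample`. The Python sketch below reproduces that selection for illustration; `sample_ids` is a hypothetical name, and the row count of 2085 is taken from the expected block-group output in this diff.

```python
def sample_ids(cnt, sample):
    """IDs (1-based, like row_number()) kept by `id % (cnt / sample) = 0`."""
    step = cnt // sample  # integer division, as SQL does for integer operands
    return [i for i in range(1, cnt + 1) if i % step == 0]


for requested in (10, 25, 50, 100, 2085):
    print(requested, len(sample_ids(2085, requested)))
```

With 2085 rows, a requested sample of 100 gives a step of 20 and therefore 104 rows, while requesting all 2085 gives a step of 1 and keeps every row.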

@@ -360,9 +360,9 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
'us.census.tiger.county', 'us.census.tiger.zcta5'])
WHERE table_id LIKE '%2015%';
SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county']
SELECT ARRAY_AGG(column_id ORDER BY score DESC)
= ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.county', 'us.census.tiger.zcta5']
AS _obs_geometryscores_5km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 5000)::Geometry(Geometry, 4326),
@@ -390,60 +390,55 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
ARRAY['us.census.tiger.county', 'us.census.tiger.zcta5',
'us.census.tiger.census_tract', 'us.census.tiger.block_group']
SELECT ARRAY_AGG(column_id ORDER BY score DESC)
= ARRAY['us.census.tiger.county', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.block_group']
AS _obs_geometryscores_2500km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 2500000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
ARRAY['us.census.tiger.county', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.block_group'])
WHERE table_id LIKE '%2015%';
SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text
= '{ "us.census.tiger.block_group" : 9, "us.census.tiger.census_tract" : 3, "us.census.tiger.zcta5" : 0, "us.census.tiger.county" : 0 }'
AS _obs_geometryscores_numgeoms_500m_buffer
SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_500m_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 500)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
WHERE table_id LIKE '%2015%'
ORDER BY numgeoms DESC;
SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
'{ "us.census.tiger.block_group" : 880, "us.census.tiger.census_tract" : 310, "us.census.tiger.zcta5" : 45, "us.census.tiger.county" : 1 }'
AS _obs_geometryscores_numgeoms_5km_buffer
SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_5km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 5000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
WHERE table_id LIKE '%2015%'
ORDER BY numgeoms DESC;
SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
'{ "us.census.tiger.block_group" : 11531, "us.census.tiger.census_tract" : 3601, "us.census.tiger.zcta5" : 550, "us.census.tiger.county" : 14 }'
AS _obs_geometryscores_numgeoms_50km_buffer
SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_50km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 50000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
WHERE table_id LIKE '%2015%'
ORDER BY numgeoms DESC;
SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
'{ "us.census.tiger.block_group" : 48917, "us.census.tiger.census_tract" : 15969, "us.census.tiger.zcta5" : 6534, "us.census.tiger.county" : 314 }'
AS _obs_geometryscores_numgeoms_500km_buffer
SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_500km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 500000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
WHERE table_id LIKE '%2015%'
ORDER BY numgeoms DESC;
SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
'{ "us.census.tiger.block_group" : 169191, "us.census.tiger.census_tract" : 56469, "us.census.tiger.zcta5" : 26525, "us.census.tiger.county" : 2753 }'
AS _obs_geometryscores_numgeoms_2500km_buffer
SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_2500km_buffer
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 2500000)::Geometry(Geometry, 4326),
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'])
WHERE table_id LIKE '%2015%';
WHERE table_id LIKE '%2015%'
ORDER BY numgeoms DESC;
SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
ARRAY['us.census.tiger.county', 'us.census.tiger.zcta5',
@@ -475,9 +470,9 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
'us.census.tiger.zcta5', 'us.census.tiger.county'], 2500)
WHERE table_id LIKE '%2015%';
SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county']
SELECT ARRAY_AGG(column_id ORDER BY score DESC)
= ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.county', 'us.census.tiger.zcta5']
AS _obs_geometryscores_500km_buffer_25000_geoms
FROM cdb_observatory._OBS_GetGeometryScores(
ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 50000)::Geometry(Geometry, 4326),
@@ -485,6 +480,44 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
'us.census.tiger.zcta5', 'us.census.tiger.county'], 25000)
WHERE table_id LIKE '%2015%';
-- Check that one small geom approximates tract data
WITH geoms AS (SELECT cdb_observatory._testarea() the_geom),
summary AS (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) extent,
COUNT(*)::INT cnt,
SUM(ST_Area(the_geom))::Numeric sumarea
FROM geoms)
SELECT column_id = 'us.census.tiger.census_tract' testarea_uses_tract
FROM summary, LATERAL (
SELECT *
FROM cdb_observatory._OBS_GetGeometryScores(extent,
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'],
cnt, sumarea)) foo
ORDER BY score DESC LIMIT 1;
-- Check that randomly distributed points always use smallest geometry if we
-- order by numgeoms desc
WITH geoms as (SELECT UNNEST(ARRAY[
cdb_observatory._testpoint(),
st_translate(cdb_observatory._testpoint(), -0.003, 0),
st_translate(cdb_observatory._testpoint(), -0.006, 0)
]) the_geom),
summary as (SELECT
ST_SetSRID(ST_Extent(the_geom), 4326) extent,
SUM(ST_Area(the_geom))::Numeric area,
COUNT(*)::INTEGER cnt
FROM geoms
)
SELECT column_id = 'us.census.tiger.block_group' points_use_bg
FROM summary, LATERAL (
SELECT * FROM cdb_observatory._OBS_GetGeometryScores(
extent,
ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
'us.census.tiger.zcta5', 'us.census.tiger.county'],
cnt, area)) foo
WHERE table_id LIKE '%2015%'
ORDER BY numgeoms DESC LIMIT 1;
--
-- OBS_LegacyBuilderMetadata tests
--