release artifact

remove br subdistritos from testing
prefer geographic precision over most recent timespan, handles issues emerging from inclusion of 1-year acs
2017-04-26 20:08:44 +00:00 · 2017-04-25 18:57:12 +00:00 · 2017-04-25 18:53:12 +00:00 · 2017-04-24 18:22:31 +00:00 · 2017-04-24 18:10:43 +00:00 · 2017-04-24 18:01:38 +00:00
17 changed files with 8218 additions and 964 deletions
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,40 @@
+1.5.0 (2017-04-24)
+
+__API Changes__
+
+* Add `suggested_name` to `OBS_GetMeta` responses
+  ([#281](https://github.com/CartoDB/observatory-extension/pull/281))
+* Add `geom_type`, `geom_extra`, and `geom_tags` to
+  `OBS_GetAvailableGeometries`.  This brings it up to spec with existing docs.
+  ([#282](https://github.com/CartoDB/observatory-extension/pull/282))
+* Add `timespan_type`, `timespan_extra`, and `timespan_tags` to
+  `OBS_GetAvailableTimespans` for consistency.
+  ([#282](https://github.com/CartoDB/observatory-extension/pull/282))
+
+1.4.0 (2017-03-21)
+
+__API Changes__
+
+* Allow for override of `target_area` and `target_geoms` in `OBS_GetMeta`
+  ([#276](https://github.com/CartoDB/observatory-extension/pull/276)).  This
+  allows the interface to work with points and sparse areas much better.
+* Allow for override of `max_timespan_rank` and `max_score_rank` on an
+  item-by-item basis for metadata.
+* `numer_description`, `geom_description`, `denom_description`,
+  `numer_t_description`, `denom_t_description` and `geom_t_description` now
+  returned as part of `OBS_GetMeta`.
+
+__Improvements__
+
+* Reduced amount of simplification done on input geometries (from 0.0001 above
+  500 points to 0.00001 above 1000 points).
+* Added tests to confirm that accurate results are returned from automatic
+  boundary selection.
+
+1.3.5 (2017-03-15)
+
+No changes.  Artifact to allow for data update.
+
 1.3.4 (2017-03-10)

 __Bugfixes__
--- a/doc/boundary_functions.md
+++ b/doc/boundary_functions.md
@@ -4,7 +4,7 @@ Use the following functions to retrieve [Boundary](https://carto.com/docs/carto-

 You can [access](https://carto.com/docs/carto-engine/data/accessing) boundaries through CARTO Builder. The same methods will work if you are using the CARTO Engine to develop your application. We [encourage you](http://docs/carto-engine/data/accessing/#best-practices) to use table modifying methods (UPDATE and INSERT) over dynamic methods (SELECT).

-## OBS_GetBoundariesByGeometry(polygon geometry, geometry_id text)
+## OBS_GetBoundariesByGeometry(geom geometry, geometry_id text)

 The ```OBS_GetBoundariesByGeometry(geometry, geometry_id)``` method returns a set of boundary geometries that intersect a supplied geometry. This can be used to find all boundaries that are within or overlap a bounding box. You have the ability to choose whether to retrieve all boundaries that intersect your supplied bounding box or only those that fall entirely inside of your bounding box.

@@ -12,7 +12,7 @@ The ```OBS_GetBoundariesByGeometry(geometry, geometry_id)``` method returns a se

 Name |Description
 --- | ---
-polygon | a bounding box or other WGS84 geometry
+geom | a WGS84 geometry
 geometry_id | a string identifier for a boundary geometry
 timespan (optional) | year(s) to request from ('NULL' (default) gives most recent)
 overlap_type (optional) | one of '[intersects](http://postgis.net/docs/manual-2.2/ST_Intersects.html)' (default), '[contains](http://postgis.net/docs/manual-2.2/ST_Contains.html)', or '[within](http://postgis.net/docs/manual-2.2/ST_Within.html)'.
@@ -26,7 +26,7 @@ Column Name | Description
 the_geom | a boundary geometry (e.g., US Census tract boundaries)
 geom_refs | a string identifier for the geometry (e.g., geoids of US Census tracts)

-If geometries are not found for the requested `polygon`, `geometry_id`, `timespan`, or `overlap_type`, then null values are returned.
+If geometries are not found for the requested `geom`, `geometry_id`, `timespan`, or `overlap_type`, then null values are returned.

 #### Example

@@ -44,7 +44,6 @@ FROM OBS_GetBoundariesByGeometry(

 #### Errors

-* If a geometry other than a point is passed as the first argument, an error is thrown: `Invalid geometry type (ST_Polygon), expecting 'ST_Point'`
 * If an `overlap_type` other than the valid ones listed above is entered, then an error is thrown

 ## OBS_GetPointsByGeometry(polygon geometry, geometry_id text)
--- a/doc/discovery_functions.md
+++ b/doc/discovery_functions.md
@@ -327,9 +327,12 @@ timespan_id | Text | The ID of the timespan
 timespan_name | Text | A human readable name for the timespan
 timespan_description | Text | Ignored
 timespan_weight | Numeric | Ignored
+timespan_aggregate | Text | Ignored
 timespan_license | Text | Ignored
 timespan_source | Text | Ignored
-timespan_aggregate | Text | Ignored
+timespan_type | Text | Ignored
+timespan_extra | JSONB | Ignored
+timespan_tags | JSONB | Ignored
 valid_numer | Boolean | True if the `numer_id` argument is a valid numerator for this timespan, False otherwise
 valid_denom | Boolean | True if the `denom_id` argument is a valid denominator for this timespan, False otherwise
 valid_geom | Boolean | True if the `geom_id` argument is a valid geometry for this timespan, False otherwise
--- a/doc/measures_functions.md
+++ b/doc/measures_functions.md
@@ -196,7 +196,7 @@ UPDATE tablename
 SET segmentation = OBS_GetCategory(the_geom, 'us.census.spielman_singleton_segments.X55')
 ```

-## OBS_GetMeta(extent geometry, metadata json, max_timespan_rank, max_boundary_score_rank, num_target_geoms)
+## OBS_GetMeta(extent geometry, metadata json, num_timespan_options, num_score_options, target_geoms)

 The ```OBS_GetMeta(extent, metadata)``` function returns a completed Data
 Observatory metadata JSON Object for use in ```OBS_GetData(geomvals,
@@ -213,9 +213,9 @@ Name | Description
 ---- | -----------
 extent | A geometry of the extent of the input geometries
 metadata | A JSON array composed of metadata input objects.  Each indicates one desired measure for an output column, and optionally additional parameters about that column
-max_timespan_rank | How many historical time periods to include.  Defaults to 1
-max_boundary_score_rank | How many alternative boundary levels to include.  Defaults to 1
-num_target_geoms | Target number of geometries.  Boundaries with close to this many objects within `extent` will be ranked highest. 
+num_timespan_options | How many historical time periods to include.  Defaults to 1
+num_score_options | How many alternative boundary levels to include.  Defaults to 1
+target_geoms | Target number of geometries.  Boundaries with close to this many objects within `extent` will be ranked highest. 

 The schema of the metadata input objects is as follows:

@@ -227,6 +227,10 @@ normalization | The desired normalization.  One of 'area', 'prenormalized', or '
 denom_id | Identifier for a desired normalization column in case `normalization` is 'denominated'.  Will be automatically assigned if necessary.  Ignored if this metadata object specifies a geometry.
 numer_timespan | The desired timespan for the measurement.  Defaults to most recent timespan available if left unspecified.
 geom_timespan | The desired timespan for the geometry.  Defaults to timespan matching numer_timespan if left unspecified.
+target_area | Instead of aiming to have `target_geoms` in the area of the geometry passed as `extent`, fill this area.  Unit is square degrees WGS84.  Set this to `0` if you want to use the smallest source geometry for this element of metadata, for example if you're passing in points.
+target_geoms | Override global `target_geoms` for this element of metadata
+max_timespan_rank | Only include timespans of this recency (for example, `1` is only the most recent timespan). No limit by default
+max_score_rank | Only include boundaries of this relevance (for example, `1` is the most relevant boundary).  Is `1` by default

 #### Returns

@@ -242,9 +246,12 @@ fail.

 Metadata Output Key | Description
 --- | -----------
+suggested_name | A suggested column name for adding this to an existing table
 numer_id | Identifier for desired measurement
 numer_timespan | Timespan that will be used for the desired measurement
 numer_name | Human-readable name of desired measure
+numer_description | Long human-readable description of the desired measure
+numer_t_description | Further information about the source table
 numer_type | PostgreSQL/PostGIS type of desired measure
 numer_colname | Internal identifier for column name
 numer_tablename | Internal identifier for table
@@ -252,6 +259,8 @@ numer_geomref_colname | Internal identifier for geomref column name
 denom_id | Identifier for desired normalization
 denom_timespan | Timespan that will be used for the desired normalization
 denom_name | Human-readable name of desired measure's normalization
+denom_description | Long human-readable description of the desired measure's normalization
+denom_t_description | Further information about the source table
 denom_type | PostgreSQL/PostGIS type of desired measure's normalization
 denom_colname | Internal identifier for normalization column name
 denom_tablename | Internal identifier for normalization table
@@ -259,12 +268,14 @@ denom_geomref_colname | Internal identifier for normalization geomref column nam
 geom_id | Identifier for desired boundary geometry
 geom_timespan | Timespan that will be used for the desired boundary geometry
 geom_name | Human-readable name of desired boundary geometry
+geom_description | Long human-readable description of the desired boundary geometry
+geom_t_description | Further information about the source table
 geom_type | PostgreSQL/PostGIS type of desired boundary geometry
 geom_colname | Internal identifier for boundary geometry column name
 geom_tablename | Internal identifier for boundary geometry table
 geom_geomref_colname | Internal identifier for boundary geometry ref column name
 timespan_rank | Ranking of this measurement by time, most recent is 1, second most recent 2, etc.
-score | The score of this measurement's boundary compared to the `extent` and `num_target_geoms` passed in.  Between 0 and 100.
+score | The score of this measurement's boundary compared to the `extent` and `target_geoms` passed in.  Between 0 and 100.
 score_rank | The ranking of this measurement's boundary, highest ranked is 1, second is 2, etc.
 numer_aggregate | The aggregate type of the numerator, either `sum`, `average`, `median`, or blank
 denom_aggregate | The aggregate type of the denominator, either `sum`, `average`, `median`, or blank
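
The per-item keys documented above (`target_area`, `target_geoms`, `max_timespan_rank`, `max_score_rank`) might be used as in the following sketch. It assumes the `cdb_observatory` extension and its data are installed; the point coordinates are hypothetical and chosen only for illustration, and the measure ID is the one used in this changeset's tests.

```sql
-- Sketch only: requires the observatory extension and loaded data.
-- The coordinates below are a hypothetical point, not taken from the source.
SELECT cdb_observatory.OBS_GetMeta(
  ST_SetSRID(ST_MakePoint(-73.95, 40.65), 4326),  -- extent: a single point
  '[{"numer_id": "us.census.acs.B01003001",
     "target_area": 0,
     "max_score_rank": 2}]',                      -- per-item overrides
  1,                                              -- num_timespan_options
  2                                               -- num_score_options
);
```

Setting `target_area` to `0` follows the documented advice for point inputs, and `max_score_rank` of `2` together with `num_score_options` of `2` mirrors the updated test call in `41_observatory_augmentation_test.sql`.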
--- a/release/observatory--1.3.5.sql
+++ b/release/observatory--1.3.5.sql
--- a/release/observatory--1.4.0.sql
+++ b/release/observatory--1.4.0.sql
--- a/release/observatory--1.5.0.sql
+++ b/release/observatory--1.5.0.sql
--- a/release/observatory.control
+++ b/release/observatory.control
@@ -1,5 +1,5 @@
 comment = 'CartoDB Observatory backend extension'
-default_version = '1.3.4'
+default_version = '1.5.0'
 requires = 'postgis'
 superuser = true
 schema = cdb_observatory
--- a/src/pg/observatory.control
+++ b/src/pg/observatory.control
@@ -1,5 +1,5 @@
 comment = 'CartoDB Observatory backend extension'
-default_version = '1.3.4'
+default_version = '1.5.0'
 requires = 'postgis'
 superuser = true
 schema = cdb_observatory
--- a/src/pg/sql/41_observatory_augmentation.sql
+++ b/src/pg/sql/41_observatory_augmentation.sql
@@ -102,8 +102,8 @@ $$ LANGUAGE plpgsql STABLE;
 CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetMeta(
  geom geometry(Geometry, 4326),
  params JSON,
-  max_timespan_rank INTEGER DEFAULT NULL, -- cutoff for timespan ranks when there's ambiguity
-  max_score_rank INTEGER DEFAULT NULL, -- cutoff for geom ranks when there's ambiguity
+  num_timespan_options INTEGER DEFAULT NULL, -- how many timespan options to show
+  num_score_options INTEGER DEFAULT NULL, -- how many score options to show
  target_geoms INTEGER DEFAULT NULL
 )
 RETURNS JSON
@@ -115,20 +115,34 @@ DECLARE
  scores_clause TEXT;
  result JSON;
 BEGIN
-  IF max_timespan_rank IS NULL THEN
-    max_timespan_rank := 1;
+  IF num_timespan_options IS NULL THEN
+    num_timespan_options := 1;
  END IF;
-  IF max_score_rank IS NULL THEN
-    max_score_rank := 1;
+  IF num_score_options IS NULL THEN
+    num_score_options := 1;
  END IF;

  numer_filters := (SELECT Array_Agg(val) FILTER (WHERE val IS NOT NULL) FROM (SELECT (JSON_Array_Elements(params))->>'numer_id' val) foo);
  geom_filters := (SELECT Array_Agg(val) FILTER (WHERE val IS NOT NULL) FROM (SELECT (JSON_Array_Elements(params))->>'geom_id' val) bar);
  meta_filter_clause := '(m.numer_id = ANY ($6) OR m.geom_id = ANY ($7))';

-  scores_clause := 'SELECT *
-                    FROM cdb_observatory._OBS_GetGeometryScores($1,
-                    (SELECT Array_Agg(geom_id) FROM meta), $2) scores ';
+  scores_clause := ' agg_geoms AS (
+    SELECT target_geoms, target_area, ARRAY_AGG(geom_id) geom_ids
+    FROM meta
+    GROUP BY target_geoms, target_area
+  ), scores AS (
+    SELECT target_geoms, target_area,
+      CASE target_area
+      -- point-specific, just order by numgeoms instead of score
+      WHEN 0 THEN scores.numgeoms
+      -- has some area, use proper scoring
+      ELSE scores.score
+      END AS score,
+           scores.numgeoms, scores.table_id, scores.column_id
+    FROM agg_geoms,
+         LATERAL cdb_observatory._OBS_GetGeometryScores($1,
+            geom_ids, COALESCE(target_geoms, $2), target_area) scores
+  ) ';

  IF JSON_Array_Length(params) = 1 THEN
    IF numer_filters IS NULL AND geom_filters IS NOT NULL THEN
@@ -142,9 +156,11 @@ BEGIN
    END IF;

    IF geom_filters IS NOT NULL AND numer_filters IS NOT NULL THEN
-      scores_clause := 'SELECT 1 score, null, geom_tid table_id, geom_id column_id,
-                               null, null, null, null, null, null
-                        FROM meta ';
+      scores_clause := 'scores AS (
+        SELECT NULL::INTEGER target_geoms, NULL::Numeric target_area,
+        1 score, null, geom_tid table_id, geom_id column_id,
+        NULL::Integer numgeoms
+        FROM meta) ';
    END IF;
  END IF;

@@ -156,16 +172,30 @@ BEGIN
        (unnest($3))->>'geom_id' geom_id,
        (unnest($3))->>'numer_timespan' numer_timespan,
        (unnest($3))->>'geom_timespan' geom_timespan,
-        (unnest($3))->>'normalization' normalization
+        (unnest($3))->>'normalization' normalization,
+        (unnest($3))->>'max_timespan_rank' max_timespan_rank,
+        (unnest($3))->>'max_score_rank' max_score_rank,
+        ((unnest($3))->>'target_geoms')::INTEGER target_geoms,
+        ((unnest($3))->>'target_area')::Numeric target_area
    ), meta AS (SELECT
        id,
        f.numer_id,
+        LOWER(TRIM(BOTH '_' FROM regexp_replace(CASE WHEN f.numer_id IS NOT NULL
+          THEN CASE
+            WHEN normalization ILIKE 'area%%' THEN numer_colname || ' per sq km'
+            WHEN normalization ILIKE 'denom%%' THEN numer_colname || ' rate'
+            ELSE numer_colname
+          END || ' ' || m.numer_timespan
+          ELSE geom_name || ' ' || m.geom_timespan
+        END, '[^a-zA-Z0-9]+', '_', 'g'))) suggested_name,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_aggregate END numer_aggregate,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_colname END numer_colname,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_geomref_colname END numer_geomref_colname,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_tablename END numer_tablename,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_type END numer_type,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_name END numer_name,
+        CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_description END numer_description,
+        CASE WHEN f.numer_id IS NULL THEN NULL ELSE numer_t_description END numer_t_description,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE m.numer_timespan END numer_timespan,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE m.denom_id END denom_id,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_aggregate END denom_aggregate,
@@ -173,6 +203,8 @@ BEGIN
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_geomref_colname END denom_geomref_colname,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_tablename END denom_tablename,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_name END denom_name,
+        CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_description END denom_description,
+        CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_t_description END denom_t_description,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_type END denom_type,
        CASE WHEN f.numer_id IS NULL THEN NULL ELSE denom_reltype END denom_reltype,
        m.geom_id,
@@ -182,8 +214,14 @@ BEGIN
        geom_geomref_colname,
        geom_tablename,
        geom_name,
+        geom_description,
+        geom_t_description,
        geom_type,
-        normalization
+        normalization,
+        max_timespan_rank,
+        max_score_rank,
+        target_geoms,
+        target_area
      FROM observatory.obs_meta m JOIN _filters f
      ON CASE WHEN f.numer_id IS NULL THEN m.geom_id ELSE m.numer_id END =
         CASE WHEN f.numer_id IS NULL THEN f.geom_id ELSE f.numer_id END
@@ -194,9 +232,8 @@ BEGIN
        AND (m.geom_id = f.geom_id OR COALESCE(f.geom_id, '') = '')
        AND (m.geom_timespan = f.geom_timespan OR COALESCE(f.geom_timespan, '') = '')
        AND (m.numer_timespan = f.numer_timespan OR COALESCE(f.numer_timespan, '') = '')
-    ), scores AS (
-        %s
-    ), groups AS (SELECT
+    ), %s
+    , groups AS (SELECT
        id,
        scores.score,
        numer_timespan,
@@ -207,45 +244,59 @@ BEGIN
          'numer_id', numer_id,
          'timespan_rank', dense_rank() OVER (PARTITION BY id ORDER BY numer_timespan DESC),
          'score_rank', dense_rank() OVER (PARTITION BY id ORDER BY score DESC),
+          'timespan_rownum', row_number() over
+            (PARTITION BY id, score ORDER BY numer_timespan DESC, Coalesce(denom_id, '')),
+          'score_rownum', row_number() over
+            (PARTITION BY id, numer_timespan ORDER BY score DESC, Coalesce(denom_id, '')),
          'score', scores.score,
+          'suggested_name', cdb_observatory.FIRST(meta.suggested_name),
          'numer_aggregate', cdb_observatory.FIRST(meta.numer_aggregate),
          'numer_colname', cdb_observatory.FIRST(meta.numer_colname),
          'numer_geomref_colname', cdb_observatory.FIRST(meta.numer_geomref_colname),
          'numer_tablename', cdb_observatory.FIRST(meta.numer_tablename),
          'numer_type', cdb_observatory.FIRST(meta.numer_type),
-          --'numer_description', cdb_observatory.FIRST(meta.numer_description),
-          --'numer_t_description', cdb_observatory.FIRST(meta.numer_t_description),
+          'numer_description', cdb_observatory.FIRST(meta.numer_description),
+          'numer_t_description', cdb_observatory.FIRST(meta.numer_t_description),
          'denom_aggregate', cdb_observatory.FIRST(meta.denom_aggregate),
          'denom_colname', cdb_observatory.FIRST(denom_colname),
          'denom_geomref_colname', cdb_observatory.FIRST(denom_geomref_colname),
          'denom_tablename', cdb_observatory.FIRST(denom_tablename),
          'denom_type', cdb_observatory.FIRST(meta.denom_type),
          'denom_reltype', cdb_observatory.FIRST(meta.denom_reltype),
-          --'denom_description', cdb_observatory.FIRST(meta.denom_description),
-          --'denom_t_description', cdb_observatory.FIRST(meta.denom_t_description),
+          'denom_description', cdb_observatory.FIRST(meta.denom_description),
+          'denom_t_description', cdb_observatory.FIRST(meta.denom_t_description),
          'geom_colname', cdb_observatory.FIRST(geom_colname),
          'geom_geomref_colname', cdb_observatory.FIRST(geom_geomref_colname),
          'geom_tablename', cdb_observatory.FIRST(geom_tablename),
          'geom_type', cdb_observatory.FIRST(meta.geom_type),
          'geom_timespan', cdb_observatory.FIRST(meta.geom_timespan),
-          --'geom_description', cdb_observatory.FIRST(meta.geom_description),
-          --'geom_t_description', cdb_observatory.FIRST(meta.geom_t_description),
+          'geom_description', cdb_observatory.FIRST(meta.geom_description),
+          'geom_t_description', cdb_observatory.FIRST(meta.geom_t_description),
          'numer_timespan', cdb_observatory.FIRST(numer_timespan),
          'numer_name', cdb_observatory.FIRST(numer_name),
          'denom_name', cdb_observatory.FIRST(denom_name),
          'geom_name', cdb_observatory.FIRST(geom_name),
          'normalization', cdb_observatory.FIRST(normalization),
+          'max_timespan_rank', cdb_observatory.FIRST(max_timespan_rank),
+          'max_score_rank', cdb_observatory.FIRST(max_score_rank),
+          'target_geoms', cdb_observatory.FIRST(scores.target_geoms),
+          'target_area', cdb_observatory.FIRST(scores.target_area),
+          'num_geoms', cdb_observatory.FIRST(scores.numgeoms),
          'denom_id', denom_id,
          'geom_id', meta.geom_id
        ) metadata
      FROM meta, scores
      WHERE meta.geom_id = scores.column_id
        AND meta.geom_tid = scores.table_id
+        AND COALESCE(meta.target_geoms, 0) = COALESCE(scores.target_geoms, 0)
+        AND COALESCE(meta.target_area, 0) = COALESCE(scores.target_area, 0)
      GROUP BY id, score, numer_id, denom_id, geom_id, numer_timespan
    ) SELECT JSON_AGG(metadata ORDER BY id)
      FROM groups
-      WHERE timespan_rank <= $4
-        AND score_rank <= $5
+      WHERE timespan_rank <= Coalesce((metadata->>'max_timespan_rank')::INTEGER, 'infinity'::FLOAT)
+        AND score_rank <= Coalesce((metadata->>'max_score_rank')::INTEGER, 1)
+        AND (metadata->>'timespan_rownum')::INTEGER <= $4
+        AND (metadata->>'score_rownum')::INTEGER <= $5
  $string$, meta_filter_clause, scores_clause)
  INTO result
  USING
@@ -255,8 +306,8 @@ BEGIN
    END,
    target_geoms,
    (SELECT ARRAY(SELECT json_array_elements_text(params))::json[]),
-    max_timespan_rank,
-    max_score_rank, numer_filters, geom_filters
+    num_timespan_options,
+    num_score_options, numer_filters, geom_filters
    ;
  RETURN result;
 END;
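
The new final filter combines per-item rank cutoffs with the global option counts. A minimal sketch of that interaction on made-up rows follows; the `candidates` table and its values are hypothetical, and the `denom_id` tie-breaker used in the real `row_number()` ordering is left out for brevity.

```sql
-- Hypothetical candidate rows: two metadata items (id 1 and 2), each with
-- the same three timespan/score combinations. Item 2 sets per-item overrides.
WITH candidates(id, numer_timespan, score, max_timespan_rank, max_score_rank) AS (
  VALUES (1, '2011 - 2015', 80.0, NULL::INTEGER, NULL::INTEGER),
         (1, '2010 - 2014', 80.0, NULL, NULL),
         (1, '2011 - 2015', 60.0, NULL, NULL),
         (2, '2011 - 2015', 80.0, 1, 2),
         (2, '2010 - 2014', 80.0, 1, 2),
         (2, '2011 - 2015', 60.0, 1, 2)
), ranked AS (
  SELECT *,
    dense_rank() OVER (PARTITION BY id ORDER BY numer_timespan DESC) AS timespan_rank,
    dense_rank() OVER (PARTITION BY id ORDER BY score DESC) AS score_rank,
    row_number() OVER (PARTITION BY id, score ORDER BY numer_timespan DESC) AS timespan_rownum,
    row_number() OVER (PARTITION BY id, numer_timespan ORDER BY score DESC) AS score_rownum
  FROM candidates
)
SELECT id, numer_timespan, score
FROM ranked
WHERE timespan_rank <= COALESCE(max_timespan_rank, 'infinity'::FLOAT)  -- no limit by default
  AND score_rank <= COALESCE(max_score_rank, 1)                        -- 1 by default
  AND timespan_rownum <= 1   -- stands in for num_timespan_options ($4)
  AND score_rownum <= 2      -- stands in for num_score_options ($5)
ORDER BY id, score DESC;
```

With these rows, item 1 should keep only its most recent, top-scoring row, while item 2 should also keep its second-ranked boundary because it raises `max_score_rank` to `2`.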
@@ -772,8 +823,8 @@ BEGIN
    RETURN QUERY EXECUTE format($query$
      WITH _raw_geoms AS (%s),
      _geoms AS (SELECT id,
-        CASE WHEN (ST_NPoints(geom) > 500)
-               THEN ST_CollectionExtract(ST_MakeValid(ST_SimplifyVW(geom, 0.0001)), 3)
+        CASE WHEN (ST_NPoints(geom) > 1000)
+               THEN ST_CollectionExtract(ST_MakeValid(ST_SimplifyVW(geom, 0.00001)), 3)
             ELSE geom END geom
        FROM _raw_geoms),
      _procgeoms AS (SELECT _geoms.id, _geoms.geom %s %s
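
The `suggested_name` expression added to `OBS_GetMeta` in this file can be tried in isolation on plain PostgreSQL. In this sketch the column names and the `'2010 - 2014'` timespan format are assumptions chosen to reproduce the names asserted in the new tests; note that the doubled `%%` in the source is only an escape for `format()` and becomes a single `%` here.

```sql
-- Hypothetical inputs: numer_colname, normalization and numer_timespan values
-- are assumed for illustration, not read from observatory.obs_meta.
WITH requests(numer_colname, normalization, numer_timespan) AS (
  VALUES ('total_pop', NULL,          '2010 - 2014'),
         ('total_pop', 'area',        '2010 - 2014'),
         ('male_pop',  'denominated', '2010 - 2014')
)
SELECT normalization,
  -- Append a suffix for the normalization, add the timespan, then collapse
  -- every run of non-alphanumeric characters to a single underscore.
  LOWER(TRIM(BOTH '_' FROM regexp_replace(
    CASE
      WHEN normalization ILIKE 'area%'  THEN numer_colname || ' per sq km'
      WHEN normalization ILIKE 'denom%' THEN numer_colname || ' rate'
      ELSE numer_colname
    END || ' ' || numer_timespan,
    '[^a-zA-Z0-9]+', '_', 'g'))) AS suggested_name
FROM requests;
```

Under these assumed inputs the three rows should come out as `total_pop_2010_2014`, `total_pop_per_sq_km_2010_2014`, and `male_pop_rate_2010_2014`, matching the expectations in `41_observatory_augmentation_test.sql`.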
--- a/src/pg/sql/42_observatory_exploration.sql
+++ b/src/pg/sql/42_observatory_exploration.sql
@@ -252,6 +252,9 @@ CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetAvailableGeometries(
  geom_aggregate TEXT,
  geom_license TEXT,
  geom_source TEXT,
+  geom_type TEXT,
+  geom_extra JSONB,
+  geom_tags JSONB,
  valid_numer BOOLEAN,
  valid_denom BOOLEAN,
  valid_timespan BOOLEAN,
@@ -286,6 +289,9 @@ BEGIN
             NULL::TEXT geom_aggregate,
             NULL::TEXT license,
             NULL::TEXT source,
+             geom_type::TEXT,
+             geom_extra::JSONB,
+             geom_tags::JSONB,
      $1 = ANY(numers) valid_numer,
      $2 = ANY(denoms) valid_denom,
      $3 = ANY(timespans) valid_timespan
@@ -319,6 +325,9 @@ CREATE OR REPLACE FUNCTION cdb_observatory.OBS_GetAvailableTimespans(
  timespan_aggregate TEXT,
  timespan_license TEXT,
  timespan_source TEXT,
+  timespan_type TEXT,
+  timespan_extra JSONB,
+  timespan_tags JSONB,
  valid_numer BOOLEAN,
  valid_denom BOOLEAN,
  valid_geom BOOLEAN
@@ -343,8 +352,11 @@ BEGIN
           timespan_description::TEXT,
           timespan_weight::NUMERIC,
           NULL::TEXT timespan_aggregate,
-           NULL::TEXT license,
-           NULL::TEXT source,
+           NULL::TEXT timespan_license,
+           NULL::TEXT timespan_source,
+           NULL::TEXT timespan_type,
+           NULL::JSONB timespan_extra,
+           NULL::JSONB timespan_tags,
    $1 = ANY(numers) valid_numer,
    $2 = ANY(denoms) valid_denom,
    $3 = ANY(geoms) valid_geom_id
@@ -418,7 +430,8 @@ $$ LANGUAGE plpgsql;
 CREATE OR REPLACE FUNCTION cdb_observatory._OBS_GetGeometryScores(
  bounds Geometry(Geometry, 4326) DEFAULT NULL,
  filter_geom_ids TEXT[] DEFAULT NULL,
-  desired_num_geoms INTEGER DEFAULT NULL
+  desired_num_geoms INTEGER DEFAULT NULL,
+  desired_area NUMERIC DEFAULT NULL
 ) RETURNS TABLE (
  score NUMERIC,
  numtiles BIGINT,
@@ -430,6 +443,8 @@ CREATE OR REPLACE FUNCTION cdb_observatory._OBS_GetGeometryScores(
  estnumgeoms NUMERIC,
  meanmediansize NUMERIC
 ) AS $$
+DECLARE
+  num_geoms_multiplier Numeric;
 BEGIN
  IF desired_num_geoms IS NULL THEN
    desired_num_geoms := 3000;
@@ -440,6 +455,18 @@ BEGIN
  IF ST_Npoints(bounds) > 10000 THEN
    bounds := ST_Envelope(bounds);
  END IF;
+  IF desired_area IS NULL THEN
+    desired_area := ST_Area(bounds);
+  END IF;
+
+  -- In case of points, desired_area will be 0.  We still want an accurate
+  -- estimate of numgeoms in that case.
+  IF desired_area = 0 THEN
+    num_geoms_multiplier := 1;
+  ELSE
+    num_geoms_multiplier := Coalesce(desired_area / Nullif(ST_Area(bounds), 0), 1);
+  END IF;
+
  RETURN QUERY
  EXECUTE $string$
    WITH clipped_geom AS (
@@ -453,13 +480,11 @@ BEGIN
    ), clipped_geom_countagg AS (
      SELECT column_id, table_id
        , BOOL_AND(ST_BandIsNoData(clipped_tile, 1)) nodata
-        , ST_CountAgg(clipped_tile, 1, False)::Numeric pixels -- -10
      FROM clipped_geom
      GROUP BY column_id, table_id
    ), clipped_geom_reagg AS (
      SELECT COUNT(*)::BIGINT cnt, a.column_id, a.table_id,
             cdb_observatory.FIRST(nodata) first_nodata,
-             cdb_observatory.FIRST(pixels) first_pixel,
             cdb_observatory.FIRST(tile) first_tile,
             (ST_SummaryStatsAgg(clipped_tile, 1, False)).sum::Numeric sum_geoms, -- ND
             (ST_SummaryStatsAgg(clipped_tile, 2, False)).mean::Numeric / 255 mean_fill --ND
@@ -474,9 +499,8 @@ BEGIN
        , (CASE WHEN first_nodata IS FALSE
                THEN sum_geoms
                ELSE COALESCE(ST_Value(first_tile, 1, ST_PointOnSurface($1)), 0)
-                  * (ST_Area($1) / ST_Area(ST_PixelAsPolygon(first_tile, 0, 0))
-                    * first_pixel) -- -20
-          END)::Numeric
+                  * (ST_Area($1) / ST_Area(ST_PixelAsPolygon(first_tile, 0, 0)))
+          END)::Numeric * $4
        AS numgeoms
        , (CASE WHEN first_nodata IS FALSE
                THEN mean_fill
@@ -490,7 +514,7 @@ BEGIN
      ((100.0 / (1+abs(log(0.0001 + $3) - log(0.0001 + numgeoms::Numeric)))) * percentfill)::Numeric
      AS score, *
      FROM final
-  $string$ USING bounds, filter_geom_ids, desired_num_geoms;
+  $string$ USING bounds, filter_geom_ids, desired_num_geoms, num_geoms_multiplier;
  RETURN;
 END
 $$ LANGUAGE plpgsql IMMUTABLE;
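
The scoring and multiplier arithmetic in `_OBS_GetGeometryScores` can be sketched without PostGIS. The example below reuses the 500km-buffer `numgeoms` values from the expected test output and the default target of 3000; `percentfill` is assumed to be `1` for every boundary, and the areas in the second query are made-up numbers.

```sql
-- Score formula from the function, applied to numgeoms values taken from the
-- 500km-buffer test output. percentfill = 1 is an assumption.
WITH candidates(column_id, numgeoms, percentfill) AS (
  VALUES ('us.census.tiger.block_group',  48567::NUMERIC, 1::NUMERIC),
         ('us.census.tiger.census_tract', 15823, 1),
         ('us.census.tiger.zcta5',         6466, 1),
         ('us.census.tiger.county',         295, 1)
)
SELECT column_id,
  -- log() on NUMERIC is base 10, so the penalty grows with each order of
  -- magnitude between the target (3000) and the estimated geometry count.
  ((100.0 / (1 + abs(log(0.0001 + 3000) - log(0.0001 + numgeoms))))
    * percentfill)::NUMERIC AS score
FROM candidates
ORDER BY score DESC;

-- num_geoms_multiplier: 1 for points (desired_area = 0) or zero-area bounds,
-- otherwise the ratio of desired area to the area of the bounds.
SELECT desired_area, bounds_area,
  CASE WHEN desired_area = 0 THEN 1
       ELSE COALESCE(desired_area / NULLIF(bounds_area, 0), 1)
  END AS num_geoms_multiplier
FROM (VALUES (0.5::NUMERIC, 2.0::NUMERIC),
             (0::NUMERIC,   2.0::NUMERIC),
             (0.5::NUMERIC, 0::NUMERIC)) AS t(desired_area, bounds_area);
```

With a target of 3000 and equal fill, `us.census.tiger.zcta5` should rank first, since its count is the closest to the target on a log scale.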
--- a/src/pg/test/expected/41_observatory_augmentation_test.out
+++ b/src/pg/test/expected/41_observatory_augmentation_test.out
@@ -150,6 +150,15 @@ t|t|t|t|t|t|t|t|t|t|t|t|t|t|t
 obs_getmeta_conflicting_metadata
 t
 (1 row)
+obs_getmeta_suggested_name
+t
+(1 row)
+obs_getmeta_suggested_name_area
+t
+(1 row)
+obs_getmeta_suggested_name_denom
+t
+(1 row)
 obs_getdata_geomval_empty_null
 t
 (1 row)
@@ -261,3 +270,31 @@ t|t
 ary_type|obs_getdata_api_geomrefs_args_string_return
 t|t
 (1 row)
+setseed
+
+(1 row)
+bg_sample|bg_max_error|bg_avg_error|bg_min_error
+1|t|t|t
+2|t|t|t
+3|t|t|t
+5|t|t|t
+10|t|t|t
+25|t|t|t
+50|t|t|t
+100|t|t|t
+2085|t|t|t
+(9 rows)
+tract_sample|tract_max_error|tract_avg_error|tract_min_error
+1|t|t|t
+2|t|t|t
+3|t|t|t
+5|t|t|t
+10|t|t|t
+25|t|t|t
+50|t|t|t
+100|t|t|t
+761|t|t|t
+(9 rows)
+no_bg_point_error
+t
+(1 row)
--- a/src/pg/test/expected/42_observatory_exploration_test.out
+++ b/src/pg/test/expected/42_observatory_exploration_test.out
@@ -120,6 +120,9 @@ t
 _obs_getavailablegeometries_bg_not_1996
 t
 (1 row)
+_obs_getavailablegeometries_has_boundary_tag
+t
+(1 row)
 _obs_getavailabletimespans_2010_2014_in_all
 t
 (1 row)
@@ -159,21 +162,36 @@ t
 _obs_geometryscores_2500km_buffer
 t
 (1 row)
-_obs_geometryscores_numgeoms_500m_buffer
-t
-(1 row)
-_obs_geometryscores_numgeoms_5km_buffer
-t
-(1 row)
-_obs_geometryscores_numgeoms_50km_buffer
-t
-(1 row)
-_obs_geometryscores_numgeoms_500km_buffer
-t
-(1 row)
-_obs_geometryscores_numgeoms_2500km_buffer
-t
-(1 row)
+column_id|_obs_geometryscores_numgeoms_500m_buffer
+us.census.tiger.block_group|2
+us.census.tiger.census_tract|1
+us.census.tiger.zcta5|0
+us.census.tiger.county|0
+(4 rows)
+column_id|_obs_geometryscores_numgeoms_5km_buffer
+us.census.tiger.block_group|244
+us.census.tiger.census_tract|78
+us.census.tiger.zcta5|9
+us.census.tiger.county|0
+(4 rows)
+column_id|_obs_geometryscores_numgeoms_50km_buffer
+us.census.tiger.block_group|10817
+us.census.tiger.census_tract|3396
+us.census.tiger.zcta5|484
+us.census.tiger.county|11
+(4 rows)
+column_id|_obs_geometryscores_numgeoms_500km_buffer
+us.census.tiger.block_group|48567
+us.census.tiger.census_tract|15823
+us.census.tiger.zcta5|6466
+us.census.tiger.county|295
+(4 rows)
+column_id|_obs_geometryscores_numgeoms_2500km_buffer
+us.census.tiger.block_group|165852
+us.census.tiger.census_tract|55283
+us.census.tiger.zcta5|27046
+us.census.tiger.county|2551
+(4 rows)
 _obs_geometryscores_500km_buffer_50_geoms
 t
 (1 row)
@@ -186,6 +204,12 @@ t
 _obs_geometryscores_500km_buffer_25000_geoms
 t
 (1 row)
+testarea_uses_tract
+t
+(1 row)
+points_use_bg
+t
+(1 row)
 _total_pop_in_legacy_builder_metadata
 t
 (1 row)
--- a/src/pg/test/fixtures/load_fixtures.sql
+++ b/src/pg/test/fixtures/load_fixtures.sql
--- a/src/pg/test/sql/41_observatory_augmentation_test.sql
+++ b/src/pg/test/sql/41_observatory_augmentation_test.sql
@@ -336,7 +336,7 @@ FROM meta;
 -- OBS_GetMeta for point completes several partial measures with "best"
 -- metadata, includes geom alternatives if asked
 WITH meta AS (SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
-  '[{"numer_id": "us.census.acs.B01001002"}]', null, 2) meta)
+  '[{"numer_id": "us.census.acs.B01001002", "max_score_rank": 2}]', null, 2) meta)
 SELECT
 (meta->0->>'id')::integer = 1 id,
 (meta->0->>'numer_id') = 'us.census.acs.B01001002' numer_id,
@@ -398,6 +398,21 @@ SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
  '[{"numer_id": "us.census.acs.B01001002", "denom_id": "us.census.acs.B01001002", "geom_id": "us.census.tiger.census_tract"}]') IS NULL
 AS obs_getmeta_conflicting_metadata;

+-- OBS_GetMeta provides suggested name for simple meta request
+SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
+  '[{"numer_id": "us.census.acs.B01003001"}]'
+)->0->>'suggested_name' = 'total_pop_2010_2014' obs_getmeta_suggested_name;
+
+-- OBS_GetMeta provides suggested name for simple meta request with area norm
+SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
+  '[{"numer_id": "us.census.acs.B01003001", "normalization": "area"}]'
+)->0->>'suggested_name' = 'total_pop_per_sq_km_2010_2014' obs_getmeta_suggested_name_area;
+
+-- OBS_GetMeta provides suggested name for simple meta request with denom
+SELECT cdb_observatory.OBS_GetMeta(cdb_observatory._TestPoint(),
+  '[{"numer_id": "us.census.acs.B01001002", "normalization": "denom"}]'
+)->0->>'suggested_name' = 'male_pop_rate_2010_2014' obs_getmeta_suggested_name_denom;
+
 -- OBS_GetData/OBS_GetMeta by id with empty list/null
 WITH data AS (SELECT * FROM cdb_observatory.OBS_GetData(ARRAY[]::TEXT[], null))
 SELECT ARRAY_AGG(data) IS NULL AS obs_getdata_geomval_empty_null FROM data;
@@ -798,3 +813,146 @@ SELECT json_typeof(data->0->'value') = 'array' ary_type,
 AS OBS_GetData_API_geomrefs_args_string_return
 FROM cdb_observatory.obs_getdata(array['36047'],
      '[{"numer_type": "text", "numer_colname": "obs_getboundarybyid", "api_method": "obs_getboundarybyid", "api_args": ["us.census.tiger.county"]}]');
+
+-- Ensure consistent results below.
+select setseed(0);
+
+-- Check that random assortment of block groups in Brooklyn return accurate data
+WITH _geoms AS (
+  SELECT
+    (data->0->>'value')::geometry the_geom,
+    data->0->>'geomref' geom_ref,
+    (data->1->>'value')::numeric total_pop
+  FROM cdb_observatory.OBS_GetData(
+    array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
+    (SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
+      '[{"geom_id": "us.census.tiger.block_group"},
+        {"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group", "normalization": "predenom"}]')),
+    FALSE
+  )
+  WHERE data->0->>'geomref' LIKE '36047%'
+  ORDER BY RANDOM()
+), geoms AS (
+  SELECT *, row_number() OVER () cartodb_id FROM _geoms
+), samples AS (
+  SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
+), filtered AS (
+  SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
+), summary AS (
+  SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
+    COUNT(*)::INT cnt,
+    ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
+    SUM(ST_Area(the_geom))::Numeric sumarea
+  FROM filtered
+  GROUP BY sample
+), meta AS (
+  SELECT sample, cdb_observatory.OBS_GetMeta(extent,
+    ('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
+    1, 1, cnt) meta
+  FROM summary
+  GROUP BY sample, extent, cnt, sumarea
+), results AS (
+  SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
+  FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
+  WHERE summary.sample = meta.sample
+) SELECT sample bg_sample
+ , MAX(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_max_error
+ , AVG(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_avg_error
+ , MIN(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 bg_min_error
+FROM geoms, results
+WHERE cartodb_id = id
+GROUP BY sample
+ORDER BY sample
+;
+
+-- Check that random assortment of tracts in Brooklyn return accurate data
+WITH _geoms AS (
+  SELECT
+    (data->0->>'value')::geometry the_geom,
+    data->0->>'geomref' geom_ref,
+    (data->1->>'value')::numeric total_pop
+  FROM cdb_observatory.OBS_GetData(
+    array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
+    (SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
+      '[{"geom_id": "us.census.tiger.census_tract"},
+        {"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.census_tract", "normalization": "predenom"}]')),
+    FALSE
+  )
+  WHERE data->0->>'geomref' LIKE '36047%'
+  ORDER BY RANDOM()
+), geoms AS (
+  SELECT *, row_number() OVER () cartodb_id FROM _geoms
+), samples AS (
+  SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
+), filtered AS (
+  SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
+), summary AS (
+  SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
+    COUNT(*)::INT cnt,
+    ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
+    SUM(ST_Area(the_geom))::Numeric sumarea
+  FROM filtered
+  GROUP BY sample
+), meta AS (
+  SELECT sample, cdb_observatory.OBS_GetMeta(extent,
+    ('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
+    1, 1, cnt) meta
+  FROM summary
+  GROUP BY sample, extent, cnt, sumarea
+), results AS (
+  SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
+  FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
+  WHERE summary.sample = meta.sample
+) SELECT sample tract_sample
+ , MAX(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_max_error
+ , AVG(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_avg_error
+ , MIN(100 * abs((geoms.total_pop - val) / Coalesce(NullIf(total_pop, 0), NULL)))::Numeric(10, 2) < 10 tract_min_error
+FROM geoms, results
+WHERE cartodb_id = id
+GROUP BY sample
+ORDER BY sample
+;
+
+-- Check that random assortment of block group points in Brooklyn return accurate data
+WITH _geoms AS (
+  SELECT
+    ST_PointOnSurface((data->0->>'value')::geometry) the_geom,
+    data->0->>'geomref' geom_ref,
+    (data->1->>'value')::numeric total_pop
+  FROM cdb_observatory.OBS_GetData(
+    array[(st_buffer(cdb_observatory._testpoint(), 0.2), 1)::geomval],
+    (SELECT cdb_observatory.OBS_GetMeta(ST_MakeEnvelope(-179, 89, 179, -89, 4326),
+      '[{"geom_id": "us.census.tiger.block_group"},
+        {"numer_id": "us.census.acs.B01003001", "geom_id": "us.census.tiger.block_group", "normalization": "predenom"}]')),
+    FALSE
+  )
+  WHERE data->0->>'geomref' LIKE '36047%'
+), geoms AS (
+  SELECT *, row_number() OVER () cartodb_id FROM _geoms
+), samples AS (
+  SELECT COUNT(*) cnt, unnest(ARRAY[1, 2, 3, 5, 10, 25, 50, 100, COUNT(*)]) sample FROM geoms
+), filtered AS (
+  SELECT * FROM geoms, samples WHERE cartodb_id % (cnt / sample) = 0
+), summary AS (
+  SELECT sample, ST_SetSRID(ST_Extent(the_geom), 4326) extent,
+    COUNT(*)::INT cnt,
+    ARRAY_AGG((the_geom, cartodb_id)::geomval) geomvals,
+    SUM(ST_Area(the_geom))::Numeric sumarea
+  FROM filtered
+  GROUP BY sample
+), meta AS (
+  SELECT sample, cdb_observatory.OBS_GetMeta(extent,
+    ('[{"numer_id": "us.census.acs.B01003001", "normalization": "predenom", "target_area": ' || sumarea || '}]')::JSON,
+    1, 1, cnt) meta
+  FROM summary
+  GROUP BY sample, extent, cnt, sumarea
+), results AS (
+  SELECT summary.sample, id, meta->0->>'geom_id' geom_id, (data->0->>'value')::Numeric as val
+  FROM summary, meta, LATERAL cdb_observatory.OBS_GetData(geomvals, meta) data
+  WHERE summary.sample = meta.sample
+) SELECT
+ BOOL_AND(abs((geoms.total_pop - val) /
+      Coalesce(NullIf(total_pop, 0), 1)) = 0) is True no_bg_point_error
+FROM geoms, results
+WHERE cartodb_id = id
+;
--- a/src/pg/test/sql/42_observatory_exploration_test.sql
+++ b/src/pg/test/sql/42_observatory_exploration_test.sql
@@ -289,6 +289,11 @@ FROM cdb_observatory.OBS_GetAvailableGeometries(
 ) WHERE valid_timespan = True)
 AS _obs_getavailablegeometries_bg_not_1996;

+SELECT 'subsection/tags.boundary' IN (SELECT (Jsonb_Each(geom_tags)).key
+FROM cdb_observatory.OBS_GetAvailableGeometries(
+  ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)
+)) AS _obs_getavailablegeometries_has_boundary_tag;
+
 --
 -- OBS_GetAvailableTimespans tests
 --
@@ -360,9 +365,9 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
        'us.census.tiger.county', 'us.census.tiger.zcta5'])
      WHERE table_id LIKE '%2015%';

-SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
-       ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
-             'us.census.tiger.zcta5', 'us.census.tiger.county']
+SELECT ARRAY_AGG(column_id ORDER BY score DESC)
+       = ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
+             'us.census.tiger.county', 'us.census.tiger.zcta5']
       AS _obs_geometryscores_5km_buffer
       FROM cdb_observatory._OBS_GetGeometryScores(
  ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 5000)::Geometry(Geometry, 4326),
@@ -390,60 +395,55 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
        'us.census.tiger.zcta5', 'us.census.tiger.county'])
      WHERE table_id LIKE '%2015%';

-SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
-       ARRAY['us.census.tiger.county', 'us.census.tiger.zcta5',
-             'us.census.tiger.census_tract', 'us.census.tiger.block_group']
+SELECT ARRAY_AGG(column_id ORDER BY score DESC)
+       = ARRAY['us.census.tiger.county', 'us.census.tiger.census_tract',
+             'us.census.tiger.zcta5', 'us.census.tiger.block_group']
      AS _obs_geometryscores_2500km_buffer
      FROM cdb_observatory._OBS_GetGeometryScores(
  ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 2500000)::Geometry(Geometry, 4326),
-  ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
-        'us.census.tiger.zcta5', 'us.census.tiger.county'])
+  ARRAY['us.census.tiger.county', 'us.census.tiger.census_tract',
+        'us.census.tiger.zcta5', 'us.census.tiger.block_group'])
      WHERE table_id LIKE '%2015%';

-SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text
-      = '{ "us.census.tiger.block_group" : 9, "us.census.tiger.census_tract" : 3, "us.census.tiger.zcta5" : 0, "us.census.tiger.county" : 0 }'
-      AS _obs_geometryscores_numgeoms_500m_buffer
+SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_500m_buffer
      FROM cdb_observatory._OBS_GetGeometryScores(
  ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 500)::Geometry(Geometry, 4326),
  ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
        'us.census.tiger.zcta5', 'us.census.tiger.county'])
-      WHERE table_id LIKE '%2015%';
+      WHERE table_id LIKE '%2015%'
+      ORDER BY numgeoms DESC;

-SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
-      '{ "us.census.tiger.block_group" : 880, "us.census.tiger.census_tract" : 310, "us.census.tiger.zcta5" : 45, "us.census.tiger.county" : 1 }'
-      AS _obs_geometryscores_numgeoms_5km_buffer
+SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_5km_buffer
      FROM cdb_observatory._OBS_GetGeometryScores(
  ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 5000)::Geometry(Geometry, 4326),
  ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
        'us.census.tiger.zcta5', 'us.census.tiger.county'])
-      WHERE table_id LIKE '%2015%';
+      WHERE table_id LIKE '%2015%'
+      ORDER BY numgeoms DESC;

-SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
-      '{ "us.census.tiger.block_group" : 11531, "us.census.tiger.census_tract" : 3601, "us.census.tiger.zcta5" : 550, "us.census.tiger.county" : 14 }'
-      AS _obs_geometryscores_numgeoms_50km_buffer
+SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_50km_buffer
      FROM cdb_observatory._OBS_GetGeometryScores(
  ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 50000)::Geometry(Geometry, 4326),
  ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
        'us.census.tiger.zcta5', 'us.census.tiger.county'])
-      WHERE table_id LIKE '%2015%';
+      WHERE table_id LIKE '%2015%'
+      ORDER BY numgeoms DESC;

-SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
-      '{ "us.census.tiger.block_group" : 48917, "us.census.tiger.census_tract" : 15969, "us.census.tiger.zcta5" : 6534, "us.census.tiger.county" : 314 }'
-      AS _obs_geometryscores_numgeoms_500km_buffer
+SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_500km_buffer
      FROM cdb_observatory._OBS_GetGeometryScores(
  ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 500000)::Geometry(Geometry, 4326),
  ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
        'us.census.tiger.zcta5', 'us.census.tiger.county'])
-      WHERE table_id LIKE '%2015%';
+      WHERE table_id LIKE '%2015%'
+      ORDER BY numgeoms DESC;

-SELECT JSON_Object_Agg(column_id, numgeoms::int ORDER BY numgeoms DESC)::Text =
-      '{ "us.census.tiger.block_group" : 169191, "us.census.tiger.census_tract" : 56469, "us.census.tiger.zcta5" : 26525, "us.census.tiger.county" : 2753 }'
-      AS _obs_geometryscores_numgeoms_2500km_buffer
+SELECT column_id, numgeoms::int AS _obs_geometryscores_numgeoms_2500km_buffer
      FROM cdb_observatory._OBS_GetGeometryScores(
  ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 2500000)::Geometry(Geometry, 4326),
  ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
        'us.census.tiger.zcta5', 'us.census.tiger.county'])
-      WHERE table_id LIKE '%2015%';
+      WHERE table_id LIKE '%2015%'
+      ORDER BY numgeoms DESC;

 SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
       ARRAY['us.census.tiger.county', 'us.census.tiger.zcta5',
@@ -475,9 +475,9 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
        'us.census.tiger.zcta5', 'us.census.tiger.county'], 2500)
      WHERE table_id LIKE '%2015%';

-SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
-       ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
-             'us.census.tiger.zcta5', 'us.census.tiger.county']
+SELECT ARRAY_AGG(column_id ORDER BY score DESC)
+       = ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
+               'us.census.tiger.county', 'us.census.tiger.zcta5']
      AS _obs_geometryscores_500km_buffer_25000_geoms
      FROM cdb_observatory._OBS_GetGeometryScores(
  ST_Buffer(ST_SetSRID(ST_MakePoint(-73.9, 40.7), 4326)::Geography, 50000)::Geometry(Geometry, 4326),
@@ -485,6 +485,44 @@ SELECT ARRAY_AGG(column_id ORDER BY score DESC) =
        'us.census.tiger.zcta5', 'us.census.tiger.county'], 25000)
      WHERE table_id LIKE '%2015%';

+-- Check that one small geom approximates tract data
+WITH geoms AS (SELECT cdb_observatory._testarea() the_geom),
+summary AS (SELECT ST_SetSRID(ST_Extent(the_geom), 4326) extent,
+                   COUNT(*)::INT cnt,
+                   SUM(ST_Area(the_geom))::Numeric sumarea
+            FROM geoms)
+SELECT column_id = 'us.census.tiger.census_tract' testarea_uses_tract
+FROM summary, LATERAL (
+  SELECT *
+  FROM cdb_observatory._OBS_GetGeometryScores(extent,
+  ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
+        'us.census.tiger.zcta5', 'us.census.tiger.county'],
+        cnt, sumarea)) foo
+ORDER BY score DESC LIMIT 1;
+
+-- Check that randomly distributed points always use smallest geometry if we
+-- order by numgeoms desc
+WITH geoms as (SELECT UNNEST(ARRAY[
+    cdb_observatory._testpoint(),
+    st_translate(cdb_observatory._testpoint(), -0.003, 0),
+    st_translate(cdb_observatory._testpoint(), -0.006, 0)
+]) the_geom),
+summary as (SELECT
+  ST_SetSRID(ST_Extent(the_geom), 4326) extent,
+  SUM(ST_Area(the_geom))::Numeric area,
+  COUNT(*)::INTEGER cnt
+  FROM geoms
+)
+SELECT column_id = 'us.census.tiger.block_group' points_use_bg
+      FROM summary, LATERAL (
+        SELECT * FROM cdb_observatory._OBS_GetGeometryScores(
+          extent,
+          ARRAY['us.census.tiger.block_group', 'us.census.tiger.census_tract',
+                'us.census.tiger.zcta5', 'us.census.tiger.county'],
+        cnt, area)) foo
+      WHERE table_id LIKE '%2015%'
+      ORDER BY numgeoms DESC LIMIT 1;
+
 --
 -- OBS_LegacyBuilderMetadata tests
 --
--- a/src/python/test/autotest.py
+++ b/src/python/test/autotest.py
@@ -73,6 +73,7 @@ SKIP_COLUMNS = set([
    , 'uk.ons.LC3204WA0004'
    , 'uk.ons.LC3204WA0007'
    , 'uk.ons.LC3204WA0010'
+    , 'br.geo.subdistritos_name'
 ])

 MEASURE_COLUMNS = query('''
Author	SHA1	Message	Date
John Krauss	183c046289	release artifact	2017-04-26 20:08:44 +00:00
John Krauss	8df89f4a91	remove br subdistritos from testing	2017-04-25 18:57:12 +00:00
John Krauss	28694163a2	prefer geographpic precision over most recent timespan, handles issues emerging from inclusion of 1-year acs	2017-04-25 18:53:12 +00:00
John Krauss	60c7f54315	update NEWS for 1.5.0, fix error in link in 1.4.0	2017-04-24 18:22:31 +00:00
John Krauss	3ebb0b8662	Merge branch 'release-v-1.5.0' into obs-getavailablegeometries-return-tags	2017-04-24 18:10:43 +00:00
John Krauss	a2e84696dc	fix tests to match fixture data	2017-04-24 18:01:38 +00:00
John Krauss	cd5cb38e8d	Merge branch 'release-v-1.5.0' into obs-getavailablegeometries-return-tags	2017-04-24 17:50:57 +00:00
John Krauss	26e1a2f461	Add tags to obs_getavailablegeometries Fixes #260 * Adds `geom_type`, `geom_extra`, and `geom_tags` to `OBS_GetAvailableGeometries`. This brings it up to spec with existing docs. * Adds `timespan_type`, `timespan_extra`, and `timespan_tags` to `OBS_GetAvailableTimespans` for consistency.	2017-04-03 21:51:32 +00:00
John Krauss	090a1add43	add suggested_name output to OBS_GetMeta. fixes #279	2017-04-03 19:44:00 +00:00
John Krauss	536af5e4a2	release artifact	2017-03-22 15:17:19 +00:00
John Krauss	ebf23d2a23	Merge branch 'develop' into release-v-1.4.0	2017-03-22 15:16:35 +00:00
John Krauss	f1afcf0d8e	update NEWS.md	2017-03-22 15:14:35 +00:00
John Krauss	3c0b40cf3f	more consistent arguments in docs	2017-03-22 15:12:50 +00:00
John Krauss	8a87dc7e9a	update NEWS.md	2017-03-21 21:24:50 +00:00
John Krauss	61552adba4	Allow for target_geoms and target_area override on column-by-column basis	2017-03-21 17:26:02 +00:00
csobier	36abbee64f	Merge pull request #274 from CartoDB/273-docs-edit clarification of docs for obs_getboundariesbygeometry function	2017-03-17 12:07:48 -04:00
csobier	5a76a7381e	clarification of docs for obs_getboundariesbygeometry function	2017-03-17 11:45:49 -04:00
John Krauss	217ca2d84d	release 1.3.5 artifact	2017-03-15 20:12:06 +00:00
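
Reading aid for the accuracy tests added to `41_observatory_augmentation_test.sql` in this diff: the Brooklyn block group and tract checks thin the candidate rows with `cartodb_id % (cnt / sample) = 0` and then compare `100 * abs((total_pop - val) / NullIf(total_pop, 0))` against a threshold of 10. The sketch below mirrors that arithmetic in plain Python, assuming 1-based ids and truncating integer division as in PostgreSQL. The function names `sample_ids`, `pct_error` and `summarize` are hypothetical and are not part of the extension or its test suite.

```python
from typing import List, Optional, Sequence, Tuple


def sample_ids(cnt: int, sample: int) -> List[int]:
    """Mirror `cartodb_id % (cnt / sample) = 0` for ids 1..cnt.

    PostgreSQL integer division truncates, so `//` matches it for
    positive operands. A sample larger than cnt would make the step 0,
    which is a division-by-zero error in SQL, so it is rejected here.
    """
    step = cnt // sample
    if step == 0:
        raise ValueError("sample must not exceed cnt")
    return [i for i in range(1, cnt + 1) if i % step == 0]


def pct_error(expected: float, actual: float) -> Optional[float]:
    """Mirror `100 * abs((total_pop - val) / NullIf(total_pop, 0))`.

    Returns None where the SQL expression yields NULL (expected == 0).
    """
    if expected == 0:
        return None
    return 100 * abs((expected - actual) / expected)


def summarize(
    errors: Sequence[Optional[float]],
) -> Optional[Tuple[float, float, float]]:
    """Return (max, avg, min), skipping None as SQL aggregates skip NULL."""
    vals = [e for e in errors if e is not None]
    if not vals:
        return None
    return max(vals), sum(vals) / len(vals), min(vals)


if __name__ == "__main__":
    # Illustrative numbers only, not fixture data.
    print(sample_ids(14, 5))
    print(summarize([pct_error(200, 190), pct_error(0, 5), pct_error(100, 98)]))
```

Two things follow from this sketch. Because the step is truncated, the filter can keep more rows than the nominal `sample` (14 rows with a sample of 5 gives a step of 2 and keeps 7 ids). And `Coalesce(NullIf(total_pop, 0), NULL)` in the tests behaves the same as `NullIf(total_pop, 0)` alone, so rows with zero population drop out of the `MAX`, `AVG` and `MIN` aggregates rather than raising a division error.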