Compare commits

45 Commits
0.7.2 ... 0.7.4

Author SHA1 Message Date
Rafa de la Torre
2b46a2d56f Merge pull request #91 from CartoDB/upgrade-version-0-7-4
Update Makefile and NEWS.md for new version
2015-06-29 12:37:30 +02:00
Rafa de la Torre
371d84ea0c Update Makefile and NEWS.md for new version 2015-06-29 12:09:35 +02:00
Andy Eschbacher
e5897f3dad Merge pull request #87 from CartoDB/categ-distrib
Function deciding criteria for using a category column in a map
2015-06-25 11:50:56 -04:00
Andy Eschbacher
b9fe204007 Merge pull request #81 from CartoDB/equalint
adding equal interval function for consistency
2015-06-25 11:08:24 -04:00
Rafa de la Torre
9b2cff15c5 Merge pull request #88 from CartoDB/86-CDB_QueryTables-fix-long-names
Add a new function CDB_QueryTablesText #86
2015-06-25 16:20:08 +02:00
Andy Eschbacher
13946b4d47 update test output 2015-06-25 08:17:41 -04:00
Andy Eschbacher
97140b17c9 added more flexible output values 2015-06-24 11:03:16 -04:00
Rafa de la Torre
c3eea08f66 Remove ECHO from expectation #86
Remove the `\set ECHO none` from expectation that is automatically
removed by the test harness but still appears in the output when a test
fails.
2015-06-24 16:29:40 +02:00
Rafa de la Torre
22fc962d09 Change expectation #86
Just add CONTEXT lines since they are now added in case of
WARNING/ERROR as a result of having CDB_QueryTables calling
CDB_QueryTablesText.
2015-06-24 16:01:45 +02:00
Rafa de la Torre
ddb6b2c5b5 Return text instead of regclass #86
This way the schema is always returned and backwards compatibility is
kept, should it be needed.
2015-06-24 14:14:00 +02:00
Rafa de la Torre
9a94b3879a Add a new function CDB_QueryTablesRegclass #86
The return values of it can be safely used when len(schema.table_name)
exceeds the 63 char limit of the postgres type `name`.
2015-06-24 11:53:09 +02:00
Andy Eschbacher
d124776c4e simplified assignment 2015-06-23 18:49:59 -04:00
Andy Eschbacher
5941b473ca removed notice 2015-06-23 18:39:15 -04:00
Andy Eschbacher
3ad3038c5e fixed symlink path, other minor items 2015-06-23 18:32:16 -04:00
Andy Eschbacher
c7bb57b405 add symlink 2015-06-23 18:08:32 -04:00
Andy Eschbacher
f8542af57a add tests 2015-06-23 18:07:48 -04:00
Andy Eschbacher
cda6953ea6 initial commit 2015-06-23 16:27:27 -04:00
Andy Eschbacher
189309e1a5 Merge pull request #84 from CartoDB/add-dist-classify
Add distribution classifier
2015-06-12 08:53:42 -04:00
Andy Eschbacher
1d223b77cc changed subfunction name, replaced function with case statement 2015-06-10 10:50:01 -04:00
Andy Eschbacher
6ab1b1d3d0 removed unneeded variables 2015-06-08 18:36:07 -04:00
Andy Eschbacher
c7f4209270 added alias and line 2015-06-08 15:11:58 -04:00
Andy Eschbacher
8e2d86414f updating function 2015-06-08 15:02:39 -04:00
Andy Eschbacher
9cb1fe30d8 adding tests 2015-06-08 15:01:50 -04:00
Andy Eschbacher
424564e324 initial commit 2015-06-08 13:37:27 -04:00
Andy Eschbacher
42a617e79c ugh bad filename 2015-05-19 16:03:54 -04:00
Andy Eschbacher
bf4a31842b new result 2015-05-19 15:50:34 -04:00
Andy Eschbacher
a3c8d7bce4 initial commit 2015-05-19 15:28:17 -04:00
Andy Eschbacher
737dc1c1f1 updated formating of test 2015-05-18 12:12:55 -04:00
Andy Eschbacher
d0c85855f5 fixed test expectation value 2015-05-18 12:00:42 -04:00
Andy Eschbacher
ee1df92561 fixed precision in tests 2015-05-14 15:45:57 -04:00
Andy Eschbacher
16d0dc739a added tests improved func 2015-05-14 15:32:58 -04:00
Andy Eschbacher
dcd35fc3d7 Merge branch 'master' into equalint 2015-05-07 17:07:34 -04:00
Andy Eschbacher
2ad3ff547d initial 2015-05-07 15:49:35 -04:00
Andy Eschbacher
b1e1723e75 Merge pull request #80 from CartoDB/sariogonfer-master
updates quantile bins algorithm
2015-05-07 10:55:19 -04:00
Andy Eschbacher
d9e254dbd5 missed updating value 2015-04-28 10:21:34 -04:00
Andy Eschbacher
7d0efa95fb updated test 2015-04-28 09:59:57 -04:00
Andy Eschbacher
1552c03dd4 removed group by; made binning more reliable 2015-04-27 17:59:40 -04:00
sariogonfer
de418ab36d Merge pull request #1 from sariogonfer/sariogonfer-patch-2
Update CDB_QuantileBins.sql
2015-04-27 12:05:23 +02:00
sariogonfer
cbd3c447b6 Update CDB_QuantileBins.sql 2015-04-14 20:45:37 +02:00
Raul Ochoa
fc95566ddd Remove test for unexistent table as there was already one 2015-03-31 16:01:49 +02:00
Raul Ochoa
7f58e1f690 Adds tests for cdb_tablemetadatatouch
- fixes tableoid by using the proper table oid
 - tests quoted and unqouted call with OID
 - tests non existent table to fail
2015-03-31 14:48:45 +02:00
Rafa de la Torre
ca643b2e03 Merge pull request #75 from CartoDB/73-fix-upgrade-of-cdb-stringtodate
73 fix upgrade of cdb stringtodate
2015-03-03 17:27:45 +01:00
Rafa de la Torre
a7a52a23ea fix indentation of Makefile #73 2015-03-03 16:10:56 +00:00
Rafa de la Torre
1c9e5f241f Fix upgrade of CDB_StringToDate function #73 2015-03-03 16:09:57 +00:00
Raul Ochoa
38d32371c8 Adds test to validate CDB_TableMetadataTouch usage with OID 2015-03-03 12:09:25 +01:00
20 changed files with 322 additions and 18 deletions

@@ -1,7 +1,7 @@
 # cartodb/Makefile
 EXTENSION = cartodb
-EXTVERSION = 0.7.2
+EXTVERSION = 0.7.4
 SED = sed
@@ -36,6 +36,8 @@ UPGRADABLE = \
 0.6.0 \
 0.7.0 \
 0.7.1 \
+0.7.2 \
+0.7.3 \
 $(EXTVERSION)dev \
 $(EXTVERSION)next \
 $(END)

NEWS.md
@@ -1,3 +1,17 @@
+0.7.4 (2015-06-29)
+------------------
+* Adds new function CDB_QueryTablesText that can deal with "schema.table_name"
+longer than 63 chars.
+* Adds a set of statistical functions:
+- CDB_DistType
+- CDB_DistinctMeasure
+- CDB_EqualIntervalBins
+0.7.3 (2015-03-03)
+------------------
+* Fix upgrade of CDB_StringToDate function
+* Add a test to validate CDB_TableMetadataTouch usage with OID
 0.7.2 (2015-03-03)
 ------------------
 * Fix conversion of strings to datetime

@@ -0,0 +1,122 @@
--
-- CDB_DistType classifies the histograms of a column into
-- one of the basic types listed by Galtung: http://druedin.com/2012/12/08/galtungs-ajus-system/
--
-- Future improvements:
-- variable number of bins (7 is baked in right now)
-- catch the number of items to ensure that the sample is large enough
--
-- Refs:
-- 1. width_bucket/histograms: http://tapoueh.org/blog/2014/02/21-PostgreSQL-histogram
-- 2. R implementation: https://github.com/cran/agrmt
CREATE OR REPLACE FUNCTION CDB_DistType ( in_array NUMERIC[] ) RETURNS text as $$
DECLARE
element_count INT4;
minv numeric;
maxv numeric;
bins numeric[];
freqs numeric[];
ajus INT[];
freq INT4;
signature text;
i INT := 1;
BEGIN
SELECT min(e), max(e), count(e) INTO minv, maxv, element_count FROM ( SELECT unnest(in_array) e ) x;
IF abs(maxv - minv) < 1e-7 THEN -- if max and min are nearly equal, call it 'F' (make relative to maxv?)
signature = 'F';
ELSE
-- Calculate bins and count in bins
EXECUTE 'WITH stats as (
SELECT min(e) as minv,
max(e) as maxv,
count(e) as total
FROM (SELECT unnest($1) e) x
WHERE e is not null
),
hist as (
SELECT width_bucket(e, s.minv, s.maxv, 7) bucket,
count(*) freq
FROM (SELECT unnest($1) e) x, stats s
WHERE e is not null
GROUP BY 1
ORDER BY 1
)
SELECT array_agg(round(100.0 * hist.freq::numeric / stats.total::numeric,1)) freqs,
array_agg(hist.bucket) buckets
FROM hist, stats'
INTO freqs, bins
USING in_array;
LOOP
IF i < 7 THEN
ajus[i] = CASE WHEN freqs[i] > freqs[i+1] THEN -1
WHEN abs(freqs[i] - freqs[i+1]) <= 0.05 THEN 0
ELSE 1 END;
ELSE
EXIT;
END IF;
i := i + 1;
END LOOP;
signature = _CDB_DistTypeClassify(ajus);
END IF;
RETURN signature;
END;
$$ language plpgsql IMMUTABLE;
-- Classify data into AJUSFL
CREATE OR REPLACE FUNCTION _CDB_DistTypeClassify ( in_array INT[] ) RETURNS text as $$
DECLARE
element_count INT4;
maxv numeric;
minv numeric;
uniques INT[];
type text;
BEGIN
SELECT max(e), min(e) INTO maxv, minv FROM ( SELECT unnest(in_array) e ) x;
IF (maxv = 0 AND minv = 0) THEN
type = 'F';
ELSIF maxv < 1 THEN
type = 'L';
ELSIF minv > -1 THEN
type = 'J';
ELSE
-- Get distinct elements ordered by original position
EXECUTE 'WITH b AS (
SELECT a
FROM (SELECT unnest($1) a) x
),
c AS (
SELECT a, row_number() OVER () r
FROM b
),
d AS (
SELECT DISTINCT a
FROM c
),
e AS (
SELECT a FROM d ORDER BY (
SELECT r FROM c WHERE d.a = c.a ORDER BY r ASC LIMIT 1
) ASC)
SELECT array_agg(a) FROM e'
INTO uniques
USING in_array;
-- Decide if it's an A, U, or other
IF (uniques = ARRAY[1,-1] OR uniques = ARRAY[1,0,-1] OR uniques = ARRAY[1,-1,0] OR uniques = ARRAY[0,1,-1]) THEN
type = 'A';
ELSIF (uniques = ARRAY[-1,1] OR uniques = ARRAY[-1,0,1] OR uniques = ARRAY[-1,1,0] OR uniques = ARRAY[0,-1,1]) THEN
type = 'U';
ELSE
type = 'S';
END IF;
END IF;
RETURN type;
END;
$$ language plpgsql IMMUTABLE;
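
To make the two-step classification easier to follow, here is a minimal Python sketch of the same branching, written as an illustration rather than as part of this change. The names `ajus_steps` and `classify_steps` are hypothetical, and the sketch assumes exactly seven bin frequencies with no missing buckets.

```python
def ajus_steps(freqs, tolerance=0.05):
    """Compare neighbouring bin frequencies: -1 falling, 0 flat, 1 rising."""
    steps = []
    for left, right in zip(freqs, freqs[1:]):
        if left > right:
            steps.append(-1)
        elif abs(left - right) <= tolerance:
            steps.append(0)
        else:
            steps.append(1)
    return steps


def classify_steps(steps):
    """Mirror the branches of _CDB_DistTypeClassify on a list of -1/0/1."""
    highest, lowest = max(steps), min(steps)
    if highest == 0 and lowest == 0:
        return "F"  # flat everywhere
    if highest < 1:
        return "L"  # never rises
    if lowest > -1:
        return "J"  # never falls
    # Distinct values in order of first appearance.
    uniques = list(dict.fromkeys(steps))
    if uniques in ([1, -1], [1, 0, -1], [1, -1, 0], [0, 1, -1]):
        return "A"  # rises before it falls
    if uniques in ([-1, 1], [-1, 0, 1], [-1, 1, 0], [0, -1, 1]):
        return "U"  # falls before it rises
    return "S"


freqs = [5.0, 20.0, 50.0, 20.0, 5.0, 0.0, 0.0]
print(classify_steps(ajus_steps(freqs)))
```

One observation from the sketch: the last branch is only reached when both 1 and -1 are present, and the two lists appear to cover every possible first-appearance ordering of those values, so 'S' looks unreachable in this version of the function.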

@@ -0,0 +1,46 @@
--
-- CDB_DistinctMeasure
-- calculates the fraction of rows in the 10 most common distinct categories
-- with no threshold, returns that fraction; with a threshold (for example 0.9),
-- returns 1 if the fraction is >= threshold and 0 otherwise
--
--
CREATE OR REPLACE FUNCTION CDB_DistinctMeasure ( in_array text[], threshold numeric DEFAULT null ) RETURNS numeric as $$
DECLARE
element_count INT4;
maxval numeric;
passes numeric;
BEGIN
SELECT count(e) INTO element_count FROM ( SELECT unnest(in_array) e ) x;
-- count number of occurrences per bin
-- calculate the normalized cumulative sum
-- return the max value, which corresponds to the nth entry
-- for n <= 10 depending on # of distinct values
EXECUTE 'WITH a As (
SELECT
count(*) cnt
FROM
(SELECT * FROM unnest($2) e ) x
WHERE e is not null
GROUP BY e
ORDER BY cnt DESC
),
b As (
SELECT
sum(cnt) OVER (ORDER BY cnt DESC) / $1 As cumsum
FROM a
LIMIT 10
)
SELECT max(cumsum) maxval FROM b'
INTO maxval
USING element_count, in_array;
IF threshold is null THEN
passes = maxval;
ELSE
passes = CASE WHEN (maxval >= threshold) THEN 1 ELSE 0 END;
END IF;
RETURN passes;
END;
$$ language plpgsql IMMUTABLE;
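
The cumulative-share calculation can be sketched in Python as follows. This is an illustrative port, not code from the change: `distinct_measure` and `top_n` are hypothetical names, the input is assumed to contain at least one non-null value, and the rows are assumed to come back in the window's descending-count order before the `LIMIT 10` is applied.

```python
from collections import Counter


def distinct_measure(values, threshold=None, top_n=10):
    """Share of non-null values covered by the top_n most frequent categories."""
    present = [v for v in values if v is not None]
    counts = sorted(Counter(present).values(), reverse=True)

    # Running total over the counts, largest first.
    cumulative = []
    running = 0
    for count in counts:
        running += count
        cumulative.append(running)

    # Tied counts are peers in the SQL window, so they share the total
    # reached at the last row of the tie.
    for i in range(len(counts) - 2, -1, -1):
        if counts[i] == counts[i + 1]:
            cumulative[i] = cumulative[i + 1]

    share = max(cumulative[:top_n]) / len(present)
    if threshold is None:
        return share
    return 1 if share >= threshold else 0


letters = "abcdefghijklm"
sizes = [12, 11, 11, 10, 10, 9, 8, 7, 6, 5, 4, 4, 3]
sample = [letter for letter, size in zip(letters, sizes) for _ in range(size)]
print(distinct_measure(sample, 0.90))
```

With the same letter counts as the test case in this diff, the ten largest categories cover 89 of 100 values, so a 0.90 threshold yields 0.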

@@ -0,0 +1,37 @@
--
-- Calculate the equal interval bins for a given column
--
-- @param in_array A numeric array of values used to determine
-- the bin boundaries
--
-- @param breaks The number of bins you want to find.
--
--
-- Returns: upper edges of bins
--
--
CREATE OR REPLACE FUNCTION CDB_EqualIntervalBins ( in_array NUMERIC[], breaks INT ) RETURNS NUMERIC[] as $$
DECLARE
diff numeric;
min_val numeric;
max_val numeric;
tmp_val numeric;
i INT := 1;
reply numeric[];
BEGIN
SELECT min(e), max(e) INTO min_val, max_val FROM ( SELECT unnest(in_array) e ) x WHERE e IS NOT NULL;
diff = (max_val - min_val) / breaks::numeric;
LOOP
IF i < breaks THEN
tmp_val = min_val + i::numeric * diff;
reply = array_append(reply, tmp_val);
i := i+1;
ELSE
reply = array_append(reply, max_val);
EXIT;
END IF;
END LOOP;
RETURN reply;
END;
$$ language plpgsql IMMUTABLE;
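
The arithmetic is simple enough to check by hand; the following Python sketch reproduces it for illustration. The name `equal_interval_bins` is hypothetical, and the sketch assumes at least one non-null value and `breaks` of 1 or more.

```python
def equal_interval_bins(values, breaks):
    """Upper edges of `breaks` equal-width bins spanning min..max."""
    present = [v for v in values if v is not None]
    low, high = min(present), max(present)
    width = (high - low) / breaks
    # Interior edges sit at low + i * width; the last edge is the maximum itself.
    edges = [low + i * width for i in range(1, breaks)]
    edges.append(high)
    return edges


# Same input as the test case in this diff: 1..300 without multiples of 5 or 7.
sample = [x for x in range(1, 301) if x % 5 != 0 and x % 7 != 0]
print([round(edge, 7) for edge in equal_interval_bins(sample, 7)])
```

Here the minimum is 1 and the maximum is 299, so each bin is 298 / 7 wide and the first edge falls at roughly 43.5714286, in line with the expected test output.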

@@ -15,18 +15,29 @@ DECLARE
 i INT := 1;
 reply numeric[];
 BEGIN
--- get our unique values
-SELECT array_agg(e) INTO in_array FROM (SELECT unnest(in_array) e GROUP BY e ORDER BY e ASC) x;
--- get the total size of our row
-element_count := array_upper(in_array, 1) - array_lower(in_array, 1);
+-- sort our values
+SELECT array_agg(e) INTO in_array FROM (SELECT unnest(in_array) e ORDER BY e ASC) x;
+-- get the total size of our data
+element_count := array_length(in_array, 1);
 break_size := element_count::numeric / breaks;
 -- slice our bread
 LOOP
-IF i > breaks THEN EXIT; END IF;
-SELECT e INTO tmp_val FROM ( SELECT unnest(in_array) e LIMIT 1 OFFSET round(break_size * i)) x;
+IF i < breaks THEN
+IF break_size * i % 1 > 0 THEN
+SELECT e INTO tmp_val FROM ( SELECT unnest(in_array) e LIMIT 1 OFFSET ceil(break_size * i) - 1) x;
+ELSE
+SELECT avg(e) INTO tmp_val FROM ( SELECT unnest(in_array) e LIMIT 2 OFFSET ceil(break_size * i) - 1 ) x;
+END IF;
+ELSIF i = breaks THEN
+-- select the last value
+SELECT max(e) INTO tmp_val FROM ( SELECT unnest(in_array) e ) x;
+ELSE
+EXIT;
+END IF;
 reply = array_append(reply, tmp_val);
-i := i+1;
-END LOOP;
-RETURN reply;
+i := i+1;
+END LOOP;
+RETURN reply;
 END;
-$$ language plpgsql IMMUTABLE;
+$$ language plpgsql IMMUTABLE;
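
The revised loop (the branch that tests `break_size * i % 1`) picks a single element when the break position is fractional and averages two neighbouring elements when it lands exactly on an element boundary. A rough Python sketch of that rule follows. The name `quantile_bins` is hypothetical, the input is assumed to be non-empty and free of NULLs, and `Fraction` stands in for PostgreSQL's `numeric` so the "is it a whole number" test is exact.

```python
from fractions import Fraction
from math import ceil


def quantile_bins(values, breaks):
    """Upper edges of `breaks` quantile bins over the sorted input."""
    ordered = sorted(values)
    count = len(ordered)
    edges = []
    for i in range(1, breaks):
        position = Fraction(count * i, breaks)  # break_size * i
        index = ceil(position) - 1              # zero-based OFFSET
        if position.denominator != 1:
            # Fractional position: take the element it falls inside.
            edges.append(ordered[index])
        else:
            # Whole position: average the elements on both sides of the cut.
            pair = ordered[index:index + 2]
            edges.append(sum(pair) / len(pair))
    # The last edge is always the maximum.
    edges.append(ordered[-1])
    return edges


print(quantile_bins(range(1, 11), 4))
```

For the values 1 to 10 split into 4 bins, the positions are 2.5, 5 and 7.5, which gives the edges 3, 5.5 and 8, followed by the maximum 10.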

@@ -2,12 +2,12 @@
 --
 -- Requires PostgreSQL 9.x+
 --
-CREATE OR REPLACE FUNCTION CDB_QueryTables(query text)
-RETURNS name[]
+CREATE OR REPLACE FUNCTION CDB_QueryTablesText(query text)
+RETURNS text[]
 AS $$
 DECLARE
 exp XML;
-tables NAME[];
+tables text[];
 rec RECORD;
 rec2 RECORD;
 BEGIN
@@ -41,11 +41,11 @@ BEGIN
 xpath('//x:Relation-Name/text()', exp, ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']]) as x,
 xpath('//x:Relation-Name/../x:Schema/text()', exp, ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']]) as s
 )
-SELECT unnest(x)::name as p, unnest(s)::name as sc from inp
+SELECT unnest(x) as p, unnest(s) as sc from inp
 LOOP
 -- RAISE DEBUG 'tab: %', rec2.p;
 -- RAISE DEBUG 'sc: %', rec2.sc;
-tables := array_append(tables, (rec2.sc || '.' || rec2.p)::name);
+tables := array_append(tables, (rec2.sc || '.' || rec2.p));
 END LOOP;
 -- RAISE DEBUG 'Tables: %', tables;
@@ -65,3 +65,14 @@ BEGIN
 return tables;
 END
 $$ LANGUAGE 'plpgsql' VOLATILE STRICT;
+-- Keep CDB_QueryTables with same signature for backwards compatibility.
+-- It should probably be removed in the future.
+CREATE OR REPLACE FUNCTION CDB_QueryTables(query text)
+RETURNS name[]
+AS $$
+BEGIN
+RETURN CDB_QueryTablesText(query)::name[];
+END
+$$ LANGUAGE 'plpgsql' VOLATILE STRICT;
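
The motivation for returning `text[]` is that PostgreSQL's `name` type holds at most 63 bytes in a default build, so a cast to `name` silently shortens a long `schema.table_name`. The Python sketch below only imitates that truncation to show the effect. `cast_to_name` is a hypothetical helper, and the 63-byte limit assumes the default `NAMEDATALEN` of 64.

```python
NAME_MAX_BYTES = 63  # NAMEDATALEN - 1, assuming a default PostgreSQL build


def cast_to_name(identifier):
    """Imitate text::name by clipping to 63 bytes without splitting a character."""
    clipped = identifier.encode("utf-8")[:NAME_MAX_BYTES]
    return clipped.decode("utf-8", errors="ignore")


qualified = "a" * 30 + "." + "b" * 40
print(len(qualified), len(cast_to_name(qualified)))
```

A 71-character qualified name comes back as 63 characters, with the table part cut short. Since the compatibility wrapper still casts to `name[]`, callers that need the full name would presumably use `CDB_QueryTablesText` directly.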

@@ -1,5 +1,6 @@
 -- Convert string to date
 --
+DROP FUNCTION IF EXISTS CDB_StringToDate(character varying);
 CREATE OR REPLACE FUNCTION CDB_StringToDate(input character varying)
 RETURNS TIMESTAMP AS $$
 DECLARE output TIMESTAMP;

@@ -0,0 +1 @@
../scripts-available/CDB_EqualIntervalBins.sql

@@ -0,0 +1 @@
../scripts-available/CDB_DistType.sql

@@ -0,0 +1 @@
../scripts-available/CDB_DistinctMeasure.sql

@@ -0,0 +1,4 @@
WITH data AS (
SELECT pow(x,3)::numeric x FROM generate_series(-100,100) x
)
SELECT CDB_DistType(array_agg(x)) FROM data

@@ -0,0 +1 @@
A

@@ -0,0 +1,20 @@
-- a - j add up to 89%, k-m add up to 11%
WITH a As (
SELECT (
repeat('a',12) ||
repeat('b',11) ||
repeat('c',11) ||
repeat('d',10) ||
repeat('e',10) ||
repeat('f',9) ||
repeat('g',8) ||
repeat('h',7) ||
repeat('i',6) ||
repeat('j',5) ||
repeat('k',4) ||
repeat('l',4) ||
repeat('m',3)
)::text AS x
)
SELECT CDB_DistinctMeasure(string_to_array(x,null),0.90) from a

@@ -0,0 +1 @@
0

@@ -0,0 +1,5 @@
WITH data AS (
SELECT array_agg(x::numeric) s FROM generate_series(1,300) x
WHERE x % 5 != 0 AND x % 7 != 0
)
SELECT round(unnest(CDB_EqualIntervalBins(s, 7)),7) FROM data

@@ -0,0 +1,7 @@
43.5714286
86.1428571
128.7142857
171.2857143
213.8571429
256.4285714
299.0000000

@@ -1,7 +1,7 @@
-16
+13
 29
 43
 57
 71
-83
+86
 99

@@ -4,10 +4,14 @@ CREATE table "my'tab;le" as select 1|{}
 SELECT a.oid, b.oid FROM pg_class a, pg_class b|{pg_catalog.pg_class}
 SELECT 1 as col1; select 2 as col2|{}
 WARNING: CDB_QueryTables cannot explain query: select 1 from nonexistant (42P01: relation "nonexistant" does not exist)
+CONTEXT: PL/pgSQL function cdb_querytables(text) line 3 at RETURN
 ERROR: relation "nonexistant" does not exist
+CONTEXT: PL/pgSQL function cdb_querytables(text) line 3 at RETURN
 begin; select * from pg_class; commit;|{pg_catalog.pg_class}
 WARNING: CDB_QueryTables cannot explain query: select * from test (42P01: relation "test" does not exist)
+CONTEXT: PL/pgSQL function cdb_querytables(text) line 3 at RETURN
 ERROR: relation "test" does not exist
+CONTEXT: PL/pgSQL function cdb_querytables(text) line 3 at RETURN
 WITH a AS (select * from pg_class) select * from a|{pg_catalog.pg_class}
 CREATE SCHEMA
 CREATE TABLE

@@ -302,6 +302,21 @@ function test_cdb_tablemetadatatouch() {
 sql "SELECT CDB_TableMetadataTouch('\"public\".touch_example');"
 sql "SELECT CDB_TableMetadataTouch('\"public\".\"touch_example\"');"
+# Works with OID
+sql postgres "SELECT tabname from CDB_TableMetadata;" should 'touch_example'
+sql postgres "SELECT count(*) from CDB_TableMetadata;" should 1
+TABLE_OID=`${CMD} -U postgres ${DATABASE} -c "SELECT attrelid FROM pg_attribute WHERE attrelid = 'touch_example'::regclass limit 1;" -A -t`
+# quoted OID works
+sql "SELECT CDB_TableMetadataTouch('${TABLE_OID}');"
+sql postgres "SELECT tabname from CDB_TableMetadata;" should 'touch_example'
+sql postgres "SELECT count(*) from CDB_TableMetadata;" should 1
+# non quoted OID works
+sql "SELECT CDB_TableMetadataTouch(${TABLE_OID});"
+sql postgres "SELECT tabname from CDB_TableMetadata;" should 'touch_example'
+sql postgres "SELECT count(*) from CDB_TableMetadata;" should 1
 #### test tear down
 sql 'DROP TABLE touch_example;'
 }