Redshift: Generate a sequential range of numbers for time series analysis
One of our favorite features in PostgreSQL is the generate_series
function. Given start
, stop
and step
interval, PostgreSQL can generate a series of values, from start
to stop
with a step size of step
. It can even work with dates or timestamps:
select generate_series('2017-01-01'::date, '2017-05-01'::date, '1 week'::interval);
Unfortunately, though Redshift supports the simpler variant to generate integer sequences, it does not support the date variant. We'll have to build an equivalent.
The common use-case for this function is to generate a sequential range of dates, and use a left join to figure out dates where you have no data. If you didn't do this, your timeseries will have gaps and your chart will be misleading.
Method 1: Create a table with sequential numbers
The simplest option is to create a table, for example, numbers
and select from that. You can convert each number into the relevant date using Redshift's date manipulation functions:
select (getdate()::date - n)::date from numbers;
The numbers
table can be created by using the row_number
window function and the stl_connection_log
system table that logs authentication attempts and connections/disconnections. For Redshift clusters with even the basic level of use, looping over the stl_connection_log
table with a cross join should generate sufficient data:
insert into numbers
with x as (
select 1
from stl_connection_log a, stl_connection_log b, stl_connection_log c
-- limit 1000000
)
select row_number() over (order by 1) from x;
Method 2: Create a CTE counter
If you don't want to create a table before hand, you can create one on the fly – using Redshift's Common Table Expressions. A CTE works like a temporary table that only exists during the execution of the query.
with digit as (
select 0 as d union all
select 1 union all select 2 union all select 3 union all
select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9
),
seq as (
select a.d + (10 * b.d) + (100 * c.d) + (1000 * d.d) as num
from digit a
cross join
digit b
cross join
digit c
cross join
digit d
order by 1
)
select (getdate()::date - seq.num)::date as "Date"
from seq;
The way this works is by first creating a temporary table digit
with 10 rows containing values from 0 through 9. Next, we create all permutations of the digit
table with the help of a couple of cross joins. By multiplying the digits by 1, 10, 100 and 1000, all values from 0 to 9999 are calculated.
We can finally use the sequential numbers to generate a continuous sequence of date using Redshift's date manipulation functions.
No spam, ever! Unsubscribe any time. See past emails here.