Gap Analysis to find missing values in a sequence
There are times when you want to find all gaps in a sequence of numbers, dates or data with a logically consecutive nature. A related scenario is to find ranges of unbroken sequences. There are a variety of use-cases of gap analysis:
- If you require your employees to checkin to a timesheet everyday, a gap analysis can show you days where the employee was absent.
- If you have a fleet of vehicles doing deliveries, a gap analysis can show periods of time when all vehicles are not being used. Useful for maintenance or downtime.
- If you have a service level agreement for 24/7 coverage, a gap analysis can show you contractual breaches.
Example Data
For illustration purposes, we are going to consider a simple table of integers from 1 to 10, with the numbers 3, 4 and 7 missing:
create table gap (counter integer);
insert into gap (counter) values (1);
insert into gap (counter) values (2);
insert into gap (counter) values (5);
insert into gap (counter) values (6);
insert into gap (counter) values (8);
insert into gap (counter) values (9);
insert into gap (counter) values (10);
and,
select counter from gap;
counter |
---|
1 |
2 |
5 |
6 |
8 |
9 |
10 |
The "counter" in the example could be the list of checks from your bank's transaction log, and you want to find missing checks. Other data types such as dates or timestamps can be generalized into this format by mapping the timestamp into an epoch time.
1. Finding Gaps with an exclusion join
You can find missing values by joining with a table of all possible values. If you don't have such a table handy, you can generate one on the fly using PostgreSQL's generate_series
(see alternatives for MySQL and Redshift). The generate_series
function returns a continuous series as multiple rows.
select generate_series(1, 10, 1);
generate_series |
---|
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
We then left join
our gap
table with the generated series:
select series,
gap.counter
from generate_series(1, 10, 1) series
left join gap on series = gap.counter;
series | counter |
---|---|
1 | 1 |
2 | 2 |
3 | null |
4 | null |
5 | 5 |
6 | 6 |
7 | null |
8 | 8 |
9 | 9 |
10 | 10 |
By filtering out the rows that have a null
for the joined value, we get what we want:
select series,
gap.counter
from generate_series(1, 10, 1) series
left join gap on series = gap.counter
where counter is null;
series | counter |
---|---|
3 | null |
4 | null |
7 | null |
You'll obviously want to use min(gap.counter)
and max(gap.counter)
as parameters to the generate_series
function to generalize this solution.
2. Finding ranges of missing gaps
The above query gives you individual values that are missing. More useful is to present the missing values as a range. The desired result is:
start | end |
---|---|
3 | 4 |
7 | 7 |
To find the start of a gap, we left join the table with itself on a join key offset by 1. This is similar to the query above:
select gap.counter + 1 as start
from gap
left join gap r on gap.counter = r.counter - 1
where r.counter is null;
start |
---|
3 |
7 |
11 |
(there's an extra row with a value of 11 at the end that is detected as a gap, which we'll fix later)
To find the end of a gap, we'll use a cross join to enumerate all the values of the gap
table to find the start of the next valid range.
select min(fr.counter) - 1 as stop
from gap
left join gap fr on gap.counter < fr.counter
where fr.counter is not null
group by gap.counter;
Putting these two queries together:
select gap.counter + 1 as start,
min(fr.counter) - 1 as stop
from gap
left join gap r on gap.counter = r.counter - 1
left join gap fr on gap.counter < fr.counter
where r.counter is null
and fr.counter is not null
group by gap.counter,
r.counter;
start | end |
---|---|
3 | 4 |
7 | 7 |
3. Finding ranges of continous values
This solution is for the opposite problem, finding ranges of continous values outside of gaps.
The desired result is:
start | end |
---|---|
1 | 2 |
5 | 6 |
8 | 10 |
To find the start of a valid range, we look for numbers that have no previous value:
select *
from gap
left join gap s on s.counter = gap.counter - 1;
counter | counter_1 |
---|---|
1 | null |
2 | 1 |
5 | null |
6 | 5 |
8 | null |
9 | 8 |
10 | 9 |
Filtering out rows that do not have a match:
select gap.counter as start,
s.counter
from gap
left join gap s on s.counter = gap.counter - 1
where s.counter is null;
start | counter |
---|---|
1 | null |
5 | null |
8 | null |
To find the end of a valid range,
select gap.counter as stop, e.counter
from gap
left join gap e on e.counter = gap.counter + 1
where e.counter is null;
stop | counter |
---|---|
2 | null |
6 | null |
10 | null |
Combining the two queries, gives us the desired result:
select gap.counter as start,
(select a.counter as counter
from gap a
left join gap b on b.counter = a.counter + 1
where b.counter is null
and a.counter >= gap.counter
limit 1) as stop
from gap
left join gap s on s.counter = gap.counter - 1
where s.counter is null;
start | end |
---|---|
1 | 2 |
5 | 6 |
8 | 10 |
4. Gap analysis using window functions
The above queries can be rewritten using the SQL window functions LEAD
and LAG
. We'll revisit this in the future.
No spam, ever! Unsubscribe any time. See past emails here.