Calculating Top N items and Aggregating (sum) the remainder into "All Other"
The top / bottom N problem is often seen when ranking items in your database. We've previously written about the ranking problem in Calculating Top N items per Group.
Another requirement that co-exists with this question is to rollup all the other rows into a row called "All other". The reason for doing this is to provide more context when presenting data.
Example data
Consider a table with two columns: a salesperson's name and the amount of business they've closed:
Name | Sales |
---|---|
Alice | 8405837 |
Bob | 3884307 |
Chris | 2718782 |
Dan | 2195914 |
Eve | 1553165 |
Fae | 1513367 |
George | 1409019 |
Hadi | 1355896 |
Ida | 1257676 |
Justine | 998537 |
... | ... |
1. Calculating simple top N items
Our first question is to list the top 10 salespeople by business closed, which can be accomplished with the following query. For more sophistication with window functions, see Calculating Top N items per Group:
select name,
sales
from deals
order by 3 desc
limit 10
name | sales |
---|---|
Alice | 8405837 |
Bob | 3884307 |
Chris | 2718782 |
Dan | 2195914 |
Eve | 1553165 |
Fae | 1513367 |
George | 1409019 |
Hadi | 1355896 |
Ida | 1257676 |
Justine | 998537 |
2. Aggregating into "All Other"
If we were to visualize the top salespeople as a pie chart, it'd look something like this:
The problem with the above chart is that we are missing data on how much business the other salespeople closed to put the visualization in context. We'll attempt to solve this question now.
What we are trying to accomplish is a final row appended to the results above that is the aggregation of the rest of the data, something like this:
name | sales |
---|---|
Alice | 8405837 |
Bob | 3884307 |
Chris | 2718782 |
... | ... |
Justine | 998537 |
All Other | 7833544 |
The following query accomplishes that:
with top10 as
(select name,
sales
from deals
order by 2 desc
limit 10)
select *
from top10
union all
select 'All other' as name,
sum(sales) as sales
from sales
where name not in
(select name
from top10)
How it works
We've organized the query into two queries and combined them with the union
keyword. The first query is as before that finds the top 10 salespeople. The second builds upon this by excluding names that were previously found with the not in
keyword. We've additionally "refactored" the core query as a CTE using the with
keyword.
No spam, ever! Unsubscribe any time. See past emails here.