Calculating Top N items per Group in SQL
How to find the top / bottom "N" rows in each group?
This question is more commonly seen that you think:
- Out of a list of twitter users, find their N recent most tweets
- Out of all our salespeople, find the top N performing ones by closed business in each region
- Out of all the cities, find the top N cities in each country by population
Conside a table cities
with three columns country, city and population:
country | city | population |
---|---|---|
United States | New York | 8175133 |
United States | Los Angeles | 3792621 |
United States | Chicago | 2695598 |
France | Paris | 2181000 |
France | Marseille | 808000 |
France | Lyon | 422000 |
United Kingdom | London | 7825300 |
United Kingdom | Birmingham | 1016800 |
United Kingdom | Leeds | 770800 |
Top N without Grouping
We have 9 cities in our table, let's select the top 3 without grouping.
select country,
city,
population
from cities
order by population desc
limit 3;
country | city | population |
---|---|---|
United States | New York | 8175133 |
United Kingdom | London | 7825300 |
United States | Los Angeles | 3792621 |
Top N with Grouping with row_number()
Fortunately, we have access to the window function row_number()
that numbers the current row within its partition (group), counting at 1.
When we don't specify any partition:
select country,
city,
population,
row_number() over (order by population desc) as country_rank
from cities;
country | city | population | country_rank |
---|---|---|---|
United States | New York | 8175133 | 1 |
United Kingdom | London | 7825300 | 2 |
United States | Los Angeles | 3792621 | 3 |
United States | Chicago | 2695598 | 4 |
France | Paris | 2181000 | 5 |
United Kingdom | Birmingham | 1016800 | 6 |
France | Marseille | 808000 | 7 |
United Kingdom | Leeds | 770800 | 8 |
France | Lyon | 422000 | 9 |
When we create a partition for each country:
select country,
city,
population,
row_number() over (partition by country order by population desc) as country_rank
from cities;
country | city | population | country_rank |
---|---|---|---|
France | Paris | 2181000 | 1 |
France | Marseille | 808000 | 2 |
France | Lyon | 422000 | 3 |
United Kingdom | London | 7825300 | 1 |
United Kingdom | Birmingham | 1016800 | 2 |
United Kingdom | Leeds | 770800 | 3 |
United States | New York | 8175133 | 1 |
United States | Los Angeles | 3792621 | 2 |
United States | Chicago | 2695598 | 3 |
Almost there!
Now to select the top 2 cities from each country, we wrap the above query in another query and filter the row on the country_rank
column.
select * from (
select country,
city,
population,
row_number() over (partition by country order by population desc) as country_rank
from cities) ranks
where country_rank <= 2;
country | city | population | country_rank |
---|---|---|---|
France | Paris | 2181000 | 1 |
France | Marseille | 808000 | 2 |
United Kingdom | London | 7825300 | 1 |
United Kingdom | Birmingham | 1016800 | 2 |
United States | New York | 8175133 | 1 |
United States | Los Angeles | 3792621 | 2 |
👋 No fuss, just SQL
We are open sourcing everything from the experience working with our agency clients. They spend thousands of dollars to get this level of detailed analysis – which you can now get for free.
We send one update every week. Join 400+ data analysts who are leveling up with our recipes. 👊
No spam, ever! Unsubscribe any time. See past emails here.