Calculating Percentage (%) of Total Sum in SQL
How to compute the ratio of a particular row to the sum of values?
This question comes up frequently when you want to see the relative contribution of a row against the total sum of all the rows. For example:
- how is the browser market share changing over time
- what's each salesperson's contribution to your company's revenue
Consider a table with the number of page views (in billions) for each browser:
Browser | Pageviews |
---|---|
Chrome | 7.1685 |
Safari | 1.935 |
Firefox | 1.3455 |
UC Browser | 1.0965 |
IE | 1.341 |
Opera | 0.816 |
Android | 0.7245 |
Rest | 1.2 |
What we really want to see is the browser market share. We can use a Common Table Expression (CTE) to compute the total pageviews, which we then use to compute the ratio:
with total as
  ( select sum(pageviews) as total
    from pageviews )
select browser,
       pageviews / total.total as share
from pageviews,
     total
Which gives a ratio of each browser to the total:
Browser | Share |
---|---|
Chrome | 0.4587 |
Safari | 0.1238 |
Firefox | 0.0861 |
UC Browser | 0.0702 |
IE | 0.0858 |
Opera | 0.0522 |
Android | 0.0464 |
Rest | 0.0768 |
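One way to sanity-check a share query is to confirm that the shares add up to 1. The sketch below is only an illustration: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in database, loads the pageview rows from the table above, and runs the CTE query. It also runs a variant that uses a window function with an empty `over ()` clause, which is an alternative spelling and needs SQLite 3.25 or newer.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("create table pageviews (browser text, pageviews real)")
conn.executemany(
    "insert into pageviews values (?, ?)",
    [
        ("Chrome", 7.1685), ("Safari", 1.935), ("Firefox", 1.3455),
        ("UC Browser", 1.0965), ("IE", 1.341), ("Opera", 0.816),
        ("Android", 0.7245), ("Rest", 1.2),
    ],
)

# The CTE query from the text: a one-row total joined to every row.
cte_query = """
with total as
  ( select sum(pageviews) as total
    from pageviews )
select browser,
       pageviews / total.total as share
from pageviews,
     total
"""
shares = dict(conn.execute(cte_query).fetchall())

# Alternative spelling: a window function over the whole table.
window_query = """
select browser,
       pageviews / sum(pageviews) over () as share
from pageviews
"""
window_shares = dict(conn.execute(window_query).fetchall())

for browser, share in shares.items():
    print(f"{browser:<12}{share:.4f}")
print("sum of shares:", round(sum(shares.values()), 6))
```

If the shares do not sum to 1 (within floating-point error), the denominator is probably not the total you intended.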
Percentage to Total per Group
The next question to ask is how this changes over time.
What we are attempting to do here is to group our data into months, compute the total for each group, and for each row within that group compute a ratio. An overall total wouldn't make sense. Consider the pageviews table as before, but with an additional date field:
dt | Browser | Pageviews |
---|---|---|
2016-01-01 | Chrome | 7.1685 |
2016-01-01 | Safari | 1.935 |
2016-01-01 | ... | ... |
2016-01-02 | Chrome | 7.2485 |
2016-01-02 | Safari | 1.721 |
2016-01-02 | ... | ... |
... | ... | ... |
2016-12-31 | Chrome | 7.864 |
2016-12-31 | Safari | 2.011 |
2016-12-31 | ... | ... |
Here we resort to window functions, with a partition over the month portion of the date:
select date_trunc('month', dt),
       browser,
       pageviews / sum(pageviews) over(partition by date_trunc('month', dt))
from pageviews
Let's unpack the query a bit. Our window function creates partitions (groups) for each month, and for each partition we sum up the pageviews. The ratio between the current row pageviews and the total for that group is exactly what we want.
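To see the partitioning at work, here is a small sketch that runs the same kind of query locally. It makes several assumptions: SQLite (via Python's sqlite3, version 3.25 or newer for window functions) stands in for the database, `strftime('%Y-%m', dt)` stands in for `date_trunc('month', dt)` because SQLite has no `date_trunc`, and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table pageviews (dt text, browser text, pageviews real)")
# Made-up rows for illustration only: two browsers, two months.
conn.executemany(
    "insert into pageviews values (?, ?, ?)",
    [
        ("2016-01-01", "Chrome", 6.0),
        ("2016-01-01", "Safari", 2.0),
        ("2016-01-02", "Chrome", 6.0),
        ("2016-01-02", "Safari", 2.0),
        ("2016-02-01", "Chrome", 3.0),
        ("2016-02-01", "Safari", 1.0),
    ],
)

# strftime('%Y-%m', dt) plays the role of date_trunc('month', dt) here.
query = """
select strftime('%Y-%m', dt) as month,
       browser,
       pageviews / sum(pageviews) over (partition by strftime('%Y-%m', dt)) as share
from pageviews
order by dt, browser
"""
rows = conn.execute(query).fetchall()

# Within each month, the per-row ratios should add up to 1.
totals = {}
for month, browser, share in rows:
    totals[month] = totals.get(month, 0.0) + share
    print(month, browser, share)
print(totals)
```

Note that each ratio is one row's pageviews divided by its month's total, so a browser with several rows in a month gets several ratios; add them up per browser to get its share for the month.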
Redshift has ratio_to_report
Fortunately, Redshift supports a window function called ratio_to_report, which computes the ratio of a row's value to the sum of that expression over the group:
select date_trunc('month', dt),
       browser,
       ratio_to_report(pageviews) over(partition by date_trunc('month', dt))
from pageviews
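For readers without a Redshift cluster at hand, the arithmetic described above can be illustrated in a few lines of plain Python. This is only a sketch of the idea, not Redshift's implementation: the helper name `ratio_within_partition` and the sample rows are made up, and edge cases such as NULLs or a partition that sums to zero are not handled.

```python
from collections import defaultdict

def ratio_within_partition(rows, value, partition):
    """Divide each row's value by the sum of values in its partition.

    Hypothetical helper that mimics the arithmetic described in the text.
    """
    sums = defaultdict(float)
    for row in rows:
        sums[partition(row)] += value(row)
    return [value(row) / sums[partition(row)] for row in rows]

# Made-up rows: (dt, browser, pageviews)
rows = [
    ("2016-01-01", "Chrome", 6.0),
    ("2016-01-01", "Safari", 2.0),
    ("2016-02-01", "Chrome", 3.0),
    ("2016-02-01", "Safari", 1.0),
]

# Partition by the 'YYYY-MM' prefix of the date, i.e. by month.
ratios = ratio_within_partition(
    rows,
    value=lambda r: r[2],
    partition=lambda r: r[0][:7],
)
print(ratios)
```

The same numbers come out of dividing by `sum(pageviews) over(partition by ...)`, which is why the two queries are interchangeable.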