Concatenating Rows of String Values for Aggregation
What would an aggregation of rows of string columns look like? For numeric columns, we can calculate averages and sums over many rows to aggregate them. Rows with string columns require a different treatment, which we'll explore in this recipe.
This recipe is useful in many-to-many relationships, when you want to aggregate one side of a relationship for reporting purposes:
- security management – a table of user and their corresponding roles in an organization
- account management – a table of sales people / account managers and the customers they manage
- inventory management – a table of stores and a list of products they hold or have sold
- tag management – a table of tags applied to resources and you want to show a summary of tags against a resource
If you have a table of store locations and the products they hold along with their quantities:
location | product | quantity |
---|---|---|
New York | Sports Collectibles | 200 |
New York | Electronics | 1400 |
Seattle | Shoes | 1200 |
Seattle | Electronics | 1400 |
San Francisco | Textbook Rentals | 600 |
San Francisco | Shoes | 1000 |
... | ... | ... |
Or, if your interested in the account management example, the table will look like this:
Account Manager | Customer | Contract Value |
---|---|---|
Alice | Microsoft | $1,000 |
Alice | Apple | $2,000 |
Bob | $4,000 | |
Bob | Dell | $1,200 |
Eve | Netflix | $600 |
... | ... | ... |
Usually, best practices for data management will instruct you to normalize your data into three tables – a a store_location
table, a product
table and a product_quantity
table. You can accomplish the first level of aggregation seen in the tables above with a couple of joins between them.
What we want our result to look like
location | products_held |
---|---|
New York | Electronics, Sports Collectibles |
Seattle | Electronics, Shoes |
San Francisco | Shoes, Textbook Rentals |
We want to create a new aggregate column products_held
on the product
column from the original table that contains string values. Furthermore, we want to sort the aggregation by the quantity
which would make the data more readily usable.
PostgreSQL's string_agg
to the rescue
select location,
string_agg(product, ', ' order by quantity desc)
from products
group by location;
PostgreSQL's aggregation functions are well-fleshed out. You can use other functions in the same class like array_agg
and json_agg
to accomplish similar results with array and json data types.
No spam, ever! Unsubscribe any time. See past emails here.