GROUPING SETS

We've done aggregations on multiple levels. For example, we've looked at average salaries per company

SELECT
  company,
  avg(salary) AS average_salary
FROM employees
GROUP BY company;
company
VARCHAR
average_salary
NUMERIC
Footbook
154750.000000000000
Wamazon
145923.076923076923
Poogle
186461.538461538462

And we can drill down deeper and look at departments within each company.

SELECT
  company,
  department,
  avg(salary) AS average_salary
FROM employees
GROUP BY company, department
ORDER BY company, department;
company
VARCHAR
department
VARCHAR
average_salary
NUMERIC
Footbook
Integration
157200.000000000000
Footbook
Research
145000.000000000000
Footbook
Solutions
163666.666666666667
Poogle
Functionality
187333.333333333333
Poogle
Infrastructure
167000.000000000000
Poogle
Marketing
215000.000000000000
Wamazon
Marketing
145500.000000000000
Wamazon
Quality
146750.000000000000
Wamazon
Solutions
145666.666666666667

What if we want to see both of these in a single query? We would need some way of specifying 2 different groupings, GROUP BY company and GROUP BY company, department. That's where GROUPING SETS comes in. It gives us a way to do just that.

SELECT
  company,
  department,
  avg(salary) AS average_salary
FROM employees
GROUP BY GROUPING SETS (
  (company),
  (company, department)
)
ORDER BY company, department;
company
VARCHAR
department
VARCHAR
average_salary
NUMERIC
Footbook
Integration
157200.000000000000
Footbook
Research
145000.000000000000
Footbook
Solutions
163666.666666666667
Footbook
NULL
154750.000000000000
Poogle
Functionality
187333.333333333333
Poogle
Infrastructure
167000.000000000000
Poogle
Marketing
215000.000000000000
Poogle
NULL
186461.538461538462
Wamazon
Marketing
145500.000000000000
Wamazon
Quality
146750.000000000000
Wamazon
Solutions
145666.666666666667
Wamazon
NULL
145923.076923076923

The department column contains some NULLs.

This is how the different grouping sets are represented. This result table is a combination of rows grouped with different grouping sets, some on both (company, department), and some on just company. The way you can tell them apart is with the NULL values in those grouping columns. The rows where department is NULL are the rows grouped by just company, and the rows where department has a value are grouped by (company, department).

We could've written this query another way by UNIONing 2 different queries: 1 that does GROUP BY company and 1 that does GROUP BY company, department. This would be equivalent:

SELECT
  company,
  NULL AS department,
  avg(salary) AS average_salary
FROM employees
GROUP BY company
UNION
SELECT
  company,
  department,
  avg(salary) AS average_salary
FROM employees
GROUP BY company, department
ORDER BY company, department;
company
VARCHAR
department
VARCHAR
average_salary
NUMERIC
Footbook
Integration
157200.000000000000
Footbook
Research
145000.000000000000
Footbook
Solutions
163666.666666666667
Footbook
NULL
154750.000000000000
Poogle
Functionality
187333.333333333333
Poogle
Infrastructure
167000.000000000000
Poogle
Marketing
215000.000000000000
Poogle
NULL
186461.538461538462
Wamazon
Marketing
145500.000000000000
Wamazon
Quality
146750.000000000000
Wamazon
Solutions
145666.666666666667
Wamazon
NULL
145923.076923076923

Empty Grouping Set

We can add another level to this: the average salary for all users overall. Usually when aggregating for a table overall, we just omit the GROUP BY clause altogether. However, remember that in Implicit Aggregation, an empty list can be given to specify that the whole table is 1 group.

SELECT
  company,
  department,
  avg(salary) AS average_salary
FROM employees
GROUP BY GROUPING SETS (
  (),
  (company),
  (company, department)
)
ORDER BY company NULLS FIRST, department NULLS FIRST;
company
VARCHAR
department
VARCHAR
average_salary
NUMERIC
NULLNULL
162578.947368421053
Footbook
NULL
154750.000000000000
Footbook
Integration
157200.000000000000
Footbook
Research
145000.000000000000
Footbook
Solutions
163666.666666666667
Poogle
NULL
186461.538461538462
Poogle
Functionality
187333.333333333333
Poogle
Infrastructure
167000.000000000000
Poogle
Marketing
215000.000000000000
Wamazon
NULL
145923.076923076923
Wamazon
Marketing
145500.000000000000
Wamazon
Quality
146750.000000000000
Wamazon
Solutions
145666.666666666667

The row where both company and department are NULL is the row that contains the average salary for that group, all employees for all companies.

By the way, NULL normally get's sorted last. Here, we switched it so that NULL gets sorted first by adding NULLS FIRST in the ORDER BY clause.

ROLLUP, CUBE

There are specific GROUPING SETS patterns which are common, like the one in the last query. This query would be equivalent:

SELECT
  company,
  department,
  avg(salary) AS average_salary
FROM employees
GROUP BY ROLLUP(company, department)
ORDER BY company NULLS FIRST, department NULLS FIRST;
company
VARCHAR
department
VARCHAR
average_salary
NUMERIC
NULLNULL
162578.947368421053
Footbook
NULL
154750.000000000000
Footbook
Integration
157200.000000000000
Footbook
Research
145000.000000000000
Footbook
Solutions
163666.666666666667
Poogle
NULL
186461.538461538462
Poogle
Functionality
187333.333333333333
Poogle
Infrastructure
167000.000000000000
Poogle
Marketing
215000.000000000000
Wamazon
NULL
145923.076923076923
Wamazon
Marketing
145500.000000000000
Wamazon
Quality
146750.000000000000
Wamazon
Solutions
145666.666666666667

The GROUP BY ROLLUP(company, department) is equivalent to the following:

GROUP BY GROUPING SETS (
  (),
  (company),
  (company, department)
)

There is another shortcut CUBE. The clause GROUP BY CUBE(company, department) is equivalent to:

GROUP BY GROUPING SETS (
  (),
  (company),
  (department),
  (company, department)
)

CUBE takes all the possible subsets of the set of columns, also known as a Power Set.

© 2022 Andrew Carlson. All rights reserved.