GROUPING SETS
We've done aggregations on multiple levels. For example, we've looked at average salaries per company
SELECT
company,
avg(salary) AS average_salary
FROM employees
GROUP BY company;
company VARCHAR | average_salary NUMERIC |
---|---|
Footbook | 154750.000000000000 |
Wamazon | 145923.076923076923 |
Poogle | 186461.538461538462 |
And we can drill down deeper and look at departments within each company.
SELECT
company,
department,
avg(salary) AS average_salary
FROM employees
GROUP BY company, department
ORDER BY company, department;
company VARCHAR | department VARCHAR | average_salary NUMERIC |
---|---|---|
Footbook | Integration | 157200.000000000000 |
Footbook | Research | 145000.000000000000 |
Footbook | Solutions | 163666.666666666667 |
Poogle | Functionality | 187333.333333333333 |
Poogle | Infrastructure | 167000.000000000000 |
Poogle | Marketing | 215000.000000000000 |
Wamazon | Marketing | 145500.000000000000 |
Wamazon | Quality | 146750.000000000000 |
Wamazon | Solutions | 145666.666666666667 |
What if we want to see both of these in a single query?
We would need some way of specifying 2 different groupings, GROUP BY company
and GROUP BY company, department
.
That's where GROUPING SETS
comes in.
It gives us a way to do just that.
SELECT
company,
department,
avg(salary) AS average_salary
FROM employees
GROUP BY GROUPING SETS (
(company),
(company, department)
)
ORDER BY company, department;
company VARCHAR | department VARCHAR | average_salary NUMERIC |
---|---|---|
Footbook | Integration | 157200.000000000000 |
Footbook | Research | 145000.000000000000 |
Footbook | Solutions | 163666.666666666667 |
Footbook | NULL | 154750.000000000000 |
Poogle | Functionality | 187333.333333333333 |
Poogle | Infrastructure | 167000.000000000000 |
Poogle | Marketing | 215000.000000000000 |
Poogle | NULL | 186461.538461538462 |
Wamazon | Marketing | 145500.000000000000 |
Wamazon | Quality | 146750.000000000000 |
Wamazon | Solutions | 145666.666666666667 |
Wamazon | NULL | 145923.076923076923 |
The
department
column contains someNULL
s.
This is how the different grouping sets are represented.
This result table is a combination of rows grouped with different grouping sets, some on both (company, department)
,
and some on just company
.
The way you can tell them apart is with the NULL
values in those grouping columns.
The rows where department
is NULL
are the rows grouped by just company
,
and the rows where department
has a value are grouped by (company, department)
.
We could've written this query another way by UNION
ing 2 different queries:
1 that does GROUP BY company
and 1 that does GROUP BY company, department
.
This would be equivalent:
SELECT
company,
NULL AS department,
avg(salary) AS average_salary
FROM employees
GROUP BY company
UNION
SELECT
company,
department,
avg(salary) AS average_salary
FROM employees
GROUP BY company, department
ORDER BY company, department;
company VARCHAR | department VARCHAR | average_salary NUMERIC |
---|---|---|
Footbook | Integration | 157200.000000000000 |
Footbook | Research | 145000.000000000000 |
Footbook | Solutions | 163666.666666666667 |
Footbook | NULL | 154750.000000000000 |
Poogle | Functionality | 187333.333333333333 |
Poogle | Infrastructure | 167000.000000000000 |
Poogle | Marketing | 215000.000000000000 |
Poogle | NULL | 186461.538461538462 |
Wamazon | Marketing | 145500.000000000000 |
Wamazon | Quality | 146750.000000000000 |
Wamazon | Solutions | 145666.666666666667 |
Wamazon | NULL | 145923.076923076923 |
Empty Grouping Set
We can add another level to this: the average salary for all users overall.
Usually when aggregating for a table overall, we just omit the GROUP BY
clause altogether.
However, remember that in Implicit Aggregation,
an empty list can be given to specify that the whole table is 1 group.
SELECT
company,
department,
avg(salary) AS average_salary
FROM employees
GROUP BY GROUPING SETS (
(),
(company),
(company, department)
)
ORDER BY company NULLS FIRST, department NULLS FIRST;
company VARCHAR | department VARCHAR | average_salary NUMERIC |
---|---|---|
NULL | NULL | 162578.947368421053 |
Footbook | NULL | 154750.000000000000 |
Footbook | Integration | 157200.000000000000 |
Footbook | Research | 145000.000000000000 |
Footbook | Solutions | 163666.666666666667 |
Poogle | NULL | 186461.538461538462 |
Poogle | Functionality | 187333.333333333333 |
Poogle | Infrastructure | 167000.000000000000 |
Poogle | Marketing | 215000.000000000000 |
Wamazon | NULL | 145923.076923076923 |
Wamazon | Marketing | 145500.000000000000 |
Wamazon | Quality | 146750.000000000000 |
Wamazon | Solutions | 145666.666666666667 |
The row where both company
and department
are NULL
is the row that contains the average salary for that group, all employees for all companies.
By the way, NULL
normally get's sorted last.
Here, we switched it so that NULL
gets sorted first by adding NULLS FIRST
in the ORDER BY
clause.
ROLLUP, CUBE
There are specific GROUPING SETS
patterns which are common, like the one in the last query.
This query would be equivalent:
SELECT
company,
department,
avg(salary) AS average_salary
FROM employees
GROUP BY ROLLUP(company, department)
ORDER BY company NULLS FIRST, department NULLS FIRST;
company VARCHAR | department VARCHAR | average_salary NUMERIC |
---|---|---|
NULL | NULL | 162578.947368421053 |
Footbook | NULL | 154750.000000000000 |
Footbook | Integration | 157200.000000000000 |
Footbook | Research | 145000.000000000000 |
Footbook | Solutions | 163666.666666666667 |
Poogle | NULL | 186461.538461538462 |
Poogle | Functionality | 187333.333333333333 |
Poogle | Infrastructure | 167000.000000000000 |
Poogle | Marketing | 215000.000000000000 |
Wamazon | NULL | 145923.076923076923 |
Wamazon | Marketing | 145500.000000000000 |
Wamazon | Quality | 146750.000000000000 |
Wamazon | Solutions | 145666.666666666667 |
The GROUP BY ROLLUP(company, department)
is equivalent to the following:
GROUP BY GROUPING SETS (
(),
(company),
(company, department)
)
There is another shortcut CUBE
.
The clause GROUP BY CUBE(company, department)
is equivalent to:
GROUP BY GROUPING SETS (
(),
(company),
(department),
(company, department)
)
CUBE
takes all the possible subsets of the set of columns, also known as a Power Set.