HAVING
We're still working at devils.fyi, a website similar to levels.fyi.
Now, suppose we're helping build new features for our jobs website. We want a page to showcase tech companies with especially high average salaries, lets say greater than $150,000.
Suppose we want to get companies where the average salary is more than $150,000.
We can compute avg(salary)
, so let's try to put that inside a WHERE
clause.
SELECT
company,
avg(salary) AS average_salary
FROM employees
WHERE avg(salary) > 150000
GROUP BY company;
error: aggregate functions are not allowed in WHERE
We get an error.
It says aggregate functions are not allowed in the WHERE
clause.
SQL provides a HAVING
clause, which is conceptually the same thing as a WHERE
clause to filter out rows,
but different in that is works with aggregate functions.
SELECT
company,
avg(salary) AS average_salary
FROM employees
GROUP BY company
HAVING avg(salary) > 150000;
company VARCHAR | average_salary NUMERIC |
---|---|
Footbook | 154750.000000000000 |
Poogle | 186461.538461538462 |
So why is a HAVING
clause necessary?
There is a good reason for this, but we need to understand the "order" the clauses are applied.
Notice that the order is WHERE
, GROUP BY
, then HAVING
.
The WHERE
happens before the GROUP BY
, so aggregation hasn't happened yet and aggregate functions can't be used.
HAVING
happens right after aggregation, and lets us filter on aggregate functions.
Why do we have both?
There are some situations where you want to remove rows before grouping,
and some situations where you want to remove rows after it.