HAVING

We're still working at devils.fyi, a website similar to levels.fyi.

Now, suppose we're helping build new features for our jobs website. We want a page to showcase tech companies with especially high average salaries, lets say greater than $150,000.

Suppose we want to get companies where the average salary is more than $150,000. We can compute avg(salary), so let's try to put that inside a WHERE clause.

SELECT
  company,
  avg(salary) AS average_salary
FROM employees
WHERE avg(salary) > 150000
GROUP BY company;
error: aggregate functions are not allowed in WHERE

We get an error. It says aggregate functions are not allowed in the WHERE clause. SQL provides a HAVING clause, which is conceptually the same thing as a WHERE clause to filter out rows, but different in that is works with aggregate functions.

SELECT
  company,
  avg(salary) AS average_salary
FROM employees
GROUP BY company
HAVING avg(salary) > 150000;
company
VARCHAR
average_salary
NUMERIC
Footbook
154750.000000000000
Poogle
186461.538461538462

So why is a HAVING clause necessary? There is a good reason for this, but we need to understand the "order" the clauses are applied.

FROM, JOIN WHERE GROUP BY ORDER BY HAVING LIMIT SELECT

Notice that the order is WHERE, GROUP BY, then HAVING. The WHERE happens before the GROUP BY, so aggregation hasn't happened yet and aggregate functions can't be used. HAVING happens right after aggregation, and lets us filter on aggregate functions. Why do we have both? There are some situations where you want to remove rows before grouping, and some situations where you want to remove rows after it.

© 2022 Andrew Carlson. All rights reserved.