DISTINCT
DISTINCT
is another very useful keyword.
Although it is a separate and independent keyword from GROUP BY
,
its function is related to GROUP BY
which is why we discuss it here.
Let's look at an example.
We're still working at devils.fyi, a website similar to levels.fyi.
The marketing team wants to know which companies we have salary data on.
To give them a accurate list, we need all the unique values, no duplicates.
GROUP BY
is a fine way to do this.
SELECT company
FROM employees
GROUP BY company;
company VARCHAR |
---|
Footbook |
Wamazon |
Poogle |
That is a list of the unique values, or distinct values in the company
column.
A different way to do this is with DISTINCT
.
For getting distinct values, DISTINCT
is to GROUP BY
as a vegetable peeler is to a chef knife for peeling potatoes.
You can peel your potatoes with a chef knife if that's all you've got,
but if you have a vegetable peeler,
you're going to reach for that instead because it's more ergonomic as it's purpose-built for that specific task.
You just put DISTINCT
in the SELECT
clause.
SELECT DISTINCT company
FROM employees;
company VARCHAR |
---|
Footbook |
Wamazon |
Poogle |
DISTINCT Inside an Aggregation
DISTINCT
is a specific keyword for taking a set of values and removing the duplicates from it.
It can be used in 2 different ways.
The first is on columns like the above example.
The second way is in conjunction with aggregation functions,
where it can take the input group to an aggregate function and remove the duplicates from it.
Let's look at an example of this other usage.
The marketing team just wants a short tagline for a marketing site instead of a full list of company names.
They want the number of distinct companies.
We know about the count()
function, but that normally counts duplicates.
However, luckily we can put DISTINCT
inside the call to count()
to remove duplicates from the company name collection.
SELECT count(DISTINCT company)
FROM employees;
count BIGINT |
---|
3 |
When we do this, the group of company names has the duplicates removed from the group before it gets passed into count()
.
Hm… only 3 companies so far. We're an early startup. Maybe the marketing department can exaggerate that number a bit for the website.