JOIN
So far, we have only queried single tables. Things get more interesting when there are multiple related tables. Why would tables have relationships? Let's look at a use case.
Suppose we are working on a social media website called "Tweeter",
where people can write short excerpts of texts called "tweets".
The users are in a table called tweeters
.
The tweets are in tweets
.
column_name | type |
---|---|
id | INTEGER |
username | VARCHAR |
email | VARCHAR |
registered_at | TIMESTAMPTZ |
column_name | type | reference |
---|---|---|
id | INTEGER | NULL |
body | TEXT | NULL |
user_id | INTEGER | tweeters.id |
created_at | TIMESTAMPTZ | NULL |
The tweets
are always written by a user (a tweeter).
There is a relationship between the tables because each tweet is owned by exactly one user.
That information is stored in the user_id
column.
This is equal to the id
of the user who wrote the tweet.
In other words, tweets.user_id
references tweeters.id
.
We call this relationship a foreign key.
So, the user_id
column is a foreign key to tweeters.id
.
Visually, engineers often like to depict this relationship with something like this:
This shows the actual relationship very clearly.
The forked tip at the user_id
column is meant to show that a user can write many tweets,
i.e. the relationship is "one to many".
Getting Tweets From a Tweeter
Suppose we need to get the tweets for username judy.blick60
.
It's no problem to filter the tweeters
by WHERE username = 'judy.blick60'
.
However, the tweets
table doesn't have username, but the tweeter ID.
We don't know what ID we would filter on.
So, we can 2 do separate queries, and use the ID we retrieve from the first query.
SELECT id
FROM tweeters
WHERE username = 'judy.blick60';
id INTEGER |
---|
11 |
We see that her ID is 11, then we put that into our next query…
SELECT body
FROM tweets
WHERE user_id = 11;
body TEXT |
---|
If we program the program, we can get to the THX application through the solid state USB bus! |
I'll generate the optical IB bandwidth, that should application the AGP program! |
generating the system won't do anything, we need to quantify the optical SQL firewall! |
This works, but we had to run 2 queries to get our data. This isn't always the most efficient. There is better way.
Performing a JOIN
SQL gives us the ability to get data from multiple related tables at once.
How?
The idea is to combine 2 (or more) tables side-by-side using a JOIN
clause.
Here is what it looks like in code:
SELECT *
FROM tweeters
JOIN tweets
ON tweeters.id = tweets.user_id
ORDER BY username;
id INTEGER | username VARCHAR | email VARCHAR | registered_at TIMESTAMPTZ | id INTEGER | body TEXT | user_id INTEGER | created_at TIMESTAMPTZ |
---|---|---|---|---|---|---|---|
5 | annabelle74 | annabelle_erdman83@hotmail.com | 2016-07-27 14:03:26.745+00 | 16 | If we input the monitor, we can get to the IB transmitter through the primary HDD transmitter! | 5 | 2018-07-26 05:28:38.075+00 |
6 | berry.yost15 | berry_yost19@gmail.com | 2018-04-13 13:24:45.756+00 | 18 | We need to connect the cross-platform SMTP application! | 6 | 2018-12-03 13:31:28.755+00 |
3 | brisa_jenkins39 | brisa_jenkins@hotmail.com | 2014-06-12 05:12:17.099+00 | 17 | The SCSI protocol is down, index the multi-byte sensor so we can bypass the AI microchip! | 3 | 2018-08-17 10:39:47.727+00 |
3 | brisa_jenkins39 | brisa_jenkins@hotmail.com | 2014-06-12 05:12:17.099+00 | 4 | If we override the monitor, we can get to the CSS capacitor through the wireless XSS array! | 3 | 2016-06-21 16:19:17.977+00 |
3 | brisa_jenkins39 | brisa_jenkins@hotmail.com | 2014-06-12 05:12:17.099+00 | 22 | You can't hack the bandwidth without quantifying the auxiliary SQL firewall! | 3 | 2019-12-03 07:27:54.773+00 |
4 | dax1 | dax.bode@gmail.com | 2018-07-17 11:50:09.979+00 | 28 | Try to input the SQL port, maybe it will back up the solid state bandwidth! | 4 | 2021-05-01 22:18:55.039+00 |
1 | deonte_murray45 | deonte_murray57@hotmail.com | 2021-08-02 08:54:42.061+00 | 36 | You can't input the circuit without hacking the back-end CSS matrix! | 1 | 2021-11-13 01:05:58.795+00 |
1 | deonte_murray45 | deonte_murray57@hotmail.com | 2021-08-02 08:54:42.061+00 | 34 | parsing the monitor won't do anything, we need to index the primary SMS transmitter! | 1 | 2021-11-10 01:39:21.523+00 |
1 | deonte_murray45 | deonte_murray57@hotmail.com | 2021-08-02 08:54:42.061+00 | 37 | generating the protocol won't do anything, we need to back up the virtual USB hard drive! | 1 | 2021-12-20 11:41:28.716+00 |
13 | elinore56 | elinore11@hotmail.com | 2015-12-05 12:43:36.332+00 | 10 | compressing the system won't do anything, we need to index the optical SMTP alarm! | 13 | 2017-05-24 13:28:46.409+00 |
13 | elinore56 | elinore11@hotmail.com | 2015-12-05 12:43:36.332+00 | 20 | Try to bypass the SAS feed, maybe it will bypass the 1080p application! | 13 | 2019-02-13 21:39:53.676+00 |
13 | elinore56 | elinore11@hotmail.com | 2015-12-05 12:43:36.332+00 | 2 | Try to transmit the SMTP hard drive, maybe it will input the back-end card! | 13 | 2016-05-23 05:03:29.62+00 |
13 | elinore56 | elinore11@hotmail.com | 2015-12-05 12:43:36.332+00 | 9 | I'll copy the auxiliary PCI alarm, that should card the GB firewall! | 13 | 2017-01-26 05:13:38.387+00 |
9 | enrico.durgan1 | enrico45@gmail.com | 2015-09-02 18:36:33.47+00 | 5 | We need to hack the bluetooth GB port! | 9 | 2016-07-09 13:23:46.107+00 |
9 | enrico.durgan1 | enrico45@gmail.com | 2015-09-02 18:36:33.47+00 | 7 | Use the bluetooth XML capacitor, then you can bypass the bluetooth bandwidth! | 9 | 2016-08-13 10:53:39.711+00 |
9 | enrico.durgan1 | enrico45@gmail.com | 2015-09-02 18:36:33.47+00 | 35 | I'll back up the mobile AI circuit, that should array the EXE hard drive! | 9 | 2021-11-10 23:56:02.234+00 |
9 | enrico.durgan1 | enrico45@gmail.com | 2015-09-02 18:36:33.47+00 | 6 | Use the optical AGP program, then you can connect the back-end transmitter! | 9 | 2016-07-16 14:24:41.202+00 |
7 | ethan67 | ethan_kuhn@gmail.com | 2019-05-31 06:42:38.464+00 | 23 | Try to program the SMS interface, maybe it will generate the cross-platform matrix! | 7 | 2020-07-04 21:40:52.919+00 |
7 | ethan67 | ethan_kuhn@gmail.com | 2019-05-31 06:42:38.464+00 | 25 | Use the solid state SMTP protocol, then you can index the cross-platform monitor! | 7 | 2021-01-24 12:23:05.948+00 |
2 | hilda.grant53 | hilda_grant48@hotmail.com | 2016-04-24 13:52:43.58+00 | 12 | programming the pixel won't do anything, we need to copy the cross-platform TCP monitor! | 2 | 2018-01-01 07:01:16.356+00 |
2 | hilda.grant53 | hilda_grant48@hotmail.com | 2016-04-24 13:52:43.58+00 | 29 | You can't copy the array without indexing the primary SMS card! | 2 | 2021-05-14 09:26:34.859+00 |
2 | hilda.grant53 | hilda_grant48@hotmail.com | 2016-04-24 13:52:43.58+00 | 11 | The PNG card is down, parse the virtual alarm so we can index the SQL capacitor! | 2 | 2017-08-22 16:26:06.154+00 |
2 | hilda.grant53 | hilda_grant48@hotmail.com | 2016-04-24 13:52:43.58+00 | 3 | You can't bypass the driver without bypassing the virtual HTTP hard drive! | 2 | 2016-06-11 01:12:38.474+00 |
11 | judy.blick60 | judy14@yahoo.com | 2014-02-11 02:53:02.395+00 | 1 | If we program the program, we can get to the THX application through the solid state USB bus! | 11 | 2016-02-25 22:36:30.812+00 |
11 | judy.blick60 | judy14@yahoo.com | 2014-02-11 02:53:02.395+00 | 8 | I'll generate the optical IB bandwidth, that should application the AGP program! | 11 | 2016-09-09 10:18:32.523+00 |
11 | judy.blick60 | judy14@yahoo.com | 2014-02-11 02:53:02.395+00 | 30 | generating the system won't do anything, we need to quantify the optical SQL firewall! | 11 | 2021-06-16 12:42:05.054+00 |
10 | madge.wehner | madge_wehner@yahoo.com | 2021-02-15 14:22:34.017+00 | 26 | If we override the array, we can get to the SMS panel through the online RAM interface! | 10 | 2021-03-15 23:07:23.918+00 |
10 | madge.wehner | madge_wehner@yahoo.com | 2021-02-15 14:22:34.017+00 | 32 | I'll back up the digital SMS sensor, that should port the XSS firewall! | 10 | 2021-09-15 09:22:09.36+00 |
10 | madge.wehner | madge_wehner@yahoo.com | 2021-02-15 14:22:34.017+00 | 33 | We need to back up the back-end AGP pixel! | 10 | 2021-09-23 20:10:58.765+00 |
10 | madge.wehner | madge_wehner@yahoo.com | 2021-02-15 14:22:34.017+00 | 27 | You can't index the driver without connecting the wireless FTP application! | 10 | 2021-04-26 20:48:05.334+00 |
14 | manuel_torp | manuel79@gmail.com | 2015-05-16 11:41:28.533+00 | 13 | Use the back-end PNG port, then you can index the cross-platform port! | 14 | 2018-02-21 07:24:38.871+00 |
14 | manuel_torp | manuel79@gmail.com | 2015-05-16 11:41:28.533+00 | 15 | The COM program is down, synthesize the optical alarm so we can calculate the ADP bus! | 14 | 2018-05-30 20:35:11.712+00 |
12 | marjorie.thompson | marjorie_thompson99@gmail.com | 2015-03-29 01:28:47.113+00 | 19 | If we connect the interface, we can get to the EXE capacitor through the bluetooth JBOD pixel! | 12 | 2018-12-07 16:41:13.131+00 |
8 | trevion_schuppe15 | trevion_schuppe83@gmail.com | 2017-07-21 21:15:58.491+00 | 31 | programming the port won't do anything, we need to hack the haptic SSL driver! | 8 | 2021-07-07 16:26:38.927+00 |
8 | trevion_schuppe15 | trevion_schuppe83@gmail.com | 2017-07-21 21:15:58.491+00 | 24 | backing up the capacitor won't do anything, we need to navigate the haptic PCI array! | 8 | 2020-12-14 17:53:08.401+00 |
8 | trevion_schuppe15 | trevion_schuppe83@gmail.com | 2017-07-21 21:15:58.491+00 | 21 | We need to navigate the wireless SQL panel! | 8 | 2019-11-01 00:03:20.609+00 |
8 | trevion_schuppe15 | trevion_schuppe83@gmail.com | 2017-07-21 21:15:58.491+00 | 14 | You can't synthesize the alarm without backing up the optical SSL application! | 8 | 2018-04-24 21:18:00.194+00 |
Notice that the result's columns are a fusion of both tables' columns. Here's what it look like visually.
The rows from tweeters
try to match up with any rows from tweets
.
When the rows from the tables match, they combine to form 1 combined row.
Those combined rows form the output table.
However, notice that 1 row on the left matches multiple rows on the right.
What happens in this case?
Whenever a single row matches multiple other rows, its contents are duplicated in the result. So, in the result, a user having 3 tweets will have their user information (email, username, etc.) appear 3 times.
Why is that useful?
Well now these new combined rows represent tweets and they have the tweet information and the user information.
We can therefore select the tweet information and filter on user information, like username
, in a single query!
Let's have a look at the code to make this possible.
You specify the left-hand table in FROM
, and then the right-hand table in JOIN
.
You also need to specify how the rows match up.
How do we know which rows in tweets
match which rows in tweeters
?
We know from the ON
clause after the JOIN
.
Rows will match when the expression in the ON
clause is true.
In our case, when the user_id
column in tweets
equals the id
column in tweeters
, i.e. tweeters.id = tweets.user_id
.
Notice the notation table_name.column_name
, which is new.
This is just a more specific way to refer to a column.
It wasn't necessary when only one table was involved, but it is now.
Watch what happens if we refer to the id
column without being specific…
SELECT id
FROM tweeters
JOIN tweets
ON tweeters.id = tweets.user_id;
error: column reference "id" is ambiguous
Just selecting id
is ambiguous because both tables have an id
column.
Do we mean the tweeter ID or the tweet ID?
We can prefix our column names with their respective table names to disambiguate.
Let's also add the WHERE
clause to filter to the tweets we want.
SELECT
tweets.id AS tweet_id,
tweets.body,
tweets.created_at
FROM tweeters
JOIN tweets
ON tweeters.id = tweets.user_id
WHERE tweeters.email = 'judy14@yahoo.com';
tweet_id INTEGER | body TEXT | created_at TIMESTAMPTZ |
---|---|---|
1 | If we program the program, we can get to the THX application through the solid state USB bus! | 2016-02-25 22:36:30.812+00 |
8 | I'll generate the optical IB bandwidth, that should application the AGP program! | 2016-09-09 10:18:32.523+00 |
30 | generating the system won't do anything, we need to quantify the optical SQL firewall! | 2021-06-16 12:42:05.054+00 |
Notice how we can safely request tweets.id
without error.
Furthermore, we alias it to tweet_id
instead.
Otherwise, the name in the result would just be id
.
Also notice that I use tweeters.email
and tweets.body
instead of just email
and body
.
Because these columns are unambiguous (only occur in one of the tables involved), prefixing with the table name is optional.
However, people reading your query may not know that.
It is better to be specific and add the table name unless it is very obvious which table a column belongs to.
Table Aliases
Much like aliases can be assigned to columns in the SELECT
clause,
we can assign aliases to tables in the FROM
and JOIN
clauses.
SELECT
t.id AS tweet_id,
t.body,
t.created_at
FROM tweeters u
JOIN tweets t
ON u.id = t.user_id
WHERE u.email = 'judy14@yahoo.com';
tweet_id INTEGER | body TEXT | created_at TIMESTAMPTZ |
---|---|---|
1 | If we program the program, we can get to the THX application through the solid state USB bus! | 2016-02-25 22:36:30.812+00 |
8 | I'll generate the optical IB bandwidth, that should application the AGP program! | 2016-09-09 10:18:32.523+00 |
30 | generating the system won't do anything, we need to quantify the optical SQL firewall! | 2021-06-16 12:42:05.054+00 |
When table aliases are used, the column references need to match the table aliases, not the original table name.
Table aliases never end up in the result. They are only used to disambiguate between columns or between multiple instances of the same table (we'll see exapmles of that later).
You also don't use AS
before the alias, at least according to the SQL standard.
However, Postgres lets you add AS
if you want.
I prefer to stick to the standard.
Other JOIN Types
This is just one type of join, called an "inner join".
We could've typed INNER JOIN
instead of JOIN
.
Inner joins are the default join type.
We'll look at more join types soon.
First, let's get some practice with the inner join.