JOIN LATERAL
We're still working at our blogging website "Tedium." Suppose we want to get each blogger's name and their 2 most recent posts. We did something very similar in the previous section. We joined all posts within 7 days of each blogger's sign-up date.
SELECT *
FROM bloggers b
LEFT JOIN blog_posts p
ON p.blogger_id = b.id
AND p.created_at < b.registered_at + CAST('7 days' AS INTERVAL);
We were able to join blog_posts
, but do so conditionally with an extra condition in the ON
clause.
The key here is that we are able to compute the comparison p.created_at < b.registered_at + CAST('7 days' AS INTERVAL);
.
However, for this problem, what comparison is there to make?
If we want the 2 most recent posts, there are no values to give to a <
comparison to make that happen.
We can usually get the 2 most recent of something with ORDER BY created_at DESC LIMIT 2
.
However, that won't work on a per-blogger basis here.
Therefore, even though it sounds like a similar problem, it will take a different approach.
Let's re-think to a strategy like this: for each blogger,
query their posts from the posts
table and limit to the most recent 2 with ORDER BY created_at DESC LIMIT 2
.
Then, put together the results from each subquery into a new table and join it to the bloggers
table.
We have seen that correlated subqueries can re-execute a query per row, so it should be able to implement our strategy.
However, we have only seen correlated subqueries in SELECT
or WHERE
clauses, where value expressions are expected.
These can only produce single values and cannot help us to produce a table.
We can do this with a new join type: JOIN LATERAL
.
A LATERAL
join is very different from the other joins because it takes a correlated subquery.
Although we've learned you can put a subquery in a normal JOIN
clause, it cannot be correlated, as that will fail with an error.
SELECT
name,
p.title,
p.created_at
FROM bloggers b
JOIN (
SELECT *
FROM blog_posts
WHERE blogger_id = b.id
ORDER BY created_at DESC
LIMIT 2
) p ON p.blogger_id = b.id;
error: invalid reference to FROM-clause entry for table "b"
Turning the JOIN
into a JOIN LATERAL
makes the subquery allowed to be correlated by referencing the outer query.
SELECT
name,
email,
p.title,
p.created_at
FROM bloggers b
JOIN LATERAL (
SELECT *
FROM blog_posts
WHERE blogger_id = b.id
ORDER BY created_at DESC
LIMIT 2
) p ON p.blogger_id = b.id;
name VARCHAR | email VARCHAR | title TEXT | created_at TIMESTAMPTZ |
---|---|---|---|
Audrey | Audrey50@yahoo.com | Compressing the Array | 2021-12-30 02:30:06.324+00 |
Audrey | Audrey50@yahoo.com | Hacking the Array | 2021-12-27 16:26:35.365+00 |
Kaela | Kaela.Schuster47@hotmail.com | Compressing the Hard drive | 2022-02-03 11:10:23.377+00 |
Kaela | Kaela.Schuster47@hotmail.com | Parsing the Driver | 2022-01-27 00:00:09.162+00 |
Rowan | Rowan_Bogisich40@hotmail.com | Transmitting the Alarm | 2021-12-27 04:10:34.384+00 |
Nyasia | Nyasia58@gmail.com | Backing up the Card | 2021-12-01 01:22:03.367+00 |
Nyasia | Nyasia58@gmail.com | Quantifying the Bandwidth | 2021-11-18 12:18:23.797+00 |
Stephania | Stephania92@gmail.com | Navigating the Driver | 2021-12-20 05:12:19.318+00 |
Stephania | Stephania92@gmail.com | Synthesizing the Sensor | 2021-12-07 10:30:22.827+00 |
Samara | Samara_Yundt97@gmail.com | Synthesizing the Panel | 2021-10-03 10:46:04.369+00 |
Samara | Samara_Yundt97@gmail.com | Navigating the Bandwidth | 2021-09-27 07:36:37.139+00 |
Laura | Laura_Homenick70@gmail.com | Bypassing the Hard drive | 2022-02-03 03:17:01.04+00 |
Laura | Laura_Homenick70@gmail.com | Generating the Port | 2022-01-20 06:36:41.672+00 |
Jimmie | Jimmie_Weissnat@hotmail.com | Synthesizing the Transmitter | 2022-01-23 14:24:44.294+00 |
Jimmie | Jimmie_Weissnat@hotmail.com | Programming the Array | 2022-01-23 11:04:16.184+00 |
Rosie | Rosie.Crist@yahoo.com | Parsing the Circuit | 2021-10-01 02:56:06.753+00 |
Rosie | Rosie.Crist@yahoo.com | Copying the Bus | 2021-09-29 16:25:00.288+00 |
Ben | Ben.Cummerata94@hotmail.com | Indexing the Firewall | 2021-11-23 14:22:29.048+00 |
Ben | Ben.Cummerata94@hotmail.com | Indexing the Alarm | 2021-11-14 13:06:36.368+00 |
Just like with correlated subqueries, which evaluate once per row, you can think of the lateral join subquery as running once per row of the left-hand table. The result of each will get combined into 1 table, which then becomes the right-hand table to be joined to the left.
Non-standard Versions
In standard SQL, the proper keywords are JOIN LATERAL
.
This is how it is used in Postgres.
However, several databases, such as Microsoft SQL and Oracle, use the keywords CROSS APPLY
instead.
Efficiency
For similar reasons to correlated subqueries, lateral joins can be very computationally expensive. Be wary when using them for performance-critical applications. Only use them when the number of rows in the left-hand table is small, or if absolutely necessary. Re-write queries another way if possible.