JOIN LATERAL

We're still working at our blogging website "Tedium." Suppose we want to get each blogger's name and their 2 most recent posts. We did something very similar in the previous section. We joined all posts within 7 days of each blogger's sign-up date.

SELECT *
FROM bloggers b
LEFT JOIN blog_posts p
  ON p.blogger_id = b.id
  AND p.created_at < b.registered_at + CAST('7 days' AS INTERVAL);

We were able to join blog_posts, but do so conditionally with an extra condition in the ON clause. The key here is that we are able to compute the comparison p.created_at < b.registered_at + CAST('7 days' AS INTERVAL);. However, for this problem, what comparison is there to make? If we want the 2 most recent posts, there are no values to give to a < comparison to make that happen. We can usually get the 2 most recent of something with ORDER BY created_at DESC LIMIT 2. However, that won't work on a per-blogger basis here. Therefore, even though it sounds like a similar problem, it will take a different approach.

Let's re-think to a strategy like this: for each blogger, query their posts from the posts table and limit to the most recent 2 with ORDER BY created_at DESC LIMIT 2. Then, put together the results from each subquery into a new table and join it to the bloggers table.

We have seen that correlated subqueries can re-execute a query per row, so it should be able to implement our strategy. However, we have only seen correlated subqueries in SELECT or WHERE clauses, where value expressions are expected. These can only produce single values and cannot help us to produce a table. We can do this with a new join type: JOIN LATERAL. A LATERAL join is very different from the other joins because it takes a correlated subquery. Although we've learned you can put a subquery in a normal JOIN clause, it cannot be correlated, as that will fail with an error.

SELECT
  name,
  p.title,
  p.created_at
FROM bloggers b
JOIN (
  SELECT *
  FROM blog_posts
  WHERE blogger_id = b.id
  ORDER BY created_at DESC
  LIMIT 2
) p ON p.blogger_id = b.id;
error: invalid reference to FROM-clause entry for table "b"

Turning the JOIN into a JOIN LATERAL makes the subquery allowed to be correlated by referencing the outer query.

SELECT
  name,
  email,
  p.title,
  p.created_at
FROM bloggers b
JOIN LATERAL (
  SELECT *
  FROM blog_posts
  WHERE blogger_id = b.id
  ORDER BY created_at DESC
  LIMIT 2
) p ON p.blogger_id = b.id;
name
VARCHAR
email
VARCHAR
title
TEXT
created_at
TIMESTAMPTZ
Audrey
Audrey50@yahoo.com
Compressing the Array
2021-12-30 02:30:06.324+00
Audrey
Audrey50@yahoo.com
Hacking the Array
2021-12-27 16:26:35.365+00
Kaela
Kaela.Schuster47@hotmail.com
Compressing the Hard drive
2022-02-03 11:10:23.377+00
Kaela
Kaela.Schuster47@hotmail.com
Parsing the Driver
2022-01-27 00:00:09.162+00
Rowan
Rowan_Bogisich40@hotmail.com
Transmitting the Alarm
2021-12-27 04:10:34.384+00
Nyasia
Nyasia58@gmail.com
Backing up the Card
2021-12-01 01:22:03.367+00
Nyasia
Nyasia58@gmail.com
Quantifying the Bandwidth
2021-11-18 12:18:23.797+00
Stephania
Stephania92@gmail.com
Navigating the Driver
2021-12-20 05:12:19.318+00
Stephania
Stephania92@gmail.com
Synthesizing the Sensor
2021-12-07 10:30:22.827+00
Samara
Samara_Yundt97@gmail.com
Synthesizing the Panel
2021-10-03 10:46:04.369+00
Samara
Samara_Yundt97@gmail.com
Navigating the Bandwidth
2021-09-27 07:36:37.139+00
Laura
Laura_Homenick70@gmail.com
Bypassing the Hard drive
2022-02-03 03:17:01.04+00
Laura
Laura_Homenick70@gmail.com
Generating the Port
2022-01-20 06:36:41.672+00
Jimmie
Jimmie_Weissnat@hotmail.com
Synthesizing the Transmitter
2022-01-23 14:24:44.294+00
Jimmie
Jimmie_Weissnat@hotmail.com
Programming the Array
2022-01-23 11:04:16.184+00
Rosie
Rosie.Crist@yahoo.com
Parsing the Circuit
2021-10-01 02:56:06.753+00
Rosie
Rosie.Crist@yahoo.com
Copying the Bus
2021-09-29 16:25:00.288+00
Ben
Ben.Cummerata94@hotmail.com
Indexing the Firewall
2021-11-23 14:22:29.048+00
Ben
Ben.Cummerata94@hotmail.com
Indexing the Alarm
2021-11-14 13:06:36.368+00

Just like with correlated subqueries, which evaluate once per row, you can think of the lateral join subquery as running once per row of the left-hand table. The result of each will get combined into 1 table, which then becomes the right-hand table to be joined to the left.

Table Name: bloggers name id Aang 11 Katara 12 Table Name: blog_posts blogger_id title 11 How to Airbend 11 How to Waterbend 11 How to Earthbend 12 How to Really Waterbend 1 12 How to Really Waterbend 2 12 How to Really Waterbend 3 WHERELIMIT SELECTFROM * blog_posts blogger_id = 11 2 WHERELIMIT SELECTFROM * blog_posts blogger_id = 2 12 blogger_id title 11 How to Airbend 11 How to Waterbend 12 How to Really Waterbend 1 12 How to Really Waterbend 2 JOIN

Non-standard Versions

In standard SQL, the proper keywords are JOIN LATERAL. This is how it is used in Postgres. However, several databases, such as Microsoft SQL and Oracle, use the keywords CROSS APPLY instead.

Efficiency

For similar reasons to correlated subqueries, lateral joins can be very computationally expensive. Be wary when using them for performance-critical applications. Only use them when the number of rows in the left-hand table is small, or if absolutely necessary. Re-write queries another way if possible.

© 2022 Andrew Carlson. All rights reserved.