PARTITION BY

So far, our windows have always consisted of the whole table. This doesn't have to be the case.

Let's go back to our Olympics shot put data. The rankings we generated in the last lesson have a problem. It would make more sense to have separate rankings by sex. So, we want to modify our window so that each row only sees other rows of the same sex, instead of the whole table. For this, we can add PARTITION BY to our window definition.

SELECT
  rank() OVER (PARTITION BY athlete_sex ORDER BY distance_m DESC),
  *
FROM shot_puts
ORDER BY athlete_sex, 1;
rank
BIGINT
id
INTEGER
athlete_id
INTEGER
athlete_sex
VARCHAR
distance_m
NUMERIC
technique
VARCHAR
1
92196
5933
female
19.273
glide
2
38980
8852
female
18.485
glide
3
92127
3749
female
18.316
glide
4
84190
1121
female
17.721
glide
5
50726
8639
female
17.553
glide
1
93514
7186
male
21.268
glide
2
20546
5519
male
20.208
spin
3
73198
1791
male
19.876
glide
4
58542
4958
male
18.491
spin
5
85215
7900
male
16.229
glide

Now there are 2 totally independent rankings, as the window is partitioned by athlete_sex. Each row's window only includes other rows where the value of athlete_sex matches its own. Without PARTITION BY, as in the previous lesson, the entire table is considered 1 big partition.

Revisiting an Old Problem

When we first learned about aggregation, we analyzed employee data to select the top salary at each company. However, we ran into a problem where we couldn't get any other info about the employee with the top salary, only their salary.

We ended up solving this later when we learned about subqueries. We accomplished this by joining in a second query after aggregating.

Here, we'll look at an even cleaner way to solve this using window functions. The key is that we can rank the rows in employees and use our new PARTITION BY trick.

SELECT
  rank() OVER (PARTITION BY company ORDER BY salary DESC),
  *
FROM employees;
rank
BIGINT
id
INTEGER
name
VARCHAR
company
VARCHAR
department
VARCHAR
job_title
VARCHAR
salary
INTEGER
start_date
DATE
1
38
Ronald Graham
Footbook
Integration
Human Markets Orchestrator
254000
2014-08-13
1
32
Wilson Maggio
Footbook
Solutions
Principal Intranet Architect
254000
2011-05-12
3
29
Sean Wintheiser II
Footbook
Research
Legacy Configuration Director
226000
2017-08-24
4
33
Santiago Lockman
Footbook
Solutions
Direct Branding Developer
195000
2013-12-15
5
30
Mrs. Carmen Blick
Footbook
Research
Customer Tactics Consultant
183000
2011-07-23
6
36
Whitney Goyette
Footbook
Integration
Corporate Program Analyst
176000
2016-12-10
7
27
Mathew Harber
Footbook
Research
Investor Brand Associate
149000
2017-02-13
8
35
Laura Raynor
Footbook
Integration
Central Usability Technician
144000
2019-05-19
9
34
Rex Treutel
Footbook
Integration
Human Division Representative
117000
2012-08-04
10
37
Gina DuBuque
Footbook
Integration
Dynamic Paradigm Liaison
95000
2017-02-23
11
31
Frankie Glover
Footbook
Solutions
Dynamic Solutions Assistant
42000
2015-07-18
12
28
Toni Fahey
Footbook
Research
National Branding Director
22000
2017-12-06
1
10
Marianne Schamberger
Poogle
Marketing
Future Infrastructure Assistant
300000
2017-03-27
2
7
Cameron Jacobson
Poogle
Infrastructure
Internal Mobility Orchestrator
299000
2016-07-14
3
12
Jan Wiza
Poogle
Marketing
Investor Factors Engineer
293000
2018-09-20
4
3
Tami Glover
Poogle
Functionality
Principal Marketing Architect
262000
2017-10-12
5
1
Claude Barton
Poogle
Functionality
Legacy Paradigm Executive
245000
2013-09-30
6
6
Norma Krajcik
Poogle
Infrastructure
Global Quality Analyst
214000
2010-01-16
7
4
Ana Koepp
Poogle
Infrastructure
Global Factors Assistant
194000
2011-09-27
8
13
Alberto Monahan
Poogle
Marketing
Central Response Director
162000
2017-02-10
9
5
Mae Leannon
Poogle
Infrastructure
Principal Usability Agent
139000
2018-01-08
10
8
Ruby Connelly
Poogle
Infrastructure
Corporate Division Liaison
136000
2013-08-19
11
11
Shirley Prohaska
Poogle
Marketing
Future Configuration Supervisor
105000
2010-10-11
12
2
Christy McLaughlin
Poogle
Functionality
Direct Web Coordinator
55000
2017-04-18
13
9
Antonio Treutel
Poogle
Infrastructure
Investor Quality Assistant
20000
2016-08-31
1
24
Ollie Murphy
Wamazon
Quality
Dynamic Program Director
277000
2018-08-02
2
20
Essie Legros
Wamazon
Solutions
Regional Metrics Analyst
243000
2011-02-20
3
17
Eula Price
Wamazon
Marketing
Chief Identity Developer
186000
2018-11-09
4
22
Jessie Hodkiewicz
Wamazon
Solutions
Future Assurance Producer
167000
2019-03-24
5
23
Rosemary Hilll
Wamazon
Quality
Lead Markets Director
157000
2014-09-25
6
18
Michele Larson
Wamazon
Marketing
Central Group Manager
151000
2016-07-07
6
19
Candice Konopelski
Wamazon
Marketing
Human Usability Orchestrator
151000
2015-03-19
8
15
Erik Vandervort
Wamazon
Marketing
Forward Applications Analyst
136000
2017-02-14
9
14
Alberto Dicki
Wamazon
Marketing
Central Division Technician
131000
2018-09-26
10
16
Billie Vandervort
Wamazon
Marketing
Senior Data Developer
118000
2010-01-03
11
26
Dana Ernser
Wamazon
Quality
Chief Program Producer
93000
2012-10-22
12
25
Ramon Rohan
Wamazon
Quality
International Applications Consultant
60000
2010-11-07
13
21
Sheri Labadie
Wamazon
Solutions
Dynamic Quality Planner
27000
2016-06-13

After that query, we can do another that filters out the rows that aren't the top rank.

WITH ranked_employees AS (
  SELECT
    rank() OVER (PARTITION BY company ORDER BY salary DESC),
    *
  FROM employees
)
SELECT *
FROM ranked_employees
WHERE rank = 1;
rank
BIGINT
id
INTEGER
name
VARCHAR
company
VARCHAR
department
VARCHAR
job_title
VARCHAR
salary
INTEGER
start_date
DATE
1
38
Ronald Graham
Footbook
Integration
Human Markets Orchestrator
254000
2014-08-13
1
32
Wilson Maggio
Footbook
Solutions
Principal Intranet Architect
254000
2011-05-12
1
10
Marianne Schamberger
Poogle
Marketing
Future Infrastructure Assistant
300000
2017-03-27
1
24
Ollie Murphy
Wamazon
Quality
Dynamic Program Director
277000
2018-08-02

The benefit here is that we get all the other column information about the employee having the top salary, and we don't have to do a JOIN, which is arguably more confusing than this approach.

Even More Practice

If you'd like, try going back and solving the Latest Tweets without a LATERAL join.

Solution using window function.

It can be solved using a window function like this.

WITH joined_tweets AS (
  SELECT
    u.*,
    t.body AS recent_tweet,
    t.created_at AS tweet_created_at,
    row_number() OVER (PARTITION BY t.user_id ORDER BY t.created_at DESC) AS num
  FROM follows f
  JOIN tweeters u
    ON u.id = f.followed_id
  JOIN tweets t
    ON t.user_id = u.id
  WHERE f.follower_id = 15
)
SELECT *
FROM joined_tweets
WHERE num <= 2
ORDER BY tweet_created_at DESC;

© 2022 Andrew Carlson. All rights reserved.