PARTITION BY
So far, our windows have always consisted of the whole table. This doesn't have to be the case.
Let's go back to our Olympics shot put data.
The rankings we generated in the last lesson have a problem.
It would make more sense to have separate rankings by sex.
So, we want to modify our window so that each row only sees other rows of the same sex, instead of the whole table.
For this, we can add PARTITION BY
to our window definition.
SELECT
rank() OVER (PARTITION BY athlete_sex ORDER BY distance_m DESC),
*
FROM shot_puts
ORDER BY athlete_sex, 1;
rank BIGINT | id INTEGER | athlete_id INTEGER | athlete_sex VARCHAR | distance_m NUMERIC | technique VARCHAR |
---|---|---|---|---|---|
1 | 92196 | 5933 | female | 19.273 | glide |
2 | 38980 | 8852 | female | 18.485 | glide |
3 | 92127 | 3749 | female | 18.316 | glide |
4 | 84190 | 1121 | female | 17.721 | glide |
5 | 50726 | 8639 | female | 17.553 | glide |
1 | 93514 | 7186 | male | 21.268 | glide |
2 | 20546 | 5519 | male | 20.208 | spin |
3 | 73198 | 1791 | male | 19.876 | glide |
4 | 58542 | 4958 | male | 18.491 | spin |
5 | 85215 | 7900 | male | 16.229 | glide |
Now there are 2 totally independent rankings, as the window is partitioned by athlete_sex
.
Each row's window only includes other rows where the value of athlete_sex
matches its own.
Without PARTITION BY
, as in the previous lesson, the entire table is considered 1 big partition.
Revisiting an Old Problem
When we first learned about aggregation, we analyzed employee data to select the top salary at each company. However, we ran into a problem where we couldn't get any other info about the employee with the top salary, only their salary.
We ended up solving this later when we learned about subqueries. We accomplished this by joining in a second query after aggregating.
Here, we'll look at an even cleaner way to solve this using window functions.
The key is that we can rank the rows in employees
and use our new PARTITION BY
trick.
SELECT
rank() OVER (PARTITION BY company ORDER BY salary DESC),
*
FROM employees;
rank BIGINT | id INTEGER | name VARCHAR | company VARCHAR | department VARCHAR | job_title VARCHAR | salary INTEGER | start_date DATE |
---|---|---|---|---|---|---|---|
1 | 38 | Ronald Graham | Footbook | Integration | Human Markets Orchestrator | 254000 | 2014-08-13 |
1 | 32 | Wilson Maggio | Footbook | Solutions | Principal Intranet Architect | 254000 | 2011-05-12 |
3 | 29 | Sean Wintheiser II | Footbook | Research | Legacy Configuration Director | 226000 | 2017-08-24 |
4 | 33 | Santiago Lockman | Footbook | Solutions | Direct Branding Developer | 195000 | 2013-12-15 |
5 | 30 | Mrs. Carmen Blick | Footbook | Research | Customer Tactics Consultant | 183000 | 2011-07-23 |
6 | 36 | Whitney Goyette | Footbook | Integration | Corporate Program Analyst | 176000 | 2016-12-10 |
7 | 27 | Mathew Harber | Footbook | Research | Investor Brand Associate | 149000 | 2017-02-13 |
8 | 35 | Laura Raynor | Footbook | Integration | Central Usability Technician | 144000 | 2019-05-19 |
9 | 34 | Rex Treutel | Footbook | Integration | Human Division Representative | 117000 | 2012-08-04 |
10 | 37 | Gina DuBuque | Footbook | Integration | Dynamic Paradigm Liaison | 95000 | 2017-02-23 |
11 | 31 | Frankie Glover | Footbook | Solutions | Dynamic Solutions Assistant | 42000 | 2015-07-18 |
12 | 28 | Toni Fahey | Footbook | Research | National Branding Director | 22000 | 2017-12-06 |
1 | 10 | Marianne Schamberger | Poogle | Marketing | Future Infrastructure Assistant | 300000 | 2017-03-27 |
2 | 7 | Cameron Jacobson | Poogle | Infrastructure | Internal Mobility Orchestrator | 299000 | 2016-07-14 |
3 | 12 | Jan Wiza | Poogle | Marketing | Investor Factors Engineer | 293000 | 2018-09-20 |
4 | 3 | Tami Glover | Poogle | Functionality | Principal Marketing Architect | 262000 | 2017-10-12 |
5 | 1 | Claude Barton | Poogle | Functionality | Legacy Paradigm Executive | 245000 | 2013-09-30 |
6 | 6 | Norma Krajcik | Poogle | Infrastructure | Global Quality Analyst | 214000 | 2010-01-16 |
7 | 4 | Ana Koepp | Poogle | Infrastructure | Global Factors Assistant | 194000 | 2011-09-27 |
8 | 13 | Alberto Monahan | Poogle | Marketing | Central Response Director | 162000 | 2017-02-10 |
9 | 5 | Mae Leannon | Poogle | Infrastructure | Principal Usability Agent | 139000 | 2018-01-08 |
10 | 8 | Ruby Connelly | Poogle | Infrastructure | Corporate Division Liaison | 136000 | 2013-08-19 |
11 | 11 | Shirley Prohaska | Poogle | Marketing | Future Configuration Supervisor | 105000 | 2010-10-11 |
12 | 2 | Christy McLaughlin | Poogle | Functionality | Direct Web Coordinator | 55000 | 2017-04-18 |
13 | 9 | Antonio Treutel | Poogle | Infrastructure | Investor Quality Assistant | 20000 | 2016-08-31 |
1 | 24 | Ollie Murphy | Wamazon | Quality | Dynamic Program Director | 277000 | 2018-08-02 |
2 | 20 | Essie Legros | Wamazon | Solutions | Regional Metrics Analyst | 243000 | 2011-02-20 |
3 | 17 | Eula Price | Wamazon | Marketing | Chief Identity Developer | 186000 | 2018-11-09 |
4 | 22 | Jessie Hodkiewicz | Wamazon | Solutions | Future Assurance Producer | 167000 | 2019-03-24 |
5 | 23 | Rosemary Hilll | Wamazon | Quality | Lead Markets Director | 157000 | 2014-09-25 |
6 | 18 | Michele Larson | Wamazon | Marketing | Central Group Manager | 151000 | 2016-07-07 |
6 | 19 | Candice Konopelski | Wamazon | Marketing | Human Usability Orchestrator | 151000 | 2015-03-19 |
8 | 15 | Erik Vandervort | Wamazon | Marketing | Forward Applications Analyst | 136000 | 2017-02-14 |
9 | 14 | Alberto Dicki | Wamazon | Marketing | Central Division Technician | 131000 | 2018-09-26 |
10 | 16 | Billie Vandervort | Wamazon | Marketing | Senior Data Developer | 118000 | 2010-01-03 |
11 | 26 | Dana Ernser | Wamazon | Quality | Chief Program Producer | 93000 | 2012-10-22 |
12 | 25 | Ramon Rohan | Wamazon | Quality | International Applications Consultant | 60000 | 2010-11-07 |
13 | 21 | Sheri Labadie | Wamazon | Solutions | Dynamic Quality Planner | 27000 | 2016-06-13 |
After that query, we can do another that filters out the rows that aren't the top rank.
WITH ranked_employees AS (
SELECT
rank() OVER (PARTITION BY company ORDER BY salary DESC),
*
FROM employees
)
SELECT *
FROM ranked_employees
WHERE rank = 1;
rank BIGINT | id INTEGER | name VARCHAR | company VARCHAR | department VARCHAR | job_title VARCHAR | salary INTEGER | start_date DATE |
---|---|---|---|---|---|---|---|
1 | 38 | Ronald Graham | Footbook | Integration | Human Markets Orchestrator | 254000 | 2014-08-13 |
1 | 32 | Wilson Maggio | Footbook | Solutions | Principal Intranet Architect | 254000 | 2011-05-12 |
1 | 10 | Marianne Schamberger | Poogle | Marketing | Future Infrastructure Assistant | 300000 | 2017-03-27 |
1 | 24 | Ollie Murphy | Wamazon | Quality | Dynamic Program Director | 277000 | 2018-08-02 |
The benefit here is that we get all the other column information about the employee having the top salary,
and we don't have to do a JOIN
, which is arguably more confusing than this approach.
Even More Practice
If you'd like, try going back and solving the Latest Tweets
without a LATERAL
join.
Solution using window function.
It can be solved using a window function like this.
WITH joined_tweets AS (
SELECT
u.*,
t.body AS recent_tweet,
t.created_at AS tweet_created_at,
row_number() OVER (PARTITION BY t.user_id ORDER BY t.created_at DESC) AS num
FROM follows f
JOIN tweeters u
ON u.id = f.followed_id
JOIN tweets t
ON t.user_id = u.id
WHERE f.follower_id = 15
)
SELECT *
FROM joined_tweets
WHERE num <= 2
ORDER BY tweet_created_at DESC;