SELECT * from the_hard_stuff
SubQueries
Sub queries allow you to reference its results in the main query. It's super useful, but has some drawbacks:
- Can be detrimental to performances, often there are other more suitable solutions
- Can be hard to read and reason about
SubQueries in FROM or JOIN clauses
SELECT mytable.id, faketable.name
FROM table1,
(SELECT id, name FROM table2 where table2.table1_id = table1.id)
-- Note that this could (and should for performances and readability)
-- be written as an INNER JOIN.
SELECT mytable.id, t2.name
FROM table1 t1
JOIN (SELECT name from othertable) t2 ON t2.table1_id = table1.id
-- This one is also quite useless tbh, it's just an example :)
SubQueries in WHERE clauses
SELECT *
FROM beers
WHERE beers.price > (SELECT max(price) FROM wine)
-- Or is the hip on beer that crazy?
SELECT *
FROM beers
WHERE beers.id IN (
SELECT beer_id
FROM favorite_beers f
GROUP BY f.beer_id HAVING count(*) > 10
)
SubQueries in INSERT/UPDATE/DELETE clauses
Also, you can do this if you ever need to copy/paste a table:
INSERT INTO favorite_beers (beer_id, person_id)
SELECT beers.id, fab.id
FROM beers, (SELECT id from persons where name="fabrice" LIMIT 1) fab
WHERE beers.brewery="Kernel"
INSERT INTO copy
SELECT *
FROM original
INSERT INTO copy
SELECT *
FROM original
DELETE FROM favorite_beers
WHERE beer_id IN (
select id from beers where name="Foster" OR brewery="Kronembourg"
)
-- Because come on, that's disgusting
SubQueries in UPDATE clauses
UPDATE beers b
SET b.best_client=best.person_id
FROM (
SELECT person_id, count(*)
FROM favorites f
WHERE b.id=f.beer_id
GROUP BY person_id
ORDER BY count(*) DESC
LIMIT 1
) best
WHERE b.id=best.id
WITH clause
Like a subquery but more l33t
What is the WITH clause
- The SQL WITH clause allows you to give a sub-query block a name... which can be referenced in several places within the main SQL query. - www.geeksforgeeks.org
- Can be thought of as a temporary table that's dropped once the query is completed
Why use a WITH clause instead of subqueries?
- Write once use throughout your query
- Often much easier to read than subqueries
- Helps to break down a problem into smaller steps
- Can be more efficient
SHOW ME SYNTAX!!!
INSERT INTO favourite_beers (beer_id, person_id)
-- Get Fabrice's id
WITH fabrice_id AS (
SELECT id
FROM persons
WHERE name="fabrice"
LIMIT 1
)
-- Get the ids of all beers from Kernel
, kernel_beers AS ( -- Use commas to add an additional named block
SELECT id
FROM beers
WHERE brewery="Kernel"
)
-- Cross join Fabrice's id to all kernel beers ids
SELECT k.id,
fab.id
FROM fabrice_id fab
JOIN kernel_beers k
Window functions
What is a "Window function"?
- It's not made of glass
- It's not about you Mr. Gates!
-
The PostgresSQL documentation definition:
“A window function performs a calculation across a set of table rows that are somehow related to the current row…Behind the scenes, the window function is able to access more than just the current row of the query result.”
- We can use most aggregate functions as window functions, but there are also some window specific functions
SELECT job_title,
salary,
AVG(salary) OVER (PARTITION BY job_title) AS avg_salary
FROM microsoft
Over and partition by
Defining the window
- In order to use a window function you need to define the window
- You do this using the OVER function and defining how to "group" the data with the PARTITION BY clause
SELECT country,
state,
population,
SUM(population) OVER (PARTITION BY country) AS country_population
FROM the_world
In this example we are calculating the total population of a country and showing it alongside the name and population of the states in that country.
Common mistakes: Why wouldn't this work?
SELECT country,
state,
population,
SUM(population) AS country_population
FROM the_world
GROUP BY country,
state,
population
Common mistakes: Why wouldn't this work?
SELECT country,
state,
population,
SUM(population) AS country_population
FROM the_world
GROUP BY country,
state,
population
We are grouping by the state and the population of that state so the country_population would be the same as the population of the state which is incorrect.
What about subqueries?
SELECT country,
state,
population,
c.country_population
FROM the_world w
JOIN (
SELECT country
SUM(population) AS country_population
FROM the_world
GROUP BY country
) AS c
ON w.country = c.country
There are a few issues with this including readability and additional lines of code, but there's (at least) one mistake in this code.
Can you spot it? (Answer in speaker notes)
-- NB: Specific implementations and function names vary based on SQL dialect
-- Some of these examples are silly.
SELECT country,
state,
population,
-- ROW_NUMBER() Gives the row number in the window.
-- In this example we are showing how many states remain in the country list alphabetically.
ROW_NUMBER()
OVER(PARTITION BY country ORDER BY country DESC) - 1 AS states_remaining_in_country
-- LEAD() gives a value from further down the table.
-- In this example we are getting the name of the next state in the country alphabetically
LEAD(state,1)
OVER(PARTITION BY country ORDER BY country, state) as country_next_state_alphabetically
-- LAG() is similar to LEAD, but it looks back instead of forward.
-- In this example we are doing a running total of the population in a country.
population + LAG(population,1)
OVER(PARTITION BY country ORDER BY population ASC) AS pop_running_total
FROM the_world
ORDER BY country
,state
Other Window Functions
Other useful features
Geospacial search
FullText search
JSON blobs
select * from the _hard_stuff
By fguery
select * from the _hard_stuff
- 107