SQL
+
Relational Databases
Peter Compernolle
What's the point of a database?
- I/O performance
- Quickly insert/update data
- Quickly retrieve data
- Scalable
- Ease of use
- Retrieve data exactly as needed
- Massage or process data on the fly
- Scalable
- Engineering costs
- Ease of accessing data
- Database maintenance
- Scalable
- Additional awesomeness
- Data integrity
- Redundancy
- Scalable
Common Solutions:
Excel
Pros:
- Easy to enter data
- Easy to sort of filter data
- Everyone can use it
Cons
- Doesn't scale
- Poor data integrity
- Single user
- Crashes all the time
- Often guesses dates wrong
- Changes strings "001" to be 1
- Gives false sense of security with password protected files
- Has a 63k row limit (≤Office 2003)
- Assumes that people don't put commas anywhere in a file except to delimit fields
- Stores all changes so files get huge with little benefit
- Inefficient backups
Common Solutions:
$key => $value
Pros:
- Crazy fast
- Scalable
- Unstructured data
Cons
- Difficult to architect
- Unstructured data
- Requires manual data replication
Common Solutions:
The Old Fashioned
Pros:
- Always with you
- No computer skills needed
- Extra awesome points!
Cons
- I can't read my own writing
- I can't find anything > 1wk old
- Succeptable to water damage
RIP, outofpages.com :(
SQL + Relational Database
- Fast
- Very easy to
- use
- develop
- maintain
- Scales reasonably well
Relational Data
users
id
first_name
last_name
active
created_at
1
Jack
Donnaghy
true
2001-03-06 12:34:56
2
Liz
Lemon
true
2001-03-08 09:43:01
3
Leo
Spaceman
false
2001-07-13 10:01:33
SELECT
users.first_name ,
users.last_name
FROM
users
WHERE
users.active = 1
ORDER BY id
ASC
LIMIT 5
Relational Data
accounts
id
user_id
balance
active
created_at
1
1
56485156.24
true
2001-03-06 12:34:56
2
1
153438.45
true
2001-03-08 09:43:01
3
2
23.92
true
2001-07-13 10:01:33
SELECT
accounts.id ,
accounts.balance
FROM
accounts
WHERE
accounts.active = 1
ORDER BY accounts.amount
DESC
SQL (es-kyü-el)
A Query's Components
-
What you want
SELECT the fields, aggregates, or other calculations
-
Pool of data
FROM, JOIN all the data required (usually far more)
-
Conditions (on data pool)
WHERE the pool of data satisfies certain conditions
-
Combinators/filters
GROUPed BY fields to combine rows returned, HAVING certain conditions applied after being grouped
-
Presentation of results
ORDERed BY field(s) and/or LIMITed to a specific number of records (starting at a position)
Pseudo-SQL
A list of five active users & net balance, whose net balance is less than $200, with the smallest balances first
What you want
User data & his/her net balance
SELECT users.name ,
SUM(accounts.balance) AS total
Pseudo-SQL
A list of five active users & net balance, whose net balance is less than $200, with the smallest balances first
Pool of data
User & account data
FROM users
LEFT JOIN account ON accounts.user_id = users.id
Pseudo-SQL
A list of five active users & net balance, whose net balance is less than $200, with the smallest balances first
Conditions on data pool
Only active users
WHERE users.active = 1
Pseudo-SQL
A list of five active users & net balance, whose net balance is less than $200, with the smallest balances first
Combinators/filters
One row per user, and only for users with a net balance less than $200
GROUP BY users.id
HAVING total < 200
Pseudo-SQL
A list of five active users & net balance, whose net balance is less than $200, with the smallest balances first
Presentation
Only show the five smallest net balances (in order)
ORDER BY total ASC
LIMIT 5
The SQL
A list of five active users & net balance, whose net balance is less than $200, with the smallest balances first
SELECT users.name ,
SUM(account.balance) AS total
FROM users
LEFT JOIN account ON accounts.user_id = users.id
WHERE users.active = 1
GROUP BY users.id
HAVING total < 200
ORDER BY total ASC
LIMIT 5
The Details
What about the details?!
- Join type
- Apply conditions to pool of data or after grouping?
- or even to a join?
- What is the main table and what is joined?
- How do I group? By what?
- What if some tables don't have related records?
- What if I want to ignore some records?
It's in the data!
SQL itself is only a handful of building blocks.
How you use them depends entirely on the data you have, and what data you want out.
So let's look at the data...
Relationships
One to One
One driver's license belongs to one driverOne to Many
One person has many licenses(but remember: one license belongs to only one driver)
Many to Many
LEFT vs INNER vs RIGHT
Join types:
-
LEFT
Records in the preceeding relevant table will be returned even if no records present in this table
-
INNER
Only records will be returned if each relevant table has a record
-
RIGHT
You're writing your query wrong. Flip table order and use LEFT
INNER
-
INNER is the default join
SELECT *
FROM a
JOIN b ON a.field = b.field -
A LEFT or RIGHT join doesn't really make sense if you have a condition on the joined table (so use INNER for clarity)
SELECT *
FROM a
LEFT JOIN b ON a.field = b.field
WHERE b.field IS NOT NULL -
Performance
If your left table is significantly larger than your right, a LEFT join will be a bit faster.
,
FROM a
JOIN b ON a .field = b .field
FROM a , b
WHERE a .field = b .field
Same, except...
- No indexes
- Faster to write
- Slower to run
DB Schema
Person
id
first_name
last_name
...
1
Jack
Donaghy
2
Liz
Lemon
License
id
person_id
number
expiration_dt
...
1
1
IL-C1234-56789
2015-05-05
One to One
Suppose we want to see every license
and its driver:
SELECT person.first_name ,
person.last_name ,
license.number
FROM license
LEFT JOIN person ON license.person_id = person.id
first_name
last_name
number
Jack
Donaghy
IL-C1234-56789
DB Schema
Person
id
first_name
last_name
...
1
Jack
Donaghy
2
Liz
Lemon
License
id
person_id
number
expiration_dt
...
1
1
IL-C1234-56789
2015-05-05
1
1
TX-C5432-10987
2015-05-05
One to One
or none
or many
Suppose we want to see everyone's name
and license number:
SELECT person.first_name ,
person.last_name ,
license.number
FROM person
LEFT JOIN license ON license.person_id = person.id
first_name
last_name
number
Jack
Donaghy
IL-C1234-56789
Liz
Lemon
NULL
Jack
Donaghy
TX-C5432-10987
What Happened?
license
person
What Happened?
license
person
SELECT ...
FROM license
LEFT JOIN person ...
What Happened?
license
person
SELECT ...
FROM person
LEFT JOIN license ...
DB Schema (part ii)
Is there anything we can learn from just looking at the schema?
Person
id
first_name
last_name
...
1
Jack
Donaghy
2
Liz
Lemon
License
id
person_id
number
expiration_dt
...
1
1
IL-C1234-56789
2015-05-05
2
1
TX-C5432-10987
2014-05-05
Eegads! References!
The relationship between licenses and people is merely
a foreign key reference on a license record pointing to a person's primary key
person.id
license.person_id
- The person table alone has no reference to licenses
- The license table may contain zero, one, or many records referencing a single person
- Apply constraints, but still be careful
Constraints
But what's the worry? Add constraints to person_id
- Set
person_id
toNOT NULL
This will assure that all licenses belong to a person - Create a
UNIQUE INDEX
onperson_id
This will assure that no person has more than one license - Use application logic to avoid putting bad data in
But!
You really should still write your queries correctly!
- What if a person really does have more than one license?
- Sometimes you may need to store a record with a null reference
Many-to-Many
(Reminder: A driver may drive many cars,
and each car can have many drivers)
Many-to-Many
=
One-to-many + One-to-many
Person
id
first_name
1
Jack
2
Liz
Pivot!
(aka xref, cross-reference table)
person_id
car_id
1
1
1
2
1
4
Car
id
name
1
T
2
The Homer
3
Vespa
4
H2
Many-to-Many
SELECT person.first_name ,
car.model
FROM person
LEFT JOIN pivot ON pivot.person_id = person.id
LEFT JOIN car ON pivot.car_id = car.id
first_name
model
Liz
NULL
Jack
T
Jack
The Homer
Jack
H2
Many-to-Many
SELECT person.first_name ,
car.model
FROM car
LEFT JOIN pivot ON pivot.car_id = car.id
LEFT JOIN person ON pivot.person_id = person.id
first_name
model
Jack
T
Jack
The Homer
Jack
H2
NULL
Vespa
Many-to-Many
SELECT person.first_name ,
car.model
FROM pivot
LEFT JOIN person ON pivot.person_id = person.id
LEFT JOIN car ON pivot.car_id = car.id
first_name
model
Jack
T
Jack
The Homer
Jack
H2
Common Mistakes: Uniqueness
Suppose what makes a record unique changes...
Pivot!
person_id
car_id
1
1
1
2
1
2
1
4
active
...
true
true
false
true
Common Mistakes: Uniqueness
SELECT
person.first_name ,
car.model
FROM person
LEFT JOIN pivot ON pivot.person_id = person.id
LEFT JOIN car ON pivot.car_id = car.id
first_name
model
Liz
NULL
Jack
T
Jack
The Homer
Jack
The Homer
Jack
H2
The wrong way (1):
SELECT DISTINCT
person.first_name ,
car.model
FROM person
LEFT JOIN pivot ON pivot.person_id = person.id
LEFT JOIN car ON pivot.car_id = car.id
first_name
model
Liz
NULL
Jack
T
Jack
The Homer
Jack
H2
The wrong way (2):
SELECT person.first_name ,
car.model
FROM person
LEFT JOIN pivot ON pivot.person_id = person.id
LEFT JOIN car ON pivot.car_id = car.id
GROUP BY person.id , car.model
first_name
model
Liz
NULL
Jack
T
Jack
The Homer
Jack
H2
The wrong way (3):
SELECT person.first_name ,
car.model
FROM person
LEFT JOIN pivot ON pivot.person_id = person.id
LEFT JOIN car ON pivot.car_id = car.id
WHERE pivot.active = 1
first_name
model
Jack
T
Jack
The Homer
Jack
H2
The correct way:
SELECT person.first_name ,
car.model
FROM person
LEFT JOIN pivot ON pivot.person_id = person.id
AND pivot.active = 1
LEFT JOIN car ON pivot.car_id = car.id
first_name
model
Liz
NULL
Jack
T
Jack
The Homer
Jack
H2
So. Good queries?
SQL is simple -- based on only a handful of keywords.
But minor nuances yield drastic differences.
- Know your data!
- Break the query into small pieces
- Consider what makes data unique
- Start with the data you want returned per row, and watch row count
- Remember: Your test data set isn't always complete
Questions?
Thanks!
This presentation is available at
slid.es/petercompernolle/sql
SQL + Relational Databases
By petercompernolle
SQL + Relational Databases
- 3,421