PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE
Data Models and Relational Database
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
February 3, 2020
Relational SQL
Lab: SQL Part 2
Relational SQL
Database Design
- The goal is to avoid mistakes and design clean and easily understood databases
- Usually starts with a picture
Power of Relational Database
- The relational database can be read through very quickly, even for very large amounts of data
- all replicated data are removed and replaced with references to a single copy of each bit of data
- When you want to query some data, the data will be read from a number of tables linked by the foreign keys
Rules to Build a Data Model
- Draw a picture of the data objects, and then figure out how to represent the objects and their relationships
- Basic rule: don't put the same string (character) data in twice - use a relationship instead
- Model the real world: if there is one thing in the real world, there should be one copy of that thing in the database
Subject |
---|
id |
name |
gender |
age |
race/ethnicity |
county_id |
state_id |
County |
---|
id |
name |
median household income |
state_id |
State |
---|
id |
name |
policy |
policy start date |
Primary Key
Logical Key
Foreign Key
JOIN
- JOIN links across several tables as part of a select operation
- You must tell SQL how to use the keys that make the connection between the tables using an ON clause
SELECT Subject.name, State.name as State_name, State.policy
FROM Subject
JOIN State ON Subject.state_id=State.id
;
A More Complex Example
SELECT Subject.name, State.name as State_name, County.income
FROM Subject
JOIN State
JOIN County
ON Subject.state_id=State.id
AND Subject.state_id=County.state_id
AND Subject.county_id=County.id
;
Types of JOIN
1
2
3
C
A
B
<<FULL>> JOIN
=
2
3
A
B
1
C
1
2
3
C
A
B
INNER JOIN
=
2
3
A
B
Types of JOIN
1
2
3
C
A
B
RIGHT JOIN
=
2
3
A
B
C
1
2
3
C
A
B
LEFT JOIN
=
2
3
A
B
1
Aggregate Functions
- COUNT
- SUM
- MIN
- MAX
- AVG
SELECT COUNT(id) AS n
FROM Subject;
SELECT COUNT(id) AS n, state_id
FROM Subject
GROUP BY state_id;
DISTINCT
SELECT COUNT(DISTINCT state_id) AS nState
FROM Subject
;
Having
SELECT MAX(income) AS maxIncome, state_id
FROM County
GROUP BY state_id
HAVING MAX(income) > 90000;
What's the difference between WHERE and HAVING?
Query Clause Order
- SELECT
- FROM
- WHERE
- GROUP BY
- HAVING
- ORDER BY
CASE
- Handles if/then logic in SQL
SELECT name,
gender,
CASE WHEN age BETWEEN 1 AND 4 THEN 1
WHEN age BETWEEN 5 AND 8 THEN 2
WHEN age > 8 THEN 3
ELSE NULL
END AS recodeAge
FROM Subject
;
SQL Part 2
git pull
PHC7065-Spring2020-Lecture4
By Hui Hu
PHC7065-Spring2020-Lecture4
Slides for Lecture 4, Spring 2020, PHC7065 Critical Skills in Data Manipulation for Population Science
- 801