PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Data Models and Relational Database
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
February 3, 2020
Relational SQL
Lab: SQL Part 2
Relational SQL
Database Design
- The goal is to avoid mistakes and design clean and easily understood databases
 
- Usually starts with a picture


Power of Relational Database
- The relational database can be read through very quickly, even for very large amounts of data
 - all replicated data are removed and replaced with references to a single copy of each bit of data
 
- When you want to query some data, the data will be read from a number of tables linked by the foreign keys
Rules to Build a Data Model
- Draw a picture of the data objects, and then figure out how to represent the objects and their relationships
 
- Basic rule: don't put the same string (character) data in twice - use a relationship instead
 
- Model the real world: if there is one thing in the real world, there should be one copy of that thing in the database

| Subject | 
|---|
| id | 
| name | 
| gender | 
| age | 
| race/ethnicity | 
| county_id | 
| state_id | 
| County | 
|---|
| id | 
| name | 
| median household income | 
| state_id | 
| State | 
|---|
| id | 
| name | 
| policy | 
| policy start date | 
Primary Key
Logical Key
Foreign Key
JOIN
- JOIN links across several tables as part of a select operation
 
- You must tell SQL how to use the keys that make the connection between the tables using an ON clause



SELECT Subject.name, State.name as State_name, State.policy
FROM Subject
JOIN State ON Subject.state_id=State.id
;A More Complex Example



SELECT Subject.name, State.name as State_name, County.income
FROM Subject
JOIN State 
JOIN County
ON Subject.state_id=State.id
    AND Subject.state_id=County.state_id
    AND Subject.county_id=County.id
;Types of JOIN
1
2
3
C
A
B
<<FULL>> JOIN
=
2
3
A
B
1
C
1
2
3
C
A
B
INNER JOIN
=
2
3
A
B
Types of JOIN
1
2
3
C
A
B
RIGHT JOIN
=
2
3
A
B
C
1
2
3
C
A
B
LEFT JOIN
=
2
3
A
B
1
Aggregate Functions
- COUNT
- SUM
- MIN
- MAX
- AVG



SELECT COUNT(id) AS n 
FROM Subject;
SELECT COUNT(id) AS n, state_id
FROM Subject
GROUP BY state_id;
DISTINCT



SELECT COUNT(DISTINCT state_id) AS nState
FROM Subject
;
Having



SELECT MAX(income) AS maxIncome, state_id
FROM County
GROUP BY state_id
HAVING MAX(income) > 90000;
What's the difference between WHERE and HAVING?
Query Clause Order
- SELECT
- FROM
- WHERE
- GROUP BY
- HAVING
- ORDER BY
CASE
- Handles if/then logic in SQL
SELECT name,
       gender,
       CASE WHEN age BETWEEN 1 AND 4 THEN 1
       WHEN age BETWEEN 5 AND 8 THEN 2
       WHEN age > 8 THEN 3
       ELSE NULL
       END AS recodeAge
FROM Subject
;


SQL Part 2
git pull
PHC7065-Spring2020-Lecture4
By Hui Hu
PHC7065-Spring2020-Lecture4
Slides for Lecture 4, Spring 2020, PHC7065 Critical Skills in Data Manipulation for Population Science
- 972
 
   
   
  