PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE
Many-to-Many Relationships in SQL
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
February 10, 2020
Many-to-Many Relationships
Lab: SQL Part 3
Many-to-Many Relationships
Basic JOIN
1
2
3
C
A
B
INNER JOIN
=
2
3
A
B
1
2
3
C
A
B
LEFT JOIN
=
2
3
A
B
1
Basic JOIN
1
2
3
C
A
B
RIGHT JOIN
=
2
3
A
B
1
2
3
C
A
B
<<FULL>> JOIN
=
2
3
A
B
1
C
C
CROSS JOIN
Day |
---|
Jan 01 |
Jan 02 |
... |
Jan 30 |
Jan 31 |
Individual |
---|
Ind 1 |
Ind 2 |
Ind 3 |
CROSS JOIN
=
Day | Individual |
---|---|
Jan 01 | Ind 1 |
Jan 01 | Ind 2 |
Jan 01 | Ind 3 |
Jan 02 | Ind 1 |
Jan 02 | Ind 2 |
Jan 02 | Ind 3 |
... | ... |
Jan 31 | Ind 1 |
Jan 31 | Ind 2 |
Jan 31 | Ind 3 |
Theta JOIN
Day |
---|
Jan 01 |
Jan 02 |
... |
Jan 30 |
Jan 31 |
Individual | Int_day |
---|---|
Ind 1 | Jan 10 |
Ind 2 | Jan 11 |
Ind 3 | Jan 12 |
JOIN
=
Day | Individual |
---|---|
Jan 10 | Ind 1 |
Jan 11 | Ind 1 |
Jan 11 | Ind 2 |
Jan 12 | Ind 1 |
Jan 12 | Ind 2 |
Jan 12 | Ind 3 |
... | ... |
Jan 31 | Ind 1 |
Jan 31 | Ind 2 |
Jan 31 | Ind 3 |
ON Day>=Int_day
UNION
Day |
---|
Jan 01 |
Jan 02 |
... |
Jan 30 |
Jan 31 |
Day |
---|
Feb 1 |
Feb 2 |
Feb 3 |
Jan 01 |
UNION
=
Day |
---|
Jan 1 |
Jan 2 |
Jan 3 |
... |
Jan 31 |
Feb 1 |
Feb 2 |
Feb 3 |
UNION ALL
Day |
---|
Jan 01 |
Jan 02 |
... |
Jan 30 |
Jan 31 |
Day |
---|
Feb 1 |
Feb 2 |
Feb 3 |
Jan 01 |
UNION ALL
=
Day |
---|
Jan 1 |
Jan 2 |
Jan 3 |
... |
Jan 31 |
Feb 1 |
Feb 2 |
Feb 3 |
Jan 01 |
Some Useful String Functions
LEFT(string, number of characters)
RIGHT(string, number of characters)
LENGTH(string)
TRIM(leading, 'A' FROM string)
TRIM(trailing, 'A' FROM string)
TRIM(both, 'A' FROM string)
POSITION('A' IN string)
STRPOS(string,'A')
SUBSTR(string, starting position, number of characters)
CONCAT(string1,string2)
Useful Functions for Time Varaibles
SELECT date,
(SUBSTR(date, 7, 4) || '-' || LEFT(date, 2) ||
'-' || SUBSTR(date, 4, 2))::DATE AS cleaned_date
FROM tab
'02/03/2017'->2017-02-03
SELECT cleaned_date,
EXTRACT(YEAR FROM cleaned_date) AS year,
EXTRACT(MONTH FROM cleaned_date) AS month,
EXTRACT(DAY FROM cleaned_date) AS day,
EXTRACT(HOUR FROM cleaned_date) AS hour,
EXTRACT(MINUTE FROM cleaned_date) AS minute,
EXTRACT(SECOND FROM cleaned_date) AS second,
EXTRACT(DECADE FROM cleaned_date) AS decade,
EXTRACT(DOW FROM cleaned_date) AS day_of_week
FROM tab
SELECT cleaned_date,
DATE_TRUNC(YEAR , cleaned_date) AS year,
DATE_TRUNC(MONTH , cleaned_date) AS month,
DATE_TRUNC(WEEK , cleaned_date) AS week,
DATE_TRUNC(DAY , cleaned_date) AS day,
DATE_TRUNC(HOUR , cleaned_date) AS hour,
DATE_TRUNC(MINUTE , cleaned_date) AS minute,
DATE_TRUNC(SECOND , cleaned_date) AS second,
DATE_TRUNC(DECADE , cleaned_date) AS decade
FROM tab
COALESCE
SELECT incidnt_num,
descript,
COALESCE(descript, 'No Description')
FROM tutorial.sf_crime_incidents_cleandate
ORDER BY descript DESC
Replace NULL with other value
Subqueries
SELECT customerNumber,
checkNumber,
amount
FROM payments
WHERE amount = (
SELECT MAX(amount)
FROM payments
);
SELECT
MAX(items),
MIN(items),
FLOOR(AVG(items))
FROM
(
SELECT
orderNumber,
COUNT(orderNumber) AS items
FROM
orderdetails
GROUP BY
orderNumber
) AS lineitems;
SQL Part 3
git pull
PHC7065-Spring2020-Lecture5
By Hui Hu
PHC7065-Spring2020-Lecture5
Slides for Lecture 5, Spring 2020, PHC7065 Critical Skills in Data Manipulation for Population Science
- 798