PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Many-to-Many Relationships in SQL

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

February 10, 2020

Many-to-Many Relationships
 

Lab: SQL Part 3

Many-to-Many Relationships

Basic JOIN

1

2

3

C

A

B

INNER JOIN

=

2

3

A

B

1

2

3

C

A

B

LEFT JOIN

=

2

3

A

B

1

Basic JOIN

1

2

3

C

A

B

RIGHT JOIN

=

2

3

A

B

1

2

3

C

A

B

<<FULL>> JOIN

=

2

3

A

B

1

C

C

CROSS JOIN

Day
Jan 01
Jan 02
...
Jan 30
Jan 31
Individual
Ind 1
Ind 2
Ind 3

CROSS JOIN

=

Day Individual
Jan 01 Ind 1
Jan 01 Ind 2
Jan 01 Ind 3
Jan 02 Ind 1
Jan 02 Ind 2
Jan 02 Ind 3
... ...
Jan 31 Ind 1
Jan 31 Ind 2
Jan 31 Ind 3

Theta JOIN

Day
Jan 01
Jan 02
...
Jan 30
Jan 31
Individual Int_day
Ind 1 Jan 10
Ind 2 Jan 11
Ind 3 Jan 12

JOIN

=

Day Individual
Jan 10 Ind 1
Jan 11 Ind 1
Jan 11 Ind 2
Jan 12 Ind 1
Jan 12 Ind 2
Jan 12 Ind 3
... ...
Jan 31 Ind 1
Jan 31 Ind 2
Jan 31 Ind 3

ON Day>=Int_day

UNION

Day
Jan 01
Jan 02
...
Jan 30
Jan 31
Day
Feb 1
Feb 2
Feb 3
Jan 01

UNION

=

Day
Jan 1
Jan 2
Jan 3
...
Jan 31
Feb 1
Feb 2
Feb 3

UNION ALL

Day
Jan 01
Jan 02
...
Jan 30
Jan 31
Day
Feb 1
Feb 2
Feb 3
Jan 01

UNION ALL

=

Day
Jan 1
Jan 2
Jan 3
...
Jan 31
Feb 1
Feb 2
Feb 3
Jan 01

Some Useful String Functions

LEFT(string, number of characters)
RIGHT(string, number of characters)
LENGTH(string)
TRIM(leading, 'A' FROM string)
TRIM(trailing, 'A' FROM string)
TRIM(both, 'A' FROM string)
POSITION('A' IN string)
STRPOS(string,'A')
SUBSTR(string, starting position, number of characters)
CONCAT(string1,string2)

Useful Functions for Time Varaibles

SELECT date,
       (SUBSTR(date, 7, 4) || '-' || LEFT(date, 2) ||
        '-' || SUBSTR(date, 4, 2))::DATE AS cleaned_date
  FROM tab

'02/03/2017'->2017-02-03

SELECT cleaned_date,
       EXTRACT(YEAR   FROM cleaned_date) AS year,
       EXTRACT(MONTH  FROM cleaned_date) AS month,
       EXTRACT(DAY    FROM cleaned_date) AS day,
       EXTRACT(HOUR   FROM cleaned_date) AS hour,
       EXTRACT(MINUTE FROM cleaned_date) AS minute,
       EXTRACT(SECOND FROM cleaned_date) AS second,
       EXTRACT(DECADE FROM cleaned_date) AS decade,
       EXTRACT(DOW    FROM cleaned_date) AS day_of_week
  FROM tab
SELECT cleaned_date,
       DATE_TRUNC(YEAR   , cleaned_date) AS year,
       DATE_TRUNC(MONTH  , cleaned_date) AS month,
       DATE_TRUNC(WEEK   , cleaned_date) AS week,
       DATE_TRUNC(DAY    , cleaned_date) AS day,
       DATE_TRUNC(HOUR   , cleaned_date) AS hour,
       DATE_TRUNC(MINUTE , cleaned_date) AS minute,
       DATE_TRUNC(SECOND , cleaned_date) AS second,
       DATE_TRUNC(DECADE , cleaned_date) AS decade
  FROM tab

COALESCE

SELECT incidnt_num,
       descript,
       COALESCE(descript, 'No Description')
  FROM tutorial.sf_crime_incidents_cleandate
 ORDER BY descript DESC

Replace NULL with other value

Subqueries

SELECT customerNumber,
       checkNumber,
       amount
FROM payments
WHERE amount = (
                 SELECT MAX(amount) 
                 FROM payments
                );
SELECT
 MAX(items),
 MIN(items),
 FLOOR(AVG(items))
FROM
 (
 SELECT
 orderNumber,
 COUNT(orderNumber) AS items
 FROM
 orderdetails
 GROUP BY
 orderNumber
 ) AS lineitems;

SQL Part 3

git pull

PHC7065-Spring2020-Lecture5

By Hui Hu

PHC7065-Spring2020-Lecture5

Slides for Lecture 5, Spring 2020, PHC7065 Critical Skills in Data Manipulation for Population Science

  • 798