PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE
Basic Structured Query Language
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
January 28, 2019
Basic SQL
Lab: SQL Part 1
Basic SQL
Relational Databases
- Relational databases model data by storing rows and columns in tables
- The power of the relational database lies in its ability to efficiently retrieve data from those tables
- in particular where there are multiple tables and relationships between those tables involved in the query
Terminology
- Database: contains many tables
- Table (or relation): contains rows and columns
- Row (or tuple): a set of fields that generally represents an object like a person or a music track
- Column (or attribute, field): one of possibly many elements of data corresponding to the object represented by the row
Column
Row
Table
- A table is defined as a set of rows that have the same columns
- A row usually represents an object and information about the object
- Objects are typically physical objects or concepts
- All the data referenced by a column are in the same domain and conform to the same constraints
SQL
- Structured Query Language: the language used to issue commands to the database
-
Create a table
-
Retrieve some data
-
Update data
- Delete data
Common Database Systems
- Oracle: large, commercial, enterprise-scale
- MySQL: simpler but very fast and scalable, commercial open source
- SqlServer: Microsoft
- Many other open source projects: SQLite, PostgreSQL
SELECT and FROM
- Two required ingredients in any SQL query:
- SELECT
- FROM
- They have to be in that order
- SELECT indicates which columns you'd like to view
- FROM identifies the table that they live in
SELECT and FROM
LIMIT
- Why limit results?
- simply be to see what a particular table looks like
- Use the LIMIT command
WHERE
- WHERE: limit the results in all columns to rows that satisfy the condition
- Use the WHERE command
Comparison Operators
Arithmetic in SQL
Logical Operators
- LIKE: allows you to match similar values, instead of exact values
- IN: allows you to specify a list of values you'd like to include
- BETWEEN: allows you to select only rows within a certain range
- IS NULL: allows you to slect rows that contain no data ina given column
- AND: allows you to select only rows that satisfy two conditions
- OR: allows you to select rows that satisfy either of two conditions
- NOT: allows you to select rows that do not match a certain condition
LIKE
- _: used to substitute or an individual character
- %: used to represent any character or set of characters
- ILIKE: ignore case when you're matching values
In MySQL, LIKE is case insensitive, and there is no ILIKE
IN
- Allows you to specify a list of values that you'd like to include in the results
BETWEEN
- Allows you to select only rows that are within a specific range
- Has to be paired with the AND operator
IS NULL
- Allows you to exclude rows with missing data from your results
AND
- Allows you to select only rows that satisfy two conditions
OR
- Allows you to select rows that satisfy either of two conditions
NOT
- Can be put before any conditional statement to select rows for which that statement is false
ORDER BY
- Allows you to reorder your results based on the data in one or more columns
SQL LIMIT
- Why limit results?
- simply be to see what a particular table looks like
- Use the LIMIT command
SQL Part 1
git pull
PHC7065-Spring2019-Lecture3
By Hui Hu
PHC7065-Spring2019-Lecture3
Slides for Lecture 3, Spring 2019, PHC7065 Critical Skills in Data Manipulation for Population Science
- 587