PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE
Basic Structured Query Language
Hui Hu Ph.D.
Department of Epidemiology
College of Public Health and Health Professions & College of Medicine
January 22, 2018
Basic SQL
Lab: SQL Part 1
Basic SQL
Relational Databases
- Relational databases model data by storing rows and columns in tables
- The power of the relational database lies in its ability to efficiently retrieve data from those tables
- in particular where there are multiple tables and relationships between those tables involved in the query
Terminology
- Database: contains many tables
- Relation (or table): contains tuples and attributes
- Tuple (or row): a set of fields that generally represents an object like a person or a music track
- Attribute (or column, field): one of possibly many elements of data corresponding to the object represented by the row
Attribute
Tuple
Relation
- A relation is defined as a set of tuples that have the same attributes
- A tuple usually represents an object and information about the object
- Objects are typically physical objects or concepts
- A relation is usually described as a table, which is organized into rows and columns
- All the data referenced by an attribute are in the same domain and conform to the same constraints
SQL
- Structured Query Language: the language used to issue commands to the database
-
Create a table
-
Retrieve some data
-
Update data
- Delete data
Two Roles in Large Projects
- Researcher / Developer: conduct scientific research or build the logic for applications
- Database Administrator: monitors and adjusts the database as the program runs in production
- Often both people participate in the building of the "Data model"
Large Project Structure
Application Software
Developer
Database Data Server
Database Tools
Database Administrator
SQL
SQL
Data Analysis Structure
Python Programs
Researcher
Database File
SQL
SQL
Input Files
OutputFiles
Database Model
- A database model or database schema is the structure or format of a database, described in a formal language supported by the database management system
- In other worlds, it is the application of a data model when used in conjunction with a database management system
Common Database Systems
- Oracle: large, commercial, enterprise-scale
- MySQL: simpler but very fast and scalable, commercial open source
- SqlServer: Microsoft
- Many other open source projects: SQLite, PostgreSQL
SELECT and FROM
- Two required ingredients in any SQL query:
- SELECT
- FROM
- They have to be in that order
- SELECT indicates which columns you'd like to view
- FROM identifies the table that they live in
SELECT and FROM
LIMIT
- Why limit results?
- simply be to see what a particular table looks like
- Use the LIMIT command
WHERE
- WHERE: limit the results in all columns to rows that satisfy the condition
- Use the WHERE command
Comparison Operators
Arithmetic in SQL
Logical Operators
- LIKE: allows you to match similar values, instead of exact values
- IN: allows you to specify a list of values you'd like to include
- BETWEEN: allows you to select only rows within a certain range
- IS NULL: allows you to slect rows that contain no data ina given column
- AND: allows you to select only rows that satisfy two conditions
- OR: allows you to select rows that satisfy either of two conditions
- NOT: allows you to select rows that do not match a certain condition
LIKE
- _: used to substitute or an individual character
- %: used to represent any character or set of characters
- ILIKE: ignore case when you're matching values
In MySQL, LIKE is case insensitive, and there is no ILIKE
IN
- Allows you to specify a list of values that you'd like to include in the results
BETWEEN
- Allows you to select only rows that are within a specific range
- Has to be paired with the AND operator
IS NULL
- Allows you to exclude rows with missing data from your results
AND
- Allows you to select only rows that satisfy two conditions
OR
- Allows you to select rows that satisfy either of two conditions
NOT
- Can be put before any conditional statement to select rows for which that statement is false
ORDER BY
- Allows you to reorder your results based on the data in one or more columns
SQL LIMIT
- Why limit results?
- simply be to see what a particular table looks like
- Use the LIMIT command
SQL Part 1
git pull
PHC7065-Spring2018-Lecture2
By Hui Hu
PHC7065-Spring2018-Lecture2
Slides for Lecture 2, Spring 2018, PHC7065 Critical Skills in Data Manipulation for Population Science
- 580