PHC7065 CRITICAL SKILLS IN DATA MANIPULATION FOR POPULATION SCIENCE

Basic Structured Query Language

Hui Hu Ph.D.

Department of Epidemiology

College of Public Health and Health Professions & College of Medicine

January 28, 2019

Basic SQL

 

Lab: SQL Part 1

Basic SQL

Relational Databases

  • Relational databases model data by storing rows and columns in tables
     
  • The power of the relational database lies in its ability to efficiently retrieve data from those tables
    -  in particular where there are multiple tables and relationships between those tables involved in the query

Terminology

  • Database: contains many tables
     
  • Table (or relation): contains rows and columns
     
  • Row (or tuple): a set of fields that generally represents an object like a person or a music track
     
  • Column (or attribute, field): one of possibly many elements of data corresponding to the object represented by the row

Column

Row

Table

  • A table is defined as a set of rows that have the same columns
  • A row usually represents an object and information about the object
  • Objects are typically physical objects or concepts
  • All the data referenced by a column are in the same domain and conform to the same constraints

SQL

  • Structured Query Language: the language used to issue commands to the database
     
  • Create a table
     
  • Retrieve some data
     
  • Update data
     
  • Delete data

Common Database Systems

  • Oracle: large, commercial, enterprise-scale
     
  • MySQL: simpler but very fast and scalable, commercial open source
     
  • SqlServer: Microsoft
     
  • Many other open source projects: SQLite, PostgreSQL

SELECT and FROM

  • Two required ingredients in any SQL query:
    -  SELECT
    -  FROM
     
  • They have to be in that order
    -  SELECT indicates which columns you'd like to view
    -  FROM identifies the table that they live in

SELECT and FROM

LIMIT

  • Why limit results?
    -  simply be to see what a particular table looks like
     
  • Use the LIMIT command

WHERE

  • WHERE: limit the results in all columns to rows that satisfy the condition
     
  • Use the WHERE command

Comparison Operators

Arithmetic in SQL

Logical Operators

  • LIKE: allows you to match similar values, instead of exact values
  • IN: allows you to specify a list of values you'd like to include
  • BETWEEN: allows you to select only rows within a certain range
  • IS NULL: allows you to slect rows that contain no data ina given column
  • AND: allows you to select only rows that satisfy two conditions
  • OR: allows you to select rows that satisfy either of two conditions
  • NOT: allows you to select rows that do not match a certain condition

LIKE

  • _: used to substitute or an individual character
     
  • %: used to represent any character or set of characters
     
  • ILIKE: ignore case when you're matching values

In MySQL, LIKE is case insensitive, and there is no ILIKE

IN

  • Allows you to specify a list of values that you'd like to include in the results

BETWEEN

  • Allows you to select only rows that are within a specific range
  • Has to be paired with the AND operator

IS NULL

  • Allows you to exclude rows with missing data from your results

AND

  • Allows you to select only rows that satisfy two conditions

OR

  • Allows you to select rows that satisfy either of two conditions

NOT

  • Can be put before any conditional statement to select rows for which that statement is false

ORDER BY

  • Allows you to reorder your results based on the data in one or more columns

SQL LIMIT

  • Why limit results?
    -  simply be to see what a particular table looks like
     
  • Use the LIMIT command

SQL Part 1

git pull

PHC7065-Spring2019-Lecture3

By Hui Hu

PHC7065-Spring2019-Lecture3

Slides for Lecture 3, Spring 2019, PHC7065 Critical Skills in Data Manipulation for Population Science

  • 410
Loading comments...

More from Hui Hu