Databases allow us to...
Each row represents an "object" (e.g., person, country)
Each column provides information describing the "object"
Name | City | State | Zipcode |
---|---|---|---|
Syracuse | Syracuse | NY | 13244 |
UC Berkeley | Berkeley | CA | 94720 |
Arizona State | Tempe | AZ | 85001 |
Columbia | New York | NY | 10025 |
Id |
---|
1 |
2 |
3 |
4 |
Zip | Snow |
---|---|
13244 | 100 |
94720 | 1 |
85001 | .5 |
04989 | 56 |
Degree_Id | College_Id |
---|---|
1 | 1 |
2 | 1 |
1 | 2 |
2 | 3 |
1 | 4 |
M.A.
M.S.
M.A.
M.S.
M.A.
Structured Query Language (SQL) is how we communicate with databases in order to (1) create, (2) update, (3) retrieve, and (4) delete data.
We retrieve data from a database using a query. A query is a written statement that uses key words to communicate with the database.
SELECT
FROM
WHERE
JOIN
# Level - intro to databases - joins
# 10/10/2018
# load packages
library(tidyverse)
library(RSQLite)
library(DBI)
library(knitr)
# create database on data science programs
source("create_db.R")
# let's take a look at our institutions table
tbl(ds_programs, "institutions")
## Exercise 1: Review how to select variables from a table ----
# use sql to build a query to select name and zipcode from the institutions table
dbGetQuery(conn = ds_programs,
"SELECT name, zipcode
FROM institutions")
# use dplyr to build a query to select name and zipcode from the institutions table
ds_programs %>%
tbl("institutions") %>%
select(name, zipcode)
## Exercise 2: Learn how to join two tables using a query ----
# use sql to select name and zipcode from the institutions table and to left_join all data from the weather table
dbGetQuery(conn = ds_programs,
"SELECT institutions.name, institutions.zipcode, weather.snow
FROM institutions
LEFT JOIN weather ON institutions.zipcode = weather.zip")
# use dplyr to select name and zipcode from the institutions table and to left_join all data from the weather table
ds_programs %>%
tbl("institutions") %>%
select(name, zipcode) %>%
left_join(
ds_programs %>%
tbl("weather") %>%
select(zip, snow),
by = c("zipcode" = "zip")
)
# close database connection to ds_programs database
dbDisconnect(ds_programs)