DJ Patil, U.S. Chief Data Scientist
Healthcare
Measurement and Evaluation
Policing
Education
Borrowed from Joshua Blumenstock
Above all else, show the data
Let's say I want to predict if a student will come to class...
Table columns: outlook, temp., humidity, windy, skip class
outcome: skip class
attributes or features: outlook, temp., humidity, windy
each row is an instance
Write 3 rules to classify observations as skipping/attending class
(if FEATURE(s) is VALUE, OUTCOME is VALUE)
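One way to try the exercise is to write the rules as code. The sketch below is only an illustration: the `students` rows, the three rules, and the `classify_by_rules` helper are all hypothetical, since the values from the slide's table are not reproduced here.

```r
# Hypothetical observations; the values are invented for illustration
# and are NOT the table from the slides.
students <- data.frame(
  outlook  = c("sunny", "rainy", "overcast", "sunny", "rainy"),
  temp     = c("hot", "cool", "mild", "mild", "mild"),
  humidity = c("high", "normal", "high", "normal", "high"),
  windy    = c(FALSE, TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Three hand-written rules of the form
# "if FEATURE(s) is VALUE, OUTCOME is VALUE", checked in order.
# Anything no rule matches falls through to "attend".
classify_by_rules <- function(obs) {
  ifelse(obs$outlook == "sunny" & obs$humidity == "normal", "skip",   # rule 1
  ifelse(obs$outlook == "rainy" & obs$windy,                "skip",   # rule 2
  ifelse(obs$outlook == "overcast",                         "attend", # rule 3
         "attend")))                                                   # default
}

students$predicted <- classify_by_rules(students)
print(students)
```

Checking the rules in a fixed order, with a default at the end, is what a decision tree formalizes: each rule corresponds to a path from the root to a leaf.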
Node tests an attribute
Terminal node (leaf) assigns a classification
Pick the attribute whose split produces the most "pure" branches
Repeat on each branch until the leaves are pure or no attributes remain
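"Purity" can be made concrete with an impurity score. The sketch below uses Gini impurity, one common choice (entropy is another). The `toy` data and the helper names `gini` and `split_impurity` are hypothetical, introduced only for illustration.

```r
# Gini impurity of a vector of class labels: 0 means perfectly pure.
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Weighted average impurity of the branches created by splitting
# `data` on one attribute (one branch per distinct value).
split_impurity <- function(data, attribute, outcome) {
  branches <- split(data[[outcome]], data[[attribute]])
  weights  <- sapply(branches, length) / nrow(data)
  sum(weights * sapply(branches, gini))
}

# Hypothetical toy data, invented for illustration.
toy <- data.frame(
  outlook = c("sunny", "sunny", "rainy", "rainy", "overcast", "overcast"),
  windy   = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  skip    = c("yes", "yes", "no", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Score every candidate attribute and keep the one with the lowest impurity.
candidates <- c("outlook", "windy")
scores <- sapply(candidates, function(a) split_impurity(toy, a, "skip"))
best <- names(which.min(scores))

print(scores)
print(best)
```

In this toy data, splitting on `outlook` separates the classes completely, while splitting on `windy` leaves both branches mixed, so `outlook` would be tested first. A tree-building algorithm repeats the same comparison inside each branch.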
# One of many libraries for classification / ML
library(rpart)
# Read in data
homes <- read.csv('part_1_data.csv')
# Use rpart to fit a classification tree: predict `in_sf` from all other columns (`.`)
basic_fit <- rpart(in_sf ~ ., data = homes, method = "class")
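Once a tree is fit, the usual next steps are to inspect its splits and check its predictions. The sketch below assumes the `kyphosis` example dataset bundled with rpart as a stand-in, because `part_1_data.csv` is not reproduced here. The same calls should apply to `basic_fit` and `homes`.

```r
library(rpart)

# `kyphosis` ships with rpart; it stands in for the slides' `homes` data.
fit <- rpart(Kyphosis ~ ., data = kyphosis, method = "class")

# Print the attribute tested at each node and the class assigned at each leaf.
print(fit)

# Predict a class label for each row (here, the training rows themselves).
predicted <- predict(fit, newdata = kyphosis, type = "class")

# Confusion matrix and training accuracy.
# This is optimistic: no data was held out for testing.
confusion <- table(actual = kyphosis$Kyphosis, predicted = predicted)
accuracy  <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

Accuracy measured on the rows used for fitting tends to overstate how well the tree will do on new observations, so a held-out test set is the safer check.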