Fun with Machine Learning
Predicting the 2018 World Cup
Sorin Peste
Microsoft
An interdisciplinary field which unifies statistics, data analysis and their related methods in order to understand and analyze actual phenomena with data.
-- anon.
Data Science
The Data Science Process
You Need a Sharp Question™
Given a match between <Team1> and <Team2>, what is the expected goal differential at the end of the match?
http://aka.ms/PredictTheWorldCup
The R Language
- Open Source, released 1993
- Statistical features
  - linear modelling
  - statistical testing
  - time series analysis
  - classification
  - clustering
  - etc.
- Graphical features
- CRAN package repository
- RStudio IDE
R for Data Science
http://r4ds.had.co.nz/
Data Wrangling with R
> install.packages("dplyr")
> install.packages("tidyr")
Data Visualization with R
> install.packages("ggplot2")
Statistics with R
> library("stats")
Your Data Needs Cleaning
Top Problems
https://www.kaggle.com/surveys/2017
Missing Data
Incorrect Data
source: dilbert.com
Irrelevant Data
Outliers
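The four problems above can be sketched in base R. This is a minimal illustration, not the talk's actual cleaning code: the data frame, its column names (team1_goals, stadium_colour, and so on) and the plausibility threshold are all assumptions made for the example.

```r
# Hypothetical match results; every column name here is an illustrative assumption.
matches <- data.frame(
  team1          = c("Brazil", "Germany", "Spain", "France", "Brazil"),
  team2          = c("Chile", "Mexico", "Italy", "Peru", "Chile"),
  team1_goals    = c(3, 2, NA, 1, 31),
  team2_goals    = c(0, 1, 1, -1, 0),
  stadium_colour = c("red", "blue", "green", "red", "red")
)

# Irrelevant data: drop columns that cannot plausibly inform the prediction.
matches$stadium_colour <- NULL

# Missing data: drop rows with an unknown score (imputation is an alternative).
matches <- matches[complete.cases(matches), ]

# Incorrect data: a goal count can never be negative.
matches <- matches[matches$team1_goals >= 0 & matches$team2_goals >= 0, ]

# Outliers: remove implausibly high scores (the threshold is an assumption).
max_plausible_goals <- 15
cleaned <- matches[matches$team1_goals <= max_plausible_goals &
                   matches$team2_goals <= max_plausible_goals, ]

print(cleaned)
```

Whether to drop or repair a bad row is a judgment call; dropping is shown here only because it is the shortest option.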
Lag Features
last10games_w_per = (number of wins in the past 10 games) / 10
last10games_d_per = (number of draws in the past 10 games) / 10
last10games_l_per = (number of losses in the past 10 games) / 10
last10games_gd_per = (goals scored - goals conceded in the past 10 games) / 10
Rolling aggregates for performance metrics
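The lag features above could be computed along these lines. A minimal base R sketch, assuming a team's games are already sorted oldest to newest; the function name lag_features and the sample scores are hypothetical. The current game is excluded from its own window so that each feature only uses information available before kick-off.

```r
# Rolling aggregates over a team's previous `window` games (current game excluded).
# goals_for / goals_against: numeric vectors ordered from oldest to newest game.
lag_features <- function(goals_for, goals_against, window = 10) {
  n  <- length(goals_for)
  gd <- goals_for - goals_against
  out <- data.frame(
    last10games_w_per  = rep(NA_real_, n),
    last10games_d_per  = rep(NA_real_, n),
    last10games_l_per  = rep(NA_real_, n),
    last10games_gd_per = rep(NA_real_, n)
  )
  for (i in seq_len(n)) {
    if (i <= window) next                      # not enough history yet: leave NA
    past <- gd[(i - window):(i - 1)]           # the previous `window` games
    out$last10games_w_per[i]  <- sum(past > 0)  / window
    out$last10games_d_per[i]  <- sum(past == 0) / window
    out$last10games_l_per[i]  <- sum(past < 0)  / window
    out$last10games_gd_per[i] <- sum(past)      / window
  }
  out
}

# Hypothetical scores for 11 consecutive games of one team.
goals_for     <- c(2, 1, 0, 3, 1, 1, 2, 0, 4, 1, 5)
goals_against <- c(1, 1, 2, 0, 1, 0, 2, 1, 0, 3, 0)
feats <- lag_features(goals_for, goals_against)
print(feats[11, ])
```

The first ten rows stay NA because they lack a full ten-game history; how to treat such rows (drop them, or use a shorter window) is left open here.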
Choosing ML Models
Decision Trees
Random Forests
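To make the two model families concrete, here is a hedged sketch using the rpart package, which ships with standard R distributions. It fits one regression tree, then averages trees fitted on bootstrap resamples. That second step is plain bagging, a simplified relative of a random forest (a true random forest also samples features at each split). The synthetic data and the feature names team1_gd_per and team2_gd_per are assumptions for illustration, not the talk's data set.

```r
library(rpart)  # recommended package bundled with standard R installations

set.seed(42)
n <- 200
# Synthetic training data with hypothetical feature names.
train <- data.frame(
  team1_gd_per = rnorm(n),
  team2_gd_per = rnorm(n)
)
train$goal_diff <- round(1.5 * (train$team1_gd_per - train$team2_gd_per) +
                         rnorm(n, sd = 0.5))

# A single regression tree predicting the goal differential.
tree <- rpart(goal_diff ~ team1_gd_per + team2_gd_per,
              data = train, method = "anova")

# Bagging: fit one tree per bootstrap resample, then average their predictions.
n_trees <- 25
forest <- lapply(seq_len(n_trees), function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]
  rpart(goal_diff ~ team1_gd_per + team2_gd_per,
        data = boot, method = "anova")
})

predict_forest <- function(forest, newdata) {
  # One column of predictions per tree; average across trees for each row.
  preds <- do.call(cbind, lapply(forest, predict, newdata = newdata))
  rowMeans(preds)
}

new_match   <- data.frame(team1_gd_per = 1.2, team2_gd_per = -0.8)
tree_pred   <- predict(tree, new_match)
forest_pred <- predict_forest(forest, new_match)
```

Averaging many trees usually gives smoother, less variable predictions than a single tree, at the cost of a model that is harder to read.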
Training, Testing, Validation
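A three-way split can be sketched in base R as follows. The 60/20/20 proportions and the count of 1000 historical matches are assumptions for the example, not figures from the talk.

```r
set.seed(2018)
n   <- 1000                  # hypothetical number of historical matches
idx <- sample(seq_len(n))    # shuffled row indices

n_train <- floor(0.6 * n)    # 60% for fitting the model
n_valid <- floor(0.2 * n)    # 20% for tuning and model selection

train_idx <- idx[1:n_train]
valid_idx <- idx[(n_train + 1):(n_train + n_valid)]
test_idx  <- idx[(n_train + n_valid + 1):n]   # remainder for the final check

c(train = length(train_idx), valid = length(valid_idx), test = length(test_idx))
```

For match data with lag features, a chronological split (train on older matches, test on newer ones) may be a better fit than a random shuffle, because it avoids evaluating the model on games that precede its training data.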
Evaluation Metrics
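Since the target is a goal differential, regression metrics are a natural choice. The sketch below computes mean absolute error, root mean squared error, and the share of matches whose outcome (win, draw, loss) is predicted correctly. Which metrics the talk actually used is not stated, so this selection is an assumption, and the numbers are made up.

```r
# Hypothetical actual and predicted goal differentials for five matches.
actual    <- c(2, 0, -1, 1, 3)
predicted <- c(1.4, 0.3, -0.2, -0.4, 2.1)

# Mean absolute error: average size of the miss, in goals.
mae <- mean(abs(actual - predicted))

# Root mean squared error: like MAE but penalises large misses more heavily.
rmse <- sqrt(mean((actual - predicted)^2))

# Outcome accuracy: round the prediction to whole goals, then compare
# the sign (win / draw / loss) with the actual result.
outcome_accuracy <- mean(sign(actual) == sign(round(predicted)))

print(c(mae = mae, rmse = rmse, outcome_accuracy = outcome_accuracy))
```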
Simulating the Matches
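One way to turn a predicted goal differential into match outcomes is Monte Carlo simulation. The sketch below is an assumption about how this could be done, not the talk's method: it adds normal noise to the expected differential, rounds to whole goals, and breaks knockout draws with a coin flip. The function name simulate_match, the noise level sd = 1.5 and the expected differential of 0.8 are all hypothetical.

```r
# Simulate the final goal differential of one match.
# expected_gd: the model's predicted goal differential (team1 minus team2).
# sd:          assumed spread of real results around the prediction.
# knockout:    if TRUE, a draw is broken by a coin flip (stand-in for penalties).
simulate_match <- function(expected_gd, sd = 1.5, knockout = FALSE) {
  gd <- round(rnorm(1, mean = expected_gd, sd = sd))
  if (gd == 0 && knockout) {
    gd <- sample(c(-1, 1), 1)
  }
  gd
}

set.seed(1)
sims <- replicate(10000, simulate_match(expected_gd = 0.8))

# Estimated outcome probabilities from the simulated matches.
outcome_probs <- c(
  team1_win = mean(sims > 0),
  draw      = mean(sims == 0),
  team2_win = mean(sims < 0)
)
print(outcome_probs)
```

Repeating this for every fixture, and feeding group-stage results into the knockout bracket, gives an estimate of each team's chance of advancing.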
And... Results!