AutoML
Automatic Machine Learning
Challenge & Lessons
http://automl.chalearn.org
Machine Learning
Eureka, that's THE solution!
But what's behind the magic?
- Data selection
- Data cleaning/augmentation
- Other pre-processing
- Feature engineering
- Model selection
- Hyperparameter optimisation
And quite a bit of time trying/failing
until reaching an "acceptable" solution
Still the best idea?
The ultimate goal
Training Data
Trained model
AutoML box
Query on New Data
Reality
Training Data
Trained model
Data Scientist
Query on New Data
The Vision
Training Data
Trained model
Crowd intelligence
AutoML box
Query on New Data
Chalearn AutoML Challenge
- 6 Rounds
- 5 data sets / round
- 2 phases / round : AutoML & Tweakathon
- Increasing difficulty
Data
- 30 data sets (5 per round)
- Various domains: pharma, medicine, marketing, finance...
- Diverse formats: text, image, video, speech...
- Participants know neither the domain nor the format
- Given: dense or sparse matrix
- Numerical, categorical, binary
- Missing values or not
- Noisy or not
- Various proportion of
- Large test sets, ensuring statistical significance
Tasks
- Binary classification
- Multi-class classification (10 to 100s of classes)
- Multi-label classification
- Regression/Prediction
- Difficulty = Medium to hard, 10 to 20% error at best
- Time budget = Limited
- Computational resources & memory = Fixed
Who are the best (so far)?
Review of the best teams' approaches
Frank Hutter and collaborators
from University of Freiburg
- Bayesian Optimization
- Auto-WEKA
- Auto-sklearn
Bayesian Optimization with RF
SMAC: Sequential Model-Based Algorithm Configuration
repeat
    construct RF model to predict performance
    use that model to select promising configurations
    compare each selected configuration against the best known
until time budget exhausted
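The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SMAC itself: the objective is a hypothetical toy function, the time budget is replaced by a fixed iteration count, and the random forest is swapped for a nearest-neighbour average so the sketch has no dependencies. All function names are illustrative.

```python
import random

def objective(config):
    """Hypothetical validation error to minimise (stands in for training a model)."""
    x, y = config
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

def sample_config(rng):
    """Draw a random configuration from the unit square."""
    return (rng.random(), rng.random())

def surrogate_predict(history, config, k=3):
    """Predict the error of `config` as the mean error of its k nearest
    evaluated configurations (swapped in for SMAC's random forest)."""
    nearest = sorted(
        history,
        key=lambda h: sum((a - b) ** 2 for a, b in zip(h[0], config)),
    )[:k]
    return sum(err for _, err in nearest) / len(nearest)

def model_based_search(n_iterations=30, n_candidates=200, seed=0):
    rng = random.Random(seed)
    # A few random evaluations to initialise the surrogate.
    history = []
    for _ in range(5):
        config = sample_config(rng)
        history.append((config, objective(config)))
    best_config, best_error = min(history, key=lambda h: h[1])
    for _ in range(n_iterations):  # stands in for "until time budget exhausted"
        candidates = [sample_config(rng) for _ in range(n_candidates)]
        # Use the surrogate to select the most promising configuration.
        challenger = min(candidates, key=lambda c: surrogate_predict(history, c))
        error = objective(challenger)
        history.append((challenger, error))
        # Compare the selected configuration against the best known.
        if error < best_error:
            best_config, best_error = challenger, error
    return best_config, best_error, history

best_config, best_error, history = model_based_search()
print(best_config, best_error)
```

The sketch always picks the candidate with the lowest predicted error, so it only exploits; a real Bayesian optimizer also uses the model's uncertainty to explore.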
Bayesian Hyperparameter Optimizers
Hyperparameter optimization library: automl.org/hpolib
Benchmarks
- From 2-dimensional continuous hyperparameter spaces
- To structured ones with 768 hyperparameters
Optimizers
- SMAC [Hutter et al, '11] , based on random forests
- Spearmint [Snoek et al, '12] , based on Gaussian processes
- TPE [Bergstra et al, '11] , based on 1-d distributions of good values
Results
- GP-based Spearmint is best for low-dimensional & continuous
- RF-based SMAC is best for high-dim, categorical & conditional
Auto-WEKA
Feature selection
- Search method: which feature subsets to evaluate
- Evaluation method: how to evaluate feature subsets in search
- Both methods have subparameters
In total: 768 parameters, 10^47 configurations
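To see how such a search space grows, here is a small counting sketch. The space below is hypothetical and far smaller than Auto-WEKA's: the algorithm names, hyperparameter names and value counts are all made up for illustration. Each choice activates only its own subparameters, so configurations are summed across choices and multiplied across independent components.

```python
from math import prod

# Hypothetical miniature space: for each choice, the number of discrete
# values of each of its subparameters (not Auto-WEKA's actual space).
classifiers = {
    "decision_tree": {"max_depth": 10, "min_leaf": 5},
    "svm": {"kernel": 3, "C": 8, "gamma": 8},
    "knn": {"k": 20, "weighting": 2},
}
feature_selection = {
    "none": {},
    "best_first": {"direction": 3, "lookahead": 5},
    "ranker": {"threshold": 10},
}

def count_configurations(choices):
    """Sum over the alternatives of the product of their subparameter value counts."""
    return sum(prod(params.values()) for params in choices.values())

n_classifier_configs = count_configurations(classifiers)
n_selection_configs = count_configurations(feature_selection)
total = n_classifier_configs * n_selection_configs
print(n_classifier_configs, n_selection_configs, total)
```

Even this toy space has several thousand configurations, which suggests why exhaustive search is not an option at Auto-WEKA's scale.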
Auto-sklearn
The Auto-WEKA approach applied to scikit-learn
Improvements
- Meta-learning to warmstart Bayesian optimization
- Automated post hoc ensemble construction to combine the models that Bayesian optimization evaluated
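A minimal sketch of the warm-start idea, assuming a nearest-neighbour lookup over dataset meta-features. The meta-features (number of samples, features and classes), the stored datasets and their "best" configurations are all hypothetical, and this is not auto-sklearn's actual implementation.

```python
import math

# Hypothetical meta-knowledge: meta-features (n_samples, n_features, n_classes)
# of previously seen datasets and the configuration that worked best on each.
previous_datasets = [
    {"meta": (1000, 20, 2),
     "best_config": {"model": "random_forest", "n_estimators": 200}},
    {"meta": (50000, 300, 10),
     "best_config": {"model": "linear_svm", "C": 0.1}},
    {"meta": (1200, 25, 2),
     "best_config": {"model": "gradient_boosting", "learning_rate": 0.05}},
]

def distance(a, b):
    """Euclidean distance on a log scale, so that large counts do not dominate."""
    return math.sqrt(sum((math.log(x) - math.log(y)) ** 2 for x, y in zip(a, b)))

def warmstart_configs(new_meta, k=2):
    """Return the best configurations of the k most similar previous datasets."""
    ranked = sorted(previous_datasets, key=lambda d: distance(d["meta"], new_meta))
    return [d["best_config"] for d in ranked[:k]]

initial = warmstart_configs((1100, 22, 2))
print(initial)
```

The returned configurations would be evaluated first, before Bayesian optimization takes over with its own proposals.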
Auto-sklearn
Scikit-learn [Pedregosa et al, 2011-current]
instead of WEKA [Witten et al, 1999-current]
- 15 classifiers, with a total of 59 hyperparameters
- 13 feature preprocessors, 42 hyperparams
- 4 data preprocessors, 5 hyperparams
110 hyperparameters vs. 768 in Auto-WEKA
Auto-sklearn
- Separately, meta-learning and ensembling each help
- Applied together, they proved to be complementary
Meta-learning provides better models earlier
=> Ensembling can start being helpful earlier
Meta Learning & Ensembling
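One common way to build an ensemble from already-evaluated models is greedy forward selection with replacement. The sketch below assumes that technique; the model names and validation predictions are hypothetical, and the slides do not state which ensembling method is used.

```python
def accuracy(probabilities, labels):
    """Fraction of examples where thresholding at 0.5 matches the label."""
    hits = sum((p >= 0.5) == bool(y) for p, y in zip(probabilities, labels))
    return hits / len(labels)

def average(prediction_lists):
    """Element-wise mean of several lists of predicted probabilities."""
    return [sum(ps) / len(ps) for ps in zip(*prediction_lists)]

def greedy_ensemble(model_predictions, labels, ensemble_size=2):
    """Repeatedly add (with replacement) the model that most improves the
    validation accuracy of the averaged ensemble."""
    selected = []
    for _ in range(ensemble_size):
        best_name = max(
            model_predictions,
            key=lambda name: accuracy(
                average([model_predictions[m] for m in selected + [name]]),
                labels,
            ),
        )
        selected.append(best_name)
    return selected

# Hypothetical validation-set probabilities from three evaluated models.
labels = [1, 0, 1, 0, 1, 0]
model_predictions = {
    "model_a": [0.9, 0.2, 0.45, 0.1, 0.8, 0.55],
    "model_b": [0.6, 0.4, 0.7, 0.6, 0.3, 0.3],
    "model_c": [0.4, 0.7, 0.6, 0.3, 0.9, 0.2],
}
selected = greedy_ensemble(model_predictions, labels)
ensemble_accuracy = accuracy(
    average([model_predictions[m] for m in selected]), labels
)
print(selected, ensemble_accuracy)
```

In this toy data each model alone gets 4 of 6 examples right, while the selected pair makes complementary errors and the averaged prediction gets all 6 right.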
Auto-sklearn
- Trivial to use, "scikit-learn like"
- Available online: https://github.com/automl/auto-sklearn
- Good overall results, even if not necessarily the best on each data set
- Performs better on small to medium-sized datasets
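A minimal usage sketch, assuming the auto-sklearn package is installed. The helper name, the default budget and the per-run limit (a quarter of the budget) are illustrative choices, not recommendations from the authors.

```python
def fit_predict_with_autosklearn(X_train, y_train, X_test, time_budget_s=120):
    """Fit auto-sklearn within a time budget and predict on new data."""
    # Imported inside the function so the sketch can be loaded
    # even where auto-sklearn is not installed.
    import autosklearn.classification

    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=time_budget_s,   # total search budget, in seconds
        per_run_time_limit=time_budget_s // 4,   # cap for a single model fit
    )
    automl.fit(X_train, y_train)
    return automl.predict(X_test)
```

The `fit` / `predict` calls mirror a regular scikit-learn estimator, which is what makes the tool "scikit-learn like".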
Intel's Team
with Eugene Tuv
- Scalable Ensemble Learning with stochastic feature boosting
{Code not released}
James Robert Lloyd
University of Cambridge (now at Qlearsite)
Sensible allocation of computation for ensemble construction for multi-class classification
An extension of Freeze-Thaw Bayesian Optimization to ensemble construction
Make use of the partial information gained during the training of a machine learning model in order to decide whether to:
- pause training and start a new model
- continue training of current model
- resume the training of a previously-considered model
An extension of Freeze-Thaw Bayesian Optimization to ensemble construction
Components of the algorithm:
- Anytime interruptible base learning algorithms
- Evaluation of base learners
- A learning curve model: infinite mixture of exponential decays GP
- A model of learning curve asymptotes: standard smooth GP
- A method for deciding which algorithm to explore further: entropy search
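The pause / continue / resume decision can be illustrated with a much simpler stand-in. The sketch below assumes each learning curve follows a single exponential decay a + b * r**t and extrapolates its limit from the last three points; it uses neither a Gaussian process nor entropy search. The curves, model names and the guess for a fresh model are hypothetical.

```python
def predicted_asymptote(curve):
    """Extrapolate the limit of a curve assumed to follow a + b * r**t,
    using its last three equally spaced points."""
    if len(curve) < 3:
        return curve[-1]
    e0, e1, e2 = curve[-3:]
    denominator = e0 + e2 - 2 * e1
    if abs(denominator) < 1e-12:  # flat or linear curve: fall back to last value
        return e2
    return (e0 * e2 - e1 ** 2) / denominator

def choose_next_action(curves, current, fresh_model_guess):
    """Return 'continue', 'resume:<name>' or 'start_new'."""
    estimates = {name: predicted_asymptote(c) for name, c in curves.items()}
    best = min(estimates, key=estimates.get)
    if fresh_model_guess < estimates[best]:
        return "start_new"          # pause training and start a new model
    if best == current:
        return "continue"           # keep training the current model
    return f"resume:{best}"         # go back to a previously paused model

# Hypothetical validation-error curves observed so far.
curves = {
    "svm": [0.9, 0.5, 0.3],       # worse now, but still dropping fast
    "forest": [0.4, 0.3, 0.25],   # better now, but flattening
}
action = choose_next_action(curves, current="forest", fresh_model_guess=0.15)
print(action)
```

Here the currently trained model has the lower error so far, yet the extrapolation favours resuming the other one, which is the kind of decision partial learning curves make possible.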
An extension of Freeze-Thaw Bayesian Optimization to ensemble construction
Components of the algorithm
- Base learning algorithms
- Evaluation of base learners
- A learning curve extrapolator
- An ensembling method
- A model mapping individual algorithm performance to ensemble performance
- Most of scikit-learn
– A time pressured hack Decision trees
- Cross validation
– Mixture of exponential decays GP
– Stacking
Lessons
- Time management
- Memory management
- Run a pilot algorithm on reduced-size data
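One way to use pilot runs for time management is to time the algorithm on a few subsamples and extrapolate. The sketch below assumes run time follows a power law c * n**p in the number of samples; the pilot timings, function names and safety factor are hypothetical.

```python
import math

def estimate_full_runtime(pilot_sizes, pilot_times, full_size):
    """Fit time = c * n**p by least squares in log-log space, then extrapolate."""
    xs = [math.log(n) for n in pilot_sizes]
    ys = [math.log(t) for t in pilot_times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept + slope * math.log(full_size))

def fits_in_budget(pilot_sizes, pilot_times, full_size, budget_s, safety_factor=2.0):
    """Keep a margin: require the padded estimate to stay within the budget."""
    estimate = estimate_full_runtime(pilot_sizes, pilot_times, full_size)
    return estimate * safety_factor <= budget_s

# Hypothetical pilot timings (seconds) on subsamples of increasing size.
pilot_sizes = [1000, 2000, 4000]
pilot_times = [0.5, 2.0, 8.0]
estimate = estimate_full_runtime(pilot_sizes, pilot_times, full_size=40000)
print(estimate, fits_in_budget(pilot_sizes, pilot_times, 40000, budget_s=1200))
```

The same pattern can be applied to memory, by recording peak usage on the subsamples instead of elapsed time.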
AutoML strategies
- Bayesian approach for hyperparameter (HP) optimization
- Global approaches that include in the search space:
HP, models, feature engineering, data pre-processing
- Ensemble methods
- Meta-learning
- Memory & time management
Why would you participate?
Prize money: $30,000
But also 3 Nvidia Titan X
Learning by doing
Fame
Test/Evaluate/Compare your skills and new tricks
Get a new job
Fun Game
Workshops to meet other smart and cool humans
NIPS 2015
Build parts of the dream
Various Motivations
- Prize Money
- Learning
- Test/ Evaluate / Compare your skills
- Get a new Job
- Play/Fun
- Workshops IRL
- Build parts of the dream
AutoML Challenge
Hackathon team
Marc Boullé
Lukasz Romaszco
Sébastien Treguer
Emilia Vaajoensuu
Philippe Vandermersch
Software development
Eric Carmichael
Ivan Judson
Christophe Poulain
Percy Liang
Arthur Pesah
Xavier Baro Solé
Lukasz Romaszco
Michael Zyskowski
Codalab management
Evelyne Viegas
Percy Liang
Erick Watson
Advisors and beta testers
Kristin Bennett
Marc Boullé
Cecile Germain
Cecile Capponi
Richard Caruana
Gavin Cawley
Gideon Dror
Sergio Escalera
Tin Kam Ho
Balasz Kégl
Hugo Larochelle
Víctor Ponce López
Nuria Macia
Simon Mercer
Florin Popescu
Michèle Sebag
Danny Silver
Many thanks to Isabelle Guyon and all contributors
Data providers
Yindalon Aphinyanaphongs
Olivier Chapelle
Hugo Jair Escalante
Sergio Escalera
Zainab Iftikhar Malhi
Vincent Lemaire
Chih Jen Lin
Meysam Madani
Bisakha Ray
Mehreen Saeed
Alexander Statnikov
Gustavo Stolovitzky
H-J. Thiesen
Ioannis Tsamardinos
Thanks for your attention
Further details: http://automl.chalearn.org
Contact: Sébastien Treguer, @ST4Good
Participation: http://codalab.org/AutoML
AutoML
By streguer