Automatic Machine Learning

Challenge & Lessons

Machine Learning
Eureka, that's THE solution!

But what's behind the magic?

  • Data selection

  • Data cleaning/augmentation

  • Other pre-processing

  • Feature engineering

  • Model selection

  • Hyperparameter optimisation


And quite a bit of time spent trying and failing
until reaching an "acceptable" solution
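The last two steps are what AutoML targets; done by hand they typically look like an exhaustive search over configurations. A toy sketch (the model and grid are made up; in practice `validation_error` would train and score a real model):

```python
import itertools

# Toy stand-in for "train a model with these hyperparameters and
# return its validation error"; here a made-up smooth function.
def validation_error(learning_rate, depth):
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 5) ** 2

# Manual model selection: try every combination, keep the best.
grid = itertools.product([0.01, 0.1, 0.5], [3, 5, 8])
best_config = min(grid, key=lambda cfg: validation_error(*cfg))
print(best_config)  # (0.1, 5) minimises the toy error
```

Each grid point costs a full training run, which is why this trial-and-error loop eats so much of a data scientist's time.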

Still the best idea?

The ultimate

[Diagram: New Data → AutoML box → Trained model]

[Diagram: New Data → Data Scientist → Trained model]

The Vision

[Diagram: New Data → AutoML box + Crowd intelligence → Trained model]

ChaLearn AutoML Challenge

  • 6 Rounds
  • 5 data sets / round
  • 2 phases / round : AutoML & Tweakathon
  • Increasing difficulty


  • 30 data sets (5 per round)
  • Various domains: pharma, medicine, marketing, finance...
  • Diverse formats: text, image, video, speech...
  • Participants don't know the domain or the format
  • Given: dense or sparse matrix
  • Numerical, categorical, binary
  • Missing values or not
  • Noisy or not
  • Various proportions of N samples / N features
  • Large test sets, ensuring statistical significance


  • Binary classification
  • Multi-class classification (10 to 100s of classes)
  • Multi-label classification
  • Regression/Prediction
  • Difficulty = Medium to hard, 10 to 20% error at best
  • Time budget = Limited
  • Computational resources & memory = Fixed

Who are the best (so far)?


Review of the best teams' approaches

Frank Hutter and collaborators
from University of Freiburg


  • Bayesian Optimization
  • Auto-WEKA
  • Auto-sklearn

Bayesian Optimization with RF


SMAC: Sequential Model-Based Algorithm Configuration





Repeat until time budget exhausted:

  • construct RF model to predict performance
  • use that model to select promising configurations
  • compare each selected configuration against the best known
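A minimal sketch of that loop, with a nearest-neighbour toy standing in for SMAC's random-forest surrogate (the objective, the 1-D configuration space, and all constants are made up):

```python
import random

random.seed(0)

def objective(x):
    # Made-up expensive black box to minimise (e.g. validation error).
    return (x - 0.3) ** 2

history = []  # (configuration, observed performance) pairs

def surrogate_predict(x):
    # Toy surrogate: predict x's performance as that of the nearest
    # evaluated configuration. Real SMAC fits a random forest on
    # `history` and also models predictive uncertainty.
    nearest = min(history, key=lambda h: abs(h[0] - x))
    return nearest[1]

# Initialise with one random configuration.
x0 = random.random()
history.append((x0, objective(x0)))

for _ in range(30):  # "until time budget exhausted"
    # Use the model to select a promising candidate configuration...
    candidates = [random.random() for _ in range(20)]
    x = min(candidates, key=surrogate_predict)
    # ...then evaluate it and compare against the best known.
    history.append((x, objective(x)))

best_x, best_y = min(history, key=lambda h: h[1])
print(best_x, best_y)
```

The point of the surrogate is to spend cheap model predictions instead of expensive objective evaluations when deciding where to look next.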

Bayesian Optimization with RF


Bayesian Hyperparameter Optimizers


Hyperparameter optimization library:


  • From 2-dimensional continuous hyperparameter spaces
  • To structured ones with 768 hyperparameters


  • SMAC [Hutter et al, '11] , based on random forests
  • Spearmint [Snoek et al, '12] , based on Gaussian processes
  • TPE [Bergstra et al, '11] , based on 1-d distributions of good values


  • GP-based Spearmint is best for low-dimensional & continuous spaces
  • RF-based SMAC is best for high-dimensional, categorical & conditional spaces


Feature selection

  • Search method: which feature subsets to evaluate
  • Evaluation method: how to evaluate feature subsets in search
  • Both methods have subparameters


In total: 768 parameters, 10^47 configurations
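The search/evaluation split above can be illustrated with a greedy forward search over features (a toy sketch; `subset_score` is a made-up stand-in for cross-validated accuracy):

```python
# Toy "evaluation method": score of a feature subset. In practice this
# would be cross-validated performance of a model trained on the subset;
# here features 0 and 2 carry signal and every feature adds a small cost.
def subset_score(features):
    useful = {0: 0.30, 2: 0.20}
    gain = sum(useful.get(f, 0.0) for f in features)
    return 0.5 + gain - 0.01 * len(features)

# Toy "search method": greedy forward selection over 5 features.
selected = []
remaining = set(range(5))
while remaining:
    best = max(remaining, key=lambda f: subset_score(selected + [f]))
    if subset_score(selected + [best]) <= subset_score(selected):
        break  # no remaining feature improves the score
    selected.append(best)
    remaining.remove(best)

print(sorted(selected))  # only the two informative features survive
```

Both the search strategy (greedy here) and the evaluation method are themselves hyperparameters, which is how the configuration space balloons to the 768 parameters quoted above.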

Auto Sklearn

The Auto-WEKA approach applied to scikit-learn



  • Meta-learning to warmstart Bayesian optimization
  • Automated post-hoc ensemble construction to combine the models evaluated by Bayesian optimization

Auto Sklearn

Scikit-learn [Pedregosa et al, 2011-current]
instead of WEKA [Witten et al, 1999-current]


  • 15 classifiers, with a total of 59 hyperparameters
  • 13 feature preprocessors, 42 hyperparams
  • 4 data preprocessors, 5 hyperparams



110 hyperparameters vs. 768 in Auto-WEKA

Auto Sklearn

  • Separately, meta-learning and ensembling each help
  • Applied together, they proved to be complementary

        Meta-learning provides better models earlier

        => Ensembling can start being helpful earlier
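The post-hoc step is ensemble selection in the style of Caruana et al.: greedily add, with replacement, whichever evaluated model most improves the ensemble's validation score. A minimal sketch with made-up validation predictions:

```python
# Validation predictions of already-trained candidate models (made-up
# numbers) and the true targets; the ensemble averages its members.
predictions = {
    "model_a": [0.9, 0.1, 0.5, 0.5],
    "model_b": [0.5, 0.5, 0.9, 0.1],
    "model_c": [0.4, 0.6, 0.4, 0.6],
}
y_true = [1.0, 0.0, 1.0, 0.0]

def mse(pred):
    return sum((p - t) ** 2 for p, t in zip(pred, y_true)) / len(y_true)

def ensemble_pred(members):
    return [sum(predictions[m][i] for m in members) / len(members)
            for i in range(len(y_true))]

ensemble = []
for _ in range(5):  # greedy rounds; a model may be picked repeatedly
    best = min(predictions, key=lambda m: mse(ensemble_pred(ensemble + [m])))
    ensemble.append(best)

print(ensemble)
```

Because the candidate models are already trained as a by-product of the optimization run, this ensembling step costs almost nothing extra, which is why it can "start being helpful" as soon as a few decent models exist.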


Meta Learning & Ensembling

Auto Sklearn

  • Trivial to use, "scikit-learn like"





  • Available online:
  • Good overall results, even if not necessarily the best on each data set
  • Performs better on small to medium-sized datasets

Intel's Team

with Eugene Tuv


  • Scalable Ensemble Learning with stochastic feature boosting

(Code not released)

James Robert Lloyd

University of Cambridge (now at Qlearsite)


Sensible allocation of computation for
ensemble construction for multi-class classification

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction

Make use of the partial information gained during the training of a machine learning model in order to decide whether to:

  • pause training and start a new model
  • continue training of current model
  • resume the training of a previously-considered model
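A crude sketch of that decision rule, replacing the paper's GP learning-curve model with a simple exponential-decay extrapolation (all curves and constants are made up):

```python
# Partial learning curves: validation error observed so far for three
# partially trained models, one value per epoch (made-up numbers).
curves = {
    "model_a": [0.50, 0.40, 0.35, 0.33],
    "model_b": [0.45, 0.44, 0.43],
    "model_c": [0.60, 0.30],
}

def predicted_final_error(curve):
    # Crude stand-in for the GP model of learning-curve asymptotes:
    # assume improvements shrink geometrically (rate 0.5) and sum the
    # remaining tail below the latest observation.
    if len(curve) < 2:
        return curve[-1]
    drop = curve[-2] - curve[-1]   # most recent improvement
    decay = 0.5                    # assumed per-epoch decay rate
    return curve[-1] - drop * decay / (1 - decay)

# Decide which model to thaw (resume): lowest predicted final error.
choice = min(curves, key=lambda m: predicted_final_error(curves[m]))
print(choice)
```

Note the rule prefers the young, steeply improving model over the one that currently has the lowest error, which is exactly the behaviour freeze-thaw is after.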

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction

Components of the algorithm:

  • Anytime interruptible base learning algorithms
  • Evaluation of base learners
  • A learning curve model – infinite mixture of exponential decays (GP)
  • A model of learning curve asymptotes – standard smooth GP
  • A method for deciding which algorithm to explore further – entropy search

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction

Components of the algorithm:

  • Base learning algorithms – most of scikit-learn
  • Evaluation of base learners – cross-validation
  • A learning curve extrapolator – mixture of exponential decays (GP)
  • An ensembling method – stacking
  • A model mapping individual algorithm performance to ensemble performance – a time-pressured hack: decision trees


  • Time management 
  • Memory management
  • Run pilot algorithms on reduced-size data

AutoML strategies

  • Bayesian approaches for Hyper-Parameter (HP) optimization
  • Global approaches including in the search space: 

                HP, models, features engineering, data pre-processing

  • Ensemble methods
  • Meta-learning
  • Memory & time management

Why would you participate ?

Prize money: $30,000

But also 3 Nvidia Titan X

Learning by doing


Test/Evaluate/Compare your skills and new tricks

Get a new job

Fun Game

Workshops to meet other smart and cool humans

NIPS 2015

Build parts of the dream

Various Motivations


  • Prize Money
  • Learning
  • Test/ Evaluate / Compare your skills
  • Get a new Job
  • Play/Fun
  • Workshops IRL
  • Build parts of the dream

AutoML Challenge

Hackathon team
Marc Boullé
Lukasz Romaszco
Sébastian Treger
Emilia Vaajoensuu
Philippe Vandermersch


Software development
Eric Carmichael
Ivan Judson
Christophe Poulain
Percy Liang
Arthur Pesah
Xavier Baro Solé
Lukasz Romaszco
Michael Zyskowski


Codalab management
Evelyne Viegas
Percy Liang
Erick Watson

Advisors and beta testers
Kristin Bennett
Marc Boullé
Cecile Germain
Cecile Capponi
Richard Caruana
Gavin Cawley
Gideon Dror
Sergio Escalera
Tin Kam Ho

Balasz Kégl
Hugo Larochelle
Víctor Ponce López
Nuria Macia

Simon Mercer
Florin Popescu
Michèle Sebag
Danny Silver

Many thanks to Isabelle Guyon and all contributors

Data providers
Yindalon Aphinyanaphongs
Olivier Chapelle
Hugo Jair Escalante
Sergio Escalera
Zainab Iftikhar Malhi
Vincent Lemaire
Chih Jen Lin
Meysam Madani
Bisakha Ray
Mehreen Saeed
Alexander Statnikov
Gustavo Stolovitzky
H-J. Thiesen
Ioannis Tsamardinos

Thanks for your attention

Further details

Sébastien Treguer