AutoML

Automatic Machine Learning

Challenge & Lessons

http://automl.chalearn.org

Machine Learning
Eureka, that's THE solution!

But what's behind the magic?

  • Data selection

  • Data cleaning/augmentation

  • Other pre-processing

  • Feature engineering

  • Model selection

  • Hyperparameter optimisation

 

And quite a bit of time trying/failing
until reaching an "acceptable" solution 

Still the best idea?

The ultimate goal

Diagram: Training Data → AutoML box → Trained model, queried on New Data

Reality

Diagram: Training Data → Data Scientist → Trained model, queried on New Data

The Vision

Diagram: Training Data → Crowd intelligence + AutoML box → Trained model, queried on New Data

ChaLearn AutoML Challenge

  • 6 rounds
  • 5 data sets / round
  • 2 phases / round: AutoML & Tweakathon
  • Increasing difficulty

Data

  • 30 data sets (5 per round)
  • Various domains: pharma, medicine, marketing, finance...
  • Diverse formats: text, image, video, speech...
  • Participants know neither the domain nor the format
  • Given: dense or sparse matrix
  • Numerical, categorical, binary
  • Missing values or not
  • Noisy or not
  • Various ratios of N samples / N features
  • Large test sets, ensuring statistical significance

Tasks

  • Binary classification
  • Multi-class classification (10 to 100s of classes)
  • Multi-label classification
  • Regression/Prediction
  • Difficulty = Medium to hard, 10 to 20% error at best
  • Time budget = Limited
  • Computational resources & memory = Fixed

Who are the best (so far)?


Review of the best teams' approaches

Frank Hutter and collaborators
from University of Freiburg

 

  • Bayesian Optimization
  • Auto-WEKA
  • Auto-sklearn

Bayesian Optimization with RF

 

SMAC: Sequential Model-Based Algorithm Configuration

repeat
    construct RF model to predict performance
    use that model to select promising configurations
    compare each selected configuration against the best known
until time budget exhausted
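The loop above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not SMAC itself: the "random forest" is a hypothetical bagged ensemble of regression stumps, configurations are points in the unit cube, the acquisition function is a lower confidence bound (SMAC itself uses expected improvement), and "compare against the best known" is a single deterministic evaluation rather than SMAC's intensification over several instances. All function and parameter names are hypothetical.

```python
import random
import statistics
import time


def fit_stump(X, y, rng):
    """One tree of a tiny 'random forest': a depth-1 regression tree fit on a
    bootstrap sample, splitting on a randomly chosen feature."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
    Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
    f = rng.randrange(len(X[0]))  # random feature choice, as in a random forest
    root = statistics.fmean(yb)
    best_sse, thr, left_mean, right_mean = float("inf"), None, root, root
    for t in sorted({x[f] for x in Xb}):
        left = [v for x, v in zip(Xb, yb) if x[f] <= t]
        right = [v for x, v in zip(Xb, yb) if x[f] > t]
        if not left or not right:
            continue
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if sse < best_sse:
            best_sse, thr, left_mean, right_mean = sse, t, ml, mr
    if thr is None:  # no valid split: predict the bootstrap mean everywhere
        return lambda x: root
    return lambda x: left_mean if x[f] <= thr else right_mean


def smac_like(objective, dim, budget_s=2.0, max_evals=30, n_init=5,
              n_trees=20, n_candidates=200, kappa=1.0, seed=0):
    """Minimise `objective` over [0, 1]^dim with an RF-surrogate loop."""
    rng = random.Random(seed)

    def sample():
        return [rng.random() for _ in range(dim)]

    X = [sample() for _ in range(n_init)]  # initial design
    y = [objective(x) for x in X]
    best = min(range(len(y)), key=y.__getitem__)
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline and len(y) < max_evals:
        # 1. construct RF model to predict performance
        forest = [fit_stump(X, y, rng) for _ in range(n_trees)]

        # 2. use the model to select a promising configuration:
        #    lowest (mean - kappa * std) of the trees' predictions
        def lcb(c):
            preds = [tree(c) for tree in forest]
            return statistics.fmean(preds) - kappa * statistics.pstdev(preds)

        x_new = min((sample() for _ in range(n_candidates)), key=lcb)
        # 3. evaluate it and compare against the best known configuration
        y.append(objective(x_new))
        X.append(x_new)
        if y[-1] < y[best]:
            best = len(y) - 1
    return X[best], y[best], len(y)


if __name__ == "__main__":
    x, loss, n = smac_like(lambda c: (c[0] - 0.3) ** 2 + (c[1] - 0.7) ** 2, dim=2)
    print(n, "evaluations, best loss", round(loss, 4), "at", x)
```

The `max_evals` cap is only there to make the sketch terminate quickly; the time budget is the stopping rule the slide describes.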


Bayesian Hyperparameter Optimizers

 

Hyperparameter optimization library: automl.org/hpolib
 

Benchmarks

  • From 2-dimensional continuous hyperparameter spaces
  • To structured ones with 768 hyperparameters

Optimizers

  • SMAC [Hutter et al., '11], based on random forests
  • Spearmint [Snoek et al., '12], based on Gaussian processes
  • TPE [Bergstra et al., '11], based on 1-d distributions of good values
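The TPE idea of modelling "good" values can be sketched for a single continuous hyperparameter. This is a minimal illustration, assuming fixed-bandwidth Gaussian kernels and a fixed quantile `gamma`; real TPE adds priors, adaptive bandwidths and handles tree-structured spaces. Function names are hypothetical.

```python
import math
import random


def parzen(points, bandwidth):
    """Gaussian kernel density estimate over 1-d `points`."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return lambda x: sum(
        math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def tpe_suggest(history, gamma=0.25, n_candidates=50, bandwidth=0.1, rng=None):
    """Propose the next x from (x, loss) pairs, TPE-style: model the density
    l(x) of good values and g(x) of the rest, then maximise l(x) / g(x)."""
    if rng is None:
        rng = random.Random()
    ranked = sorted(history, key=lambda h: h[1])  # lowest loss first
    n_good = max(1, int(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]] or good
    l, g = parzen(good, bandwidth), parzen(bad, bandwidth)
    # Sample candidates from l(x): pick a good point, add kernel noise
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]
    return max(candidates, key=lambda x: l(x) / max(g(x), 1e-300))


if __name__ == "__main__":
    rng = random.Random(0)
    f = lambda x: (x - 2.0) ** 2
    history = [(x, f(x)) for x in (rng.uniform(-5, 5) for _ in range(10))]
    for _ in range(30):
        x = tpe_suggest(history, rng=rng)
        history.append((x, f(x)))
    print(min(history, key=lambda h: h[1]))
```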

Results

  • GP-based Spearmint is best for low-dimensional & continuous spaces
  • RF-based SMAC is best for high-dimensional, categorical & conditional spaces

Auto WEKA

Feature selection

  • Search method: which feature subsets to evaluate
  • Evaluation method: how to evaluate feature subsets in search
  • Both methods have subparameters

 

In total: 768 hyperparameters, 10^47 configurations
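How a search method and an evaluation method combine, each with its own subparameters, can be sketched as follows. This is a hypothetical stand-in, not the WEKA components Auto-WEKA actually searches over: greedy forward selection plays the search method (subparameter `max_features`) and a toy relevance-minus-penalty score plays the evaluation method (subparameter `penalty`). An Auto-WEKA-style optimizer would have to tune such subparameters jointly with everything else.

```python
def forward_selection(n_features, evaluate, max_features=None):
    """Search method: greedy forward selection over feature subsets.

    `evaluate` is the evaluation method: it scores a frozenset of feature
    indices (higher is better). `max_features` is a subparameter of the search.
    """
    limit = n_features if max_features is None else min(n_features, max_features)
    selected, best_score = frozenset(), float("-inf")
    while len(selected) < limit:
        # Score every subset obtained by adding one unused feature
        score, f = max((evaluate(selected | {j}), j)
                       for j in range(n_features) if j not in selected)
        if score <= best_score:
            break  # no single addition improves the evaluation: stop
        selected, best_score = selected | {f}, score
    return selected, best_score


def penalized_relevance(relevance, penalty):
    """Toy evaluation method (hypothetical): summed per-feature relevance minus
    a size penalty. `penalty` is this evaluation method's subparameter."""
    return lambda subset: sum(relevance[i] for i in subset) - penalty * len(subset)


if __name__ == "__main__":
    relevance = [0.5, 0.1, 0.4, 0.0]
    print(forward_selection(4, penalized_relevance(relevance, penalty=0.2)))
```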

Auto Sklearn

The Auto-WEKA approach applied to scikit-learn

 

Improvements

  • Meta-learning to warm-start Bayesian optimization
  • Automated post-hoc ensemble construction, combining the models evaluated during Bayesian optimization
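Meta-learning for warm-starting can be sketched as a nearest-neighbour lookup: describe the new data set with meta-features, find the most similar previously solved data sets, and seed Bayesian optimization with their best configurations. This is a minimal sketch under assumptions: three unscaled meta-features and Euclidean distance are illustrative choices (real systems use many more, normalised meta-features), and the configuration names are hypothetical.

```python
import math


def meta_features(n_samples, n_features, n_classes):
    """Hypothetical, minimal meta-features describing a data set."""
    return (math.log10(n_samples), math.log10(n_features), float(n_classes))


def warmstart(new_meta, past_runs, k=2):
    """Return the best known configurations of the k past data sets whose
    meta-features are closest (Euclidean) to the new data set's."""
    ranked = sorted(past_runs, key=lambda run: math.dist(new_meta, run["meta"]))
    return [run["best_config"] for run in ranked[:k]]


if __name__ == "__main__":
    past = [
        {"meta": meta_features(1000, 20, 2), "best_config": {"model": "random_forest"}},
        {"meta": meta_features(100_000, 5_000, 10), "best_config": {"model": "linear_svm"}},
        {"meta": meta_features(1200, 25, 2), "best_config": {"model": "gradient_boosting"}},
    ]
    # These configurations would be evaluated first, before the surrogate takes over
    print(warmstart(meta_features(1100, 22, 2), past))
```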

Auto Sklearn

Scikit-learn [Pedregosa et al, 2011-current]
instead of WEKA [Witten et al, 1999-current]

 

  • 15 classifiers, with a total of 59 hyperparameters
  • 13 feature preprocessors, 42 hyperparams
  • 4 data preprocessors, 5 hyperparams

 

 

110 hyperparameters vs. 768 in Auto-WEKA

Auto Sklearn

  • Meta-learning and ensembling each help on their own
  • Applied together, they proved complementary

        Meta-learning provides better models earlier

        => ensembling can start being helpful earlier
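The post-hoc ensembling step can be sketched as greedy ensemble selection in the style of Caruana et al. (forward selection with replacement on validation predictions), which, to our understanding, is the scheme auto-sklearn uses. This minimal version assumes binary labels, predicted probabilities and a Brier-style loss; the function names and the tiny example are hypothetical.

```python
def brier(y_true, probs):
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def ensemble_selection(val_preds, y_val, loss=brier, size=10):
    """Greedy forward selection with replacement: at each step add the model
    whose inclusion most reduces the validation loss of the averaged ensemble.
    Returns each selected model's weight (selection count / size)."""
    running_sum = [0.0] * len(y_val)
    chosen = []
    for step in range(1, size + 1):
        def loss_if_added(name):
            avg = [(s + p) / step for s, p in zip(running_sum, val_preds[name])]
            return loss(y_val, avg)

        best = min(val_preds, key=loss_if_added)
        chosen.append(best)
        running_sum = [s + p for s, p in zip(running_sum, val_preds[best])]
    return {name: chosen.count(name) / size for name in set(chosen)}


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    preds = {"A": [0.9, 0.4, 0.6, 0.1],
             "B": [0.6, 0.1, 0.9, 0.5],
             "C": [0.5, 0.5, 0.5, 0.5]}
    print(ensemble_selection(preds, y, size=2))
```

Because the ensemble only needs the predictions already computed during the search, it is cheap to rebuild whenever a better model appears, which is why better early models from meta-learning make ensembling useful sooner.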

 

Meta Learning & Ensembling

Auto Sklearn

  • Trivial to use, "scikit-learn like"

 

 

 

 

  • Available online: https://github.com/automl/auto-sklearn
  • Good overall results, even if not necessarily the best on every data set
  • Performs better on small to medium-sized data sets

Intel's Team

with Eugene Tuv

 

  • Scalable Ensemble Learning with stochastic feature boosting

{Code not released}

James Robert Lloyd

University of Cambridge (now at Qlearsite)

 

Sensible allocation of computation for
ensemble construction for multi-class classification

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction


Use the partial information gained while training a machine learning model to decide whether to:

  • pause training and start a new model
  • continue training of current model
  • resume the training of a previously-considered model

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction

Components of the algorithm:

  • Anytime interruptible base learning algorithms
  • Evaluation of base learners
  • A learning curve model: infinite mixture of exponential decays GP
  • A model of learning curve asymptotes: standard smooth GP
  • A method for deciding which algorithm to explore further: entropy search
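The pause / continue / resume decision can be sketched with much simpler stand-ins than the components listed above, so treat this as an illustration of the control flow only: instead of a GP over learning curves it fits a single parametric curve loss(t) ≈ a + b·exp(-c·t) by least squares (c from a small grid), and instead of entropy search it greedily picks the model with the lowest predicted asymptote. `fresh_model_guess`, a prior guess for an untrained model's final loss, and all function names are hypothetical.

```python
import math


def predicted_asymptote(curve, rates=(0.05, 0.1, 0.2, 0.5, 1.0, 2.0)):
    """Fit loss(t) ~ a + b * exp(-c * t) to a partial learning curve and
    return the asymptote a. c comes from a small grid; a, b from least squares."""
    if len(curve) < 2:
        return curve[-1]  # not enough points to extrapolate
    ts = range(1, len(curve) + 1)
    y_mean = sum(curve) / len(curve)
    best_sse, best_a = float("inf"), curve[-1]
    for c in rates:
        z = [math.exp(-c * t) for t in ts]
        z_mean = sum(z) / len(z)
        szz = sum((zi - z_mean) ** 2 for zi in z)
        b = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, curve)) / szz
        a = y_mean - b * z_mean
        sse = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, curve))
        if sse < best_sse:
            best_sse, best_a = sse, a
    return best_a


def next_action(curves, current, fresh_model_guess):
    """Decide whether to continue the current model, resume a paused one,
    or pause and start a new model (greedy rule, not entropy search)."""
    asymptotes = {name: predicted_asymptote(c) for name, c in curves.items()}
    best = min(asymptotes, key=asymptotes.get)
    if fresh_model_guess < asymptotes[best]:
        return ("start_new", None)
    return ("continue", best) if best == current else ("resume", best)


if __name__ == "__main__":
    svm = [0.1 + 0.5 * math.exp(-0.5 * t) for t in range(1, 7)]
    rf = [0.3 + 0.4 * math.exp(-1.0 * t) for t in range(1, 7)]
    print(next_action({"svm": svm, "rf": rf}, current="rf", fresh_model_guess=0.2))
```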

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction

Components of the algorithm

  • Base learning algorithms: most of scikit-learn
  • Evaluation of base learners: cross-validation
  • A learning curve extrapolator: mixture of exponential decays GP
  • An ensembling method: stacking
  • A model mapping individual algorithm performance to ensemble performance: a time-pressured hack (decision trees)
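Stacking on cross-validated predictions, the generic recipe behind the last two components, can be sketched as below. This is a simplified illustration, not the entry's code: it uses regression rather than multi-class classification, two hypothetical base learners (a constant mean and a 1-d least-squares line), round-robin folds, and a two-weight least-squares blender as the level-2 model.

```python
def out_of_fold(X, y, fit, k=3):
    """Cross-validated predictions: each point is predicted by a model that
    never saw it during training (folds assigned round-robin)."""
    oof = [0.0] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if i % k != fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(y), k):
            oof[i] = model(X[i])
    return oof


def fit_mean(X, y):
    """Base learner 1 (hypothetical): always predict the training mean."""
    m = sum(y) / len(y)
    return lambda x: m


def fit_line(X, y):
    """Base learner 2 (hypothetical): 1-d ordinary least-squares line."""
    xm, ym = sum(X) / len(X), sum(y) / len(y)
    sxx = sum((x - xm) ** 2 for x in X)
    slope = sum((x - xm) * (v - ym) for x, v in zip(X, y)) / sxx if sxx else 0.0
    return lambda x: ym + slope * (x - xm)


def stack_two(p1, p2, y):
    """Level-2 model: least-squares weights (no intercept) for blending two
    base models' out-of-fold predictions, via the 2x2 normal equations."""
    a11 = sum(u * u for u in p1)
    a22 = sum(v * v for v in p2)
    a12 = sum(u * v for u, v in zip(p1, p2))
    r1 = sum(u * t for u, t in zip(p1, y))
    r2 = sum(v * t for v, t in zip(p2, y))
    det = a11 * a22 - a12 * a12
    return (r1 * a22 - r2 * a12) / det, (a11 * r2 - a12 * r1) / det


if __name__ == "__main__":
    X = list(range(12))
    y = [2.0 * x + 1.0 for x in X]
    weights = stack_two(out_of_fold(X, y, fit_mean), out_of_fold(X, y, fit_line), y)
    print(weights)  # the blender should trust the line, not the constant
```

Fitting the blender on out-of-fold predictions, rather than on in-sample ones, keeps it from over-weighting base models that merely memorise the training data.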

Lessons

  • Time management 
  • Memory management
  • Run pilot algorithms on reduced-size data
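The pilot-run lesson can be sketched as a time-budget guard: time the learner on small subsamples, extrapolate to the full data, and only launch the full run if it fits. This is a minimal sketch with loudly stated assumptions: run time is assumed to grow linearly with the number of rows (many learners scale worse), the data is assumed already shuffled so prefixes are representative, and the helper names and the safety factor are hypothetical.

```python
import time


def estimate_full_time(train, data, fractions=(0.05, 0.1, 0.2)):
    """Time `train` on growing prefixes of `data` and extrapolate to the full
    size, ASSUMING run time grows linearly with the number of rows."""
    points = []
    for frac in fractions:
        n = max(1, int(frac * len(data)))
        start = time.perf_counter()
        train(data[:n])
        points.append((n, time.perf_counter() - start))
    # Least-squares slope through the origin: seconds ~ slope * rows
    slope = sum(n * t for n, t in points) / sum(n * n for n, _ in points)
    return slope * len(data)


def fits_in_budget(train, data, seconds_left, safety=2.0):
    """Only launch the full run if its estimate, with a safety margin, fits."""
    return safety * estimate_full_time(train, data) <= seconds_left


if __name__ == "__main__":
    rows = list(range(200_000, 0, -1))
    print(f"estimated full run: {estimate_full_time(sorted, rows):.4f} s")
    print("launch full run:", fits_in_budget(sorted, rows, seconds_left=60.0))
```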

AutoML strategies

  • Bayesian approaches for hyperparameter (HP) optimization
  • Global approaches that include in the search space:

                HP, models, feature engineering, data pre-processing

  • Ensemble methods
  • Meta-learning
  • Memory & time management
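A "global" search space, where pre-processing, model choice and hyperparameters are searched together, is conditional: a hyperparameter exists only when its parent choice is active. The sketch below samples such a space and runs plain random search over it, the simplest baseline; Bayesian optimizers such as SMAC search the same kind of space more cleverly. The space, its ranges and all names are hypothetical.

```python
import math
import random

# Hypothetical joint search space: each stage offers several choices, and each
# choice activates its own (conditional) hyperparameters with (low, high) ranges.
SPACE = {
    "preprocessing": {
        "none": {},
        "standardize": {},
        "pca": {"n_components": (2, 50)},
    },
    "model": {
        "random_forest": {"n_estimators": (10, 500), "max_depth": (2, 20)},
        "svm": {"C": (0.01, 100.0), "gamma": (1e-4, 1.0)},
    },
}


def sample_pipeline(space, rng):
    """Draw one configuration: pick a choice per stage, then sample only the
    hyperparameters that choice activates (ints uniform, floats log-uniform)."""
    config = {}
    for stage, choices in space.items():
        choice = rng.choice(sorted(choices))
        config[stage] = choice
        for name, (low, high) in choices[choice].items():
            if isinstance(low, int):
                value = rng.randint(low, high)
            else:
                value = math.exp(rng.uniform(math.log(low), math.log(high)))
                value = min(high, max(low, value))  # guard against rounding
            config[f"{stage}.{name}"] = value
    return config


def random_search(evaluate, space, n_trials=20, seed=0):
    """Baseline global search: sample the joint space, keep the lowest loss."""
    rng = random.Random(seed)
    trials = [sample_pipeline(space, rng) for _ in range(n_trials)]
    return min(trials, key=evaluate)


if __name__ == "__main__":
    print(sample_pipeline(SPACE, random.Random(0)))
```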

Why would you participate?

Prize money: $30,000

But also 3 NVIDIA Titan X GPUs

Learning by doing

Fame

Test/Evaluate/Compare your skills and new tricks

Get a new job

Fun Game

Workshops to meet other smart and cool humans

NIPS 2015

Build parts of the dream

Various Motivations

 

  • Prize money
  • Learning
  • Test / Evaluate / Compare your skills
  • Get a new Job
  • Play/Fun
  • Workshops IRL
  • Build parts of the dream

AutoML Challenge

Hackathon team
Marc Boullé
Lukasz Romaszco
Sébastien Treguer
Emilia Vaajoensuu
Philippe Vandermersch

 

Software development
Eric Carmichael
Ivan Judson
Christophe Poulain
Percy Liang
Arthur Pesah
Xavier Baro Solé
Lukasz Romaszco
Michael Zyskowski



 

Codalab management
Evelyne Viegas
Percy Liang
Erick Watson

Advisors and beta testers
Kristin Bennett
Marc Boullé
Cecile Germain
Cecile Capponi
Richard Caruana
Gavin Cawley
Gideon Dror
Sergio Escalera
Tin Kam Ho

Balázs Kégl
Hugo Larochelle
Víctor Ponce López
Nuria Macia

Simon Mercer
Florin Popescu
Michèle Sebag
Danny Silver

Many thanks to Isabelle Guyon and all contributors

Data providers
Yindalon Aphinyanaphongs
Olivier Chapelle
Hugo Jair Escalante
Sergio Escalera
Zainab Iftikhar Malhi
Vincent Lemaire
Chih Jen Lin
Meysam Madani
Bisakha Ray
Mehreen Saeed
Alexander Statnikov
Gustavo Stolovitzky
H-J. Thiesen
Ioannis Tsamardinos

Thanks for your attention

http://automl.chalearn.org

Further details

Sébastien Treguer
@ST4Good

Contact

Participation

http://codalab.org/AutoML