Automatic Machine Learning

Challenge & Lessons

Machine Learning
Eureka, that's THE solution!

But what's behind the magic?

  • Data selection

  • Data cleaning/augmentation

  • Other pre-processing

  • Feature engineering

  • Model selection

  • Hyperparameter optimisation


And quite a bit of time spent trying and failing
until reaching an "acceptable" solution
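The last two steps are what AutoML targets; done by hand they typically look like an exhaustive search over configurations. A toy sketch (the model and grid are made up; in practice `validation_error` would train and score a real model):

```python
import itertools

# Toy stand-in for "train a model with these hyperparameters and
# return its validation error"; here a made-up smooth function.
def validation_error(learning_rate, depth):
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 5) ** 2

# Manual model selection: try every combination, keep the best.
grid = itertools.product([0.01, 0.1, 0.5], [3, 5, 8])
best_config = min(grid, key=lambda cfg: validation_error(*cfg))
print(best_config)  # (0.1, 5) minimises the toy error
```

Each grid point costs a full training run, which is why this trial-and-error loop eats so much of a data scientist's time.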

Still the best idea?

The ultimate

[Diagram: New Data → AutoML box → Trained model]

[Diagram: New Data → Data Scientist → Trained model]

The Vision

[Diagram: New Data → AutoML box + Crowd intelligence → Trained model]

ChaLearn AutoML Challenge

  • 6 Rounds
  • 5 data sets / round
  • 2 phases / round : AutoML & Tweakathon
  • Increasing difficulty


  • 30 data sets (5 per round)
  • Various domains: pharma, medicine, marketing, finance...
  • Diverse formats: text, image, video, speech...
  • Participants don't know the domain or the format
  • Given: dense or sparse matrix
  • Numerical, categorical, binary
  • Missing values or not
  • Noisy or not
  • Various proportions of N samples / N features
  • Large test sets, ensuring statistical significance


  • Binary classification
  • Multi-class classification (10 to 100s of classes)
  • Multi-label classification
  • Regression/Prediction
  • Difficulty = Medium to hard, 10 to 20% error at best
  • Time budget = Limited
  • Computational resources & memory = Fixed

Who are the best (so far)?


Review of the best teams' approaches

Frank Hutter and collaborators
from University of Freiburg


  • Bayesian Optimization
  • Auto-WEKA
  • Auto-sklearn

Bayesian Optimization with RF


SMAC: Sequential Model-Based Algorithm Configuration





Repeat until time budget exhausted:

  • construct RF model to predict performance
  • use that model to select promising configurations
  • compare each selected configuration against the best known
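A minimal sketch of that loop, with a nearest-neighbour toy standing in for SMAC's random-forest surrogate (the objective, the 1-D configuration space, and all constants are made up):

```python
import random

random.seed(0)

def objective(x):
    # Made-up expensive black box to minimise (e.g. validation error).
    return (x - 0.3) ** 2

history = []  # (configuration, observed performance) pairs

def surrogate_predict(x):
    # Toy surrogate: predict x's performance as that of the nearest
    # evaluated configuration. Real SMAC fits a random forest on
    # `history` and also models predictive uncertainty.
    nearest = min(history, key=lambda h: abs(h[0] - x))
    return nearest[1]

# Initialise with one random configuration.
x0 = random.random()
history.append((x0, objective(x0)))

for _ in range(30):  # "until time budget exhausted"
    # Use the model to select a promising candidate configuration...
    candidates = [random.random() for _ in range(20)]
    x = min(candidates, key=surrogate_predict)
    # ...then evaluate it and compare against the best known.
    history.append((x, objective(x)))

best_x, best_y = min(history, key=lambda h: h[1])
print(best_x, best_y)
```

The point of the surrogate is to spend cheap model predictions instead of expensive objective evaluations when deciding where to look next.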

Bayesian Optimization with RF


Bayesian Hyperparameter Optimizers


Hyperparameter optimization library:


  • From 2-dimensional continuous hyperparameter spaces
  • To structured ones with 768 hyperparameters


  • SMAC [Hutter et al, '11] , based on random forests
  • Spearmint [Snoek et al, '12] , based on Gaussian processes
  • TPE [Bergstra et al, '11] , based on 1-d distributions of good values


  • GP-based Spearmint is best for low-dimensional & continuous spaces
  • RF-based SMAC is best for high-dimensional, categorical & conditional spaces


Feature selection

  • Search method: which feature subsets to evaluate
  • Evaluation method: how to evaluate feature subsets in search
  • Both methods have subparameters


In total: 768 parameters, 10^47 configurations
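The search/evaluation split above can be illustrated with a greedy forward search over features (a toy sketch; `subset_score` is a made-up stand-in for cross-validated accuracy):

```python
# Toy "evaluation method": score of a feature subset. In practice this
# would be cross-validated performance of a model trained on the subset;
# here features 0 and 2 carry signal and every feature adds a small cost.
def subset_score(features):
    useful = {0: 0.30, 2: 0.20}
    gain = sum(useful.get(f, 0.0) for f in features)
    return 0.5 + gain - 0.01 * len(features)

# Toy "search method": greedy forward selection over 5 features.
selected = []
remaining = set(range(5))
while remaining:
    best = max(remaining, key=lambda f: subset_score(selected + [f]))
    if subset_score(selected + [best]) <= subset_score(selected):
        break  # no remaining feature improves the score
    selected.append(best)
    remaining.remove(best)

print(sorted(selected))  # only the two informative features survive
```

Both the search strategy (greedy here) and the evaluation method are themselves hyperparameters, which is how the configuration space balloons to the 768 parameters quoted above.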

Auto Sklearn

The Auto-WEKA approach applied to scikit-learn



  • Meta-learning to warmstart Bayesian optimization
  • Automated post-hoc ensemble construction to combine the models evaluated by Bayesian optimization

Auto Sklearn

Scikit-learn [Pedregosa et al, 2011-current]
instead of WEKA [Witten et al, 1999-current]


  • 15 classifiers, with a total of 59 hyperparameters
  • 13 feature preprocessors, 42 hyperparams
  • 4 data preprocessors, 5 hyperparams



110 hyperparameters vs. 768 in Auto-WEKA

Auto Sklearn

  • Separately, meta-learning and ensembling each help
  • Applied together, they proved to be complementary

        Meta-learning provides better models earlier

        => Ensembling can start being helpful earlier
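The post-hoc step is ensemble selection in the style of Caruana et al.: greedily add, with replacement, whichever evaluated model most improves the ensemble's validation score. A minimal sketch with made-up validation predictions:

```python
# Validation predictions of already-trained candidate models (made-up
# numbers) and the true targets; the ensemble averages its members.
predictions = {
    "model_a": [0.9, 0.1, 0.5, 0.5],
    "model_b": [0.5, 0.5, 0.9, 0.1],
    "model_c": [0.4, 0.6, 0.4, 0.6],
}
y_true = [1.0, 0.0, 1.0, 0.0]

def mse(pred):
    return sum((p - t) ** 2 for p, t in zip(pred, y_true)) / len(y_true)

def ensemble_pred(members):
    return [sum(predictions[m][i] for m in members) / len(members)
            for i in range(len(y_true))]

ensemble = []
for _ in range(5):  # greedy rounds; a model may be picked repeatedly
    best = min(predictions, key=lambda m: mse(ensemble_pred(ensemble + [m])))
    ensemble.append(best)

print(ensemble)
```

Because the candidate models are already trained as a by-product of the optimization run, this ensembling step costs almost nothing extra, which is why it can "start being helpful" as soon as a few decent models exist.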


Meta Learning & Ensembling

Auto Sklearn

  • Trivial to use, "scikit-learn like"





  • Available online:
  • Good overall results, even if not necessarily the best on each data set
  • Performs better on small to medium-sized datasets

Intel's Team

with Eugene Tuv


  • Scalable Ensemble Learning with stochastic feature boosting

(Code not released)

James Robert Lloyd

University of Cambridge (now at Qlearsite)


Sensible allocation of computation for
ensemble construction for multi-class classification

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction

Make use of the partial information gained during the training of a machine learning model in order to decide whether to:

  • pause training and start a new model
  • continue training of current model
  • resume the training of a previously-considered model
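A crude sketch of that decision rule, replacing the paper's GP learning-curve model with a simple exponential-decay extrapolation (all curves and constants are made up):

```python
# Partial learning curves: validation error observed so far for three
# partially trained models, one value per epoch (made-up numbers).
curves = {
    "model_a": [0.50, 0.40, 0.35, 0.33],
    "model_b": [0.45, 0.44, 0.43],
    "model_c": [0.60, 0.30],
}

def predicted_final_error(curve):
    # Crude stand-in for the GP model of learning-curve asymptotes:
    # assume improvements shrink geometrically (rate 0.5) and sum the
    # remaining tail below the latest observation.
    if len(curve) < 2:
        return curve[-1]
    drop = curve[-2] - curve[-1]   # most recent improvement
    decay = 0.5                    # assumed per-epoch decay rate
    return curve[-1] - drop * decay / (1 - decay)

# Decide which model to thaw (resume): lowest predicted final error.
choice = min(curves, key=lambda m: predicted_final_error(curves[m]))
print(choice)
```

Note the rule prefers the young, steeply improving model over the one that currently has the lowest error, which is exactly the behaviour freeze-thaw is after.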

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction

Components of the algorithm:

  • Anytime interruptible base learning algorithms
  • Evaluation of base learners
  • A learning curve model – infinite mixture of exponential decays (GP)
  • A model of learning curve asymptotes – standard smooth GP
  • A method for deciding which algorithm to explore further – entropy search

An extension of Freeze-Thaw Bayesian Optimization to ensemble construction

Components of the algorithm:

  • Base learning algorithms – most of scikit-learn
  • Evaluation of base learners – cross-validation
  • A learning curve extrapolator – mixture of exponential decays (GP)
  • An ensembling method – stacking
  • A model mapping individual algorithm performance to ensemble performance – a time-pressured hack: decision trees


  • Time management 
  • Memory management
  • Run pilot algorithms on reduced-size data

AutoML strategies

  • Bayesian approaches for Hyper-Parameter (HP) optimization
  • Global approaches including in the search space: 

                HP, models, features engineering, data pre-processing

  • Ensemble methods
  • Meta-learning
  • Memory & time management

Why would you participate ?

Prize money: $30,000

But also 3 Nvidia Titan X

Learning by doing


Test/Evaluate/Compare your skills and new tricks

Get a new job

Fun Game

Workshops to meet other smart and cool humans

NIPS 2015

Build parts of the dream

Various Motivations


  • Prize Money
  • Learning
  • Test/ Evaluate / Compare your skills
  • Get a new Job
  • Play/Fun
  • Workshops IRL
  • Build parts of the dream

AutoML Challenge

Hackathon team
Marc Boullé
Lukasz Romaszco
Sébastian Treger
Emilia Vaajoensuu
Philippe Vandermersch


Software development
Eric Carmichael
Ivan Judson
Christophe Poulain
Percy Liang
Arthur Pesah
Xavier Baro Solé
Lukasz Romaszco
Michael Zyskowski


Codalab management
Evelyne Viegas
Percy Liang
Erick Watson

Advisors and beta testers
Kristin Bennett
Marc Boullé
Cecile Germain
Cecile Capponi
Richard Caruana
Gavin Cawley
Gideon Dror
Sergio Escalera
Tin Kam Ho

Balasz Kégl
Hugo Larochelle
Víctor Ponce López
Nuria Macia

Simon Mercer
Florin Popescu
Michèle Sebag
Danny Silver

Many thanks to Isabelle Guyon and all contributors

Data providers
Yindalon Aphinyanaphongs
Olivier Chapelle
Hugo Jair Escalante
Sergio Escalera
Zainab Iftikhar Malhi
Vincent Lemaire
Chih Jen Lin
Meysam Madani
Bisakha Ray
Mehreen Saeed
Alexander Statnikov
Gustavo Stolovitzky
H-J. Thiesen
Ioannis Tsamardinos

Thanks for your attention

Further details

Sébastien Treguer