What's new in TPOT

(Tree-based Pipeline Optimization Tool)

Trang Lê


Typical pipeline

[Figure: stages of a typical pipeline: raw data, clean data, select features, preprocess features, construct features, select classifier, optimize parameters, validate model]









Open source AutoML tools

Randy Olson

Ryan J. Urbanowicz

Peter C. Andrews

Nicole A. Lavender

La Creis Kidd

Jason H. Moore

  • DEAP

  • Objective:
    • maximize pipeline's cross-validation classification/regression performance
    • minimize pipeline's complexity
  • Pareto front with NSGA-II
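The two objectives above can be illustrated with a small non-dominated filter. This is a minimal sketch in plain Python, not DEAP's NSGA-II implementation: it computes only the first Pareto front and omits the ranking of later fronts and the crowding-distance step that NSGA-II adds. The `Candidate` type, the use of operator count as the complexity measure, and the example scores are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical evaluated pipeline (names and numbers are made up)."""
    name: str
    cv_score: float    # objective 1: maximize
    n_operators: int   # objective 2: minimize (assumed stand-in for complexity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    no_worse = a.cv_score >= b.cv_score and a.n_operators <= b.n_operators
    better = a.cv_score > b.cv_score or a.n_operators < b.n_operators
    return no_worse and better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


candidates = [
    Candidate("A", 0.90, 3),
    Candidate("B", 0.88, 1),
    Candidate("C", 0.85, 2),  # dominated by B: lower score, more operators
    Candidate("D", 0.92, 5),
    Candidate("E", 0.90, 4),  # dominated by A: same score, more operators
]

front = pareto_front(candidates)
print([c.name for c in front])
```

Pipelines on the front are the ones for which no alternative is both simpler and at least as accurate, which is why a short pipeline with a slightly lower score can survive next to a longer, more accurate one.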


Weixuan Fu

Entire data set

Entire data set


Polynomial features

Combine features

Select k best features

Logistic regression

Multiple copies of the data set can enter the pipeline for analysis

Pipeline operators modify the features

Modified data set flows through the pipeline operators

Final classification is performed on the final feature set

An example "individual"
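The individual described above could be written by hand in scikit-learn roughly as follows. This is a sketch under assumptions, not the pipeline from the slide: the Iris data set, the polynomial degree, `k=5`, and `max_iter=1000` are all illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

# Example data set (an assumption; the slide does not name one).
X, y = load_iris(return_X_y=True)

# Two copies of the data set enter the pipeline: one passes through
# unchanged, the other is expanded with degree-2 polynomial features.
# make_union then combines the two feature sets column-wise.
combined = make_union(
    FunctionTransformer(),  # func=None acts as the identity transform
    PolynomialFeatures(degree=2, include_bias=False),
)

# The combined features are filtered down to the k best, and the final
# classification is performed on that final feature set.
pipeline = make_pipeline(
    combined,
    SelectKBest(score_func=f_classif, k=5),
    LogisticRegression(max_iter=1000),
)

scores = cross_val_score(pipeline, X, y, cv=5)
n_combined_features = combined.fit_transform(X).shape[1]
print(n_combined_features, scores.mean())
```

Each step corresponds to one box in the diagram, so an evolutionary search can change the pipeline by adding, removing, or replacing individual steps.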

GP primitive: dataset selector, feature selector & preprocessor, supervised classifier/regressor

Individual: sequence of pipeline operators



Mutation and crossover

(a) insertion mutation

(b) deletion mutation

(c) swap mutation

(d) substitution mutation

(e) crossover
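The five variation operators listed above can be sketched on a simplified representation. The code below treats a pipeline as a flat list of operator names, which is an assumption made for readability; TPOT itself evolves tree-structured individuals, and the exact semantics of each operator in the figure may differ. The `OPERATORS` pool and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Pipeline = List[str]

# Hypothetical operator pool, for illustration only.
OPERATORS = ["SelectKBest", "PolynomialFeatures", "StandardScaler",
             "PCA", "VarianceThreshold"]


def insertion_mutation(p: Pipeline, rng: random.Random) -> Pipeline:
    """(a) Insert a random operator at a random position."""
    pos = rng.randrange(len(p) + 1)
    return p[:pos] + [rng.choice(OPERATORS)] + p[pos:]


def deletion_mutation(p: Pipeline, rng: random.Random) -> Pipeline:
    """(b) Remove one operator; a one-step pipeline is returned unchanged."""
    if len(p) <= 1:
        return list(p)
    pos = rng.randrange(len(p))
    return p[:pos] + p[pos + 1:]


def swap_mutation(p: Pipeline, rng: random.Random) -> Pipeline:
    """(c) Exchange the operators at two distinct positions."""
    if len(p) < 2:
        return list(p)
    i, j = rng.sample(range(len(p)), 2)
    q = list(p)
    q[i], q[j] = q[j], q[i]
    return q


def substitution_mutation(p: Pipeline, rng: random.Random) -> Pipeline:
    """(d) Replace one operator with a different one from the pool."""
    if not p:
        return []
    pos = rng.randrange(len(p))
    q = list(p)
    q[pos] = rng.choice([op for op in OPERATORS if op != p[pos]])
    return q


def crossover(a: Pipeline, b: Pipeline,
              rng: random.Random) -> Tuple[Pipeline, Pipeline]:
    """(e) One-point crossover: the parents exchange their tails."""
    if len(a) < 2 or len(b) < 2:
        return list(a), list(b)
    cut_a = rng.randrange(1, len(a))
    cut_b = rng.randrange(1, len(b))
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]


rng = random.Random(0)
parent_a = ["VarianceThreshold", "StandardScaler", "PCA"]
parent_b = ["SelectKBest", "PolynomialFeatures"]
child_a, child_b = crossover(parent_a, parent_b, rng)
print(child_a, child_b)
```

Every function returns a new list and leaves its input untouched, so parents can be reused across several offspring within one generation.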


  • feature selectors

  • feature preprocessors

  • supervised classifiers/regressors

  • feature set selectors (FSS)


  • generations

  • population_size

  • offspring_size

  • mutation_rate

  • ...

  • ...

  • template
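The named parameters above might be collected as keyword arguments like this. The values are illustrative assumptions, not settings recommended in the talk, and the template string follows the hyphen-separated step syntax described in TPOT's documentation. The helper `template_steps` is hypothetical and only shows how such a string breaks into ordered steps.

```python
from typing import Dict, List, Union

# Keyword arguments using the parameter names from the slide.
# All values are illustrative assumptions.
tpot_settings: Dict[str, Union[int, float, str]] = {
    "generations": 5,        # number of evolutionary iterations
    "population_size": 20,   # individuals kept in each generation
    "offspring_size": 20,    # offspring produced in each generation
    "mutation_rate": 0.9,    # a probability, so it must lie in [0, 1]
    "template": "Selector-Transformer-Classifier",
}


def template_steps(template: str) -> List[str]:
    """Hypothetical helper: split a template into its ordered steps."""
    return template.split("-")


steps = template_steps(str(tpot_settings["template"]))
print(steps)

# With TPOT installed, the settings could be passed along these lines:
#   from tpot import TPOTClassifier
#   model = TPOTClassifier(**tpot_settings)
#   model.fit(X_train, y_train)
```

A template fixes the order and kind of steps, so the search only has to choose which operator fills each slot and how it is parameterized.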

Template + feature set selector
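As a rough illustration of what a feature set selector step does at the start of a template such as `FeatureSetSelector-Transformer-Classifier`, the sketch below keeps only the columns that belong to one named subset. It is written in plain Python and is not TPOT's `FeatureSetSelector` implementation; the function name, the subset names, and the data are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def select_feature_set(
    rows: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    subsets: Dict[str, List[str]],
    subset_name: str,
) -> Tuple[List[List[float]], List[str]]:
    """Keep only the columns whose names belong to the chosen subset."""
    wanted = set(subsets[subset_name])
    keep = [i for i, name in enumerate(feature_names) if name in wanted]
    if not keep:
        raise ValueError(f"subset {subset_name!r} matches no features")
    selected_rows = [[row[i] for i in keep] for row in rows]
    selected_names = [feature_names[i] for i in keep]
    return selected_rows, selected_names


# Hypothetical feature names and predefined subsets.
feature_names = ["gene_a", "gene_b", "gene_c", "gene_d"]
subsets = {
    "pathway_1": ["gene_a", "gene_c"],
    "pathway_2": ["gene_b", "gene_d"],
}
rows = [
    [0.1, 0.2, 0.3, 0.4],
    [1.1, 1.2, 1.3, 1.4],
]

selected_rows, selected_names = select_feature_set(
    rows, feature_names, subsets, "pathway_1"
)
print(selected_names, selected_rows)
```

In a search setting, the subset name becomes one more parameter to optimize, so the remaining steps of the template only ever see the features of the chosen subset.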

Future work

  • extend to select 2+ subsets
  • re-formulate complexity
    • each operator: flexibility (# parameters?), runtime
    • number of features used in pipeline
    • assess overfitting
  • batch sampling, layered TPOT...
  • ideas...
  • FeatureSetSelector

Jason Moore

Weixuan Fu



Presentation on 2019-07-01, Moore lab Lunch&Learn
