Happy Pi Day!!!

TPOT: A Tree-based Pipeline Optimization Tool

Trang Lê


postdoctoral researcher

@UPenn IBI

amateur runner


Clean data

Select features









Preprocess features

Construct features

Select classifier

Optimize parameters

Validate model

Raw data


Typical ML pipeline









Open source AutoML tools

  • auto-sklearn (python) github.com/automl/auto-sklearn
    • Bayesian optimization over a fixed 3-step ML pipeline
  • Auto-WEKA (java) github.com/automl/autoweka
    • similar to auto-sklearn, built on top of Weka
  • H2O.ai (java w/ python, scala, R, web GUI) github.com/h2oai/h2o-3
    • basic data prep w/ grid/random search over ML algorithms
  • devol (python) github.com/joeddav/devol
    • deep learning architecture search via GP

Randy Olson

  • DEAP

  • Objective:
    • maximize pipeline's CV classification performance
    • minimize pipeline’s complexity 
  • Pareto front with NSGA-II
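
The two objectives above can be illustrated with a minimal sketch of a Pareto front. This is not DEAP's or TPOT's code, and it is only the non-dominated filter, not the full NSGA-II procedure. The candidate tuples (CV score, number of operators) and the function names are made up for illustration.

```python
# Each candidate pipeline is summarized as (cv_score, n_operators).
# Assumption for this sketch: higher cv_score is better, fewer operators is better.

def dominates(a, b):
    """Return True if candidate a is at least as good as b on both
    objectives and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical scores, not results from any real run.
candidates = [(0.90, 5), (0.88, 2), (0.85, 3), (0.90, 7), (0.80, 1)]
front = pareto_front(candidates)
print(front)
```

Here (0.85, 3) is dropped because (0.88, 2) is both more accurate and simpler, and (0.90, 7) is dropped because (0.90, 5) reaches the same score with fewer operators.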


Weixuan Fu

Entire data set

Entire data set


Polynomial features

Combine features

Select k best features

Logistic regression

Multiple copies of the data set can enter the pipeline for analysis

Pipeline operators modify the features

Modified data set flows through the pipeline operators

Final classification is performed on the final feature set
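
The example pipeline above can be sketched in scikit-learn roughly as follows. This is an illustration, not TPOT's exported code: the synthetic data set, `degree=2`, `k=10`, and `max_iter=1000` are assumptions chosen so the snippet runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline, make_union
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures

# Synthetic stand-in for the "entire data set" (assumption for this sketch).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

pipeline = make_pipeline(
    # Two copies of the data set enter the pipeline: one unchanged,
    # one expanded into polynomial features; the union combines them.
    make_union(
        FunctionTransformer(),  # identity transform: passes the copy through
        PolynomialFeatures(degree=2, include_bias=False),
    ),
    # Keep the k best features of the combined set.
    SelectKBest(score_func=f_classif, k=10),
    # Final classification on the final feature set.
    LogisticRegression(max_iter=1000),
)

# Cross-validated accuracy, the kind of score TPOT tries to maximize.
scores = cross_val_score(pipeline, X, y, cv=5)
print(scores.mean())

# Fit once on all data to inspect the intermediate feature counts.
pipeline.fit(X, y)
n_combined = pipeline.named_steps["featureunion"].transform(X).shape[1]
print(n_combined)
```

With 6 input features, the degree-2 expansion yields 27 columns, so the union passes 33 features to the selector, which keeps 10.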

Genetic programming

GP primitives: dataset selectors, feature selectors & preprocessors, supervised classifiers

Population: sequences of pipeline operators


Mutation and crossover

(a) insertion mutation

(b) deletion mutation

(c) swap mutation

(d) substitution mutation

(e) crossover
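
The five variation operators can be sketched on flat lists of operator names. This is a simplification: TPOT works on GP trees through DEAP, and a real implementation would also check that the result is still a valid pipeline (for example, that it ends in a classifier). The operator list, function names, and one-point crossover are assumptions for illustration.

```python
import random

# Hypothetical operator vocabulary for this sketch.
OPERATORS = [
    "PolynomialFeatures", "StandardScaler", "SelectKBest",
    "PCA", "LogisticRegression", "RandomForest",
]


def insertion(pipeline, rng):
    """(a) Insert a random operator at a random position."""
    i = rng.randrange(len(pipeline) + 1)
    return pipeline[:i] + [rng.choice(OPERATORS)] + pipeline[i:]


def deletion(pipeline, rng):
    """(b) Remove one operator, keeping at least one in the pipeline."""
    if len(pipeline) <= 1:
        return list(pipeline)
    i = rng.randrange(len(pipeline))
    return pipeline[:i] + pipeline[i + 1:]


def swap(pipeline, rng):
    """(c) Exchange the positions of two operators."""
    if len(pipeline) < 2:
        return list(pipeline)
    i, j = rng.sample(range(len(pipeline)), 2)
    child = list(pipeline)
    child[i], child[j] = child[j], child[i]
    return child


def substitution(pipeline, rng):
    """(d) Replace one operator with a different one."""
    i = rng.randrange(len(pipeline))
    child = list(pipeline)
    child[i] = rng.choice([op for op in OPERATORS if op != pipeline[i]])
    return child


def crossover(parent_a, parent_b, rng):
    """(e) One-point crossover: head of parent_a joined to tail of parent_b.
    Both parents need at least two operators."""
    i = rng.randrange(1, len(parent_a))
    j = rng.randrange(1, len(parent_b))
    return parent_a[:i] + parent_b[j:]


rng = random.Random(42)
parent_a = ["PolynomialFeatures", "SelectKBest", "LogisticRegression"]
parent_b = ["StandardScaler", "PCA", "RandomForest"]

for name, child in [
    ("insertion", insertion(parent_a, rng)),
    ("deletion", deletion(parent_a, rng)),
    ("swap", swap(parent_a, rng)),
    ("substitution", substitution(parent_a, rng)),
    ("crossover", crossover(parent_a, parent_b, rng)),
]:
    print(name, child)
```

Every function returns a new list, so the parents stay unchanged and can be reused across generations.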

TPOT configs

  • Default TPOT
  • TPOT light
  • TPOT sparse
  • TPOT-MDR (Multifactor Dimensionality Reduction)
  • Classification
  • Regression
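
A configuration is chosen through the `config_dict` argument. The sketch below assumes the classic TPOT API (0.x releases, where `TPOTClassifier` accepts `config_dict`, `generations`, and `population_size`); newer releases may differ. The small custom search space and the helper name `make_tpot` are made up for illustration.

```python
# Names of the built-in configurations in classic TPOT (None selects the default).
BUILTIN_CONFIGS = [None, "TPOT light", "TPOT MDR", "TPOT sparse"]

# A custom search space: operator import path -> hyperparameter grid.
# The operators and value ranges here are illustrative assumptions.
custom_config = {
    "sklearn.preprocessing.StandardScaler": {},
    "sklearn.naive_bayes.GaussianNB": {},
    "sklearn.linear_model.LogisticRegression": {
        "penalty": ["l2"],
        "C": [0.01, 0.1, 1.0, 10.0],
    },
}


def make_tpot(config="TPOT light"):
    """Build a small TPOT classifier search (requires the tpot package).

    `config` may be a built-in name from BUILTIN_CONFIGS or a dictionary
    such as custom_config. The budget values are deliberately tiny.
    """
    from tpot import TPOTClassifier  # imported lazily; optional dependency

    return TPOTClassifier(
        generations=5,
        population_size=20,
        config_dict=config,
        random_state=42,
        verbosity=2,
    )
```

Calling `make_tpot(custom_config).fit(X_train, y_train)` would then run the search over the three operators listed. For regression problems, `TPOTRegressor` takes the same arguments.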

Template + Dataset Selector

Mutation restriction

Complexity reformulation

  • Number of pipeline operators
    • Flexibility of each operator
    • Runtime
  • Number of features used in pipeline
  • Number of parameters
  • By assessing overfitting: stability of the covariance of predictors, rank differences of importance metrics

Integration with neural nets


  • preprocessing
  • scalability
  • computational expense

Live demo!

Jason Moore

Weixuan Fu

TPOT: An Overview

By Trang Le

Presentation on 2019-03-14 at IBI Clinical Research Informatics Core