What's new in TPOT

(Tree-based Pipeline Optimization Tool)

Trang Lê

@trang1618

Typical pipeline (figure): Raw data, Clean data, Select features, Preprocess features, Construct features, Select classifier, Optimize parameters, Validate model. Goal: automate the modeling steps.

Open source AutoML tools

Randy Olson, Ryan J. Urbanowicz, Peter C. Andrews, Nicole A. Lavender, La Creis Kidd, Jason H. Moore

  • Built on DEAP (Distributed Evolutionary Algorithms in Python)
  • Objectives:
    • maximize the pipeline's cross-validation classification/regression performance
    • minimize the pipeline's complexity
  • Pareto-front selection with NSGA-II
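A minimal sketch of the two-objective comparison, assuming each pipeline is summarized by a hypothetical (cv_score, n_operators) pair with made-up numbers. It keeps the non-dominated pipelines, which is the first front that NSGA-II's non-dominated sorting produces. The rest of NSGA-II (further fronts, crowding distance, the selection loop) is omitted.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical summary of one pipeline."""
    name: str
    cv_score: float   # cross-validation performance (maximize)
    n_operators: int  # pipeline complexity (minimize)


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.cv_score >= b.cv_score and a.n_operators <= b.n_operators
    strictly_better = a.cv_score > b.cv_score or a.n_operators < b.n_operators
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates (order preserved)."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    population = [
        Candidate("LogisticRegression", 0.85, 1),
        Candidate("PCA, LogisticRegression", 0.86, 2),
        Candidate("PCA, SelectKBest, LogisticRegression", 0.86, 3),
        Candidate("Poly, PCA, SelectKBest, LogisticRegression", 0.90, 4),
    ]
    for c in pareto_front(population):
        print(c.name, c.cv_score, c.n_operators)
```

In this toy population, the three-operator pipeline drops out because the two-operator one scores the same with fewer operators. A simpler but less accurate pipeline stays on the front.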

TPOT

Weixuan Fu

An example "individual" (figure): two copies of the entire data set enter the pipeline. One passes through PCA and the other through polynomial features. The results are combined, the k best features are selected, and logistic regression makes the final classification.

  • Multiple copies of the data set can enter the pipeline for analysis
  • Pipeline operators modify the features
  • The modified data set flows through the pipeline operators
  • Final classification is performed on the final feature set
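The example individual can be written as a nested expression tree, which is the form genetic programming manipulates. The following is a minimal sketch built on a hypothetical `Node` class. The operator names follow the figure. `CombineFeatures` and the `input_data` leaf are illustrative labels only and are not meant as TPOT's internal names. Counting the non-leaf nodes gives one simple measure of pipeline complexity.

```python
from typing import List, Optional


class Node:
    """One pipeline operator (or a data-set leaf); children are the inputs it consumes."""

    def __init__(self, name: str, children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.children = children if children is not None else []


def to_expression(node: Node) -> str:
    """Render the tree as a nested function-call string (root = final operator)."""
    if not node.children:
        return node.name
    return f"{node.name}({', '.join(to_expression(c) for c in node.children)})"


def count_operators(node: Node) -> int:
    """Count pipeline operators, i.e. non-leaf nodes; data-set leaves count as 0."""
    if not node.children:
        return 0
    return 1 + sum(count_operators(c) for c in node.children)


# The figure's individual: two copies of the data set are transformed, combined,
# filtered, and classified by the root operator.
example = Node("LogisticRegression", [
    Node("SelectKBest", [
        Node("CombineFeatures", [
            Node("PCA", [Node("input_data")]),
            Node("PolynomialFeatures", [Node("input_data")]),
        ])
    ])
])

if __name__ == "__main__":
    print(to_expression(example))
    print(count_operators(example))  # 5
```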

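GP terms in TPOT:

  • Primitive: dataset selector, feature selector & preprocessor, supervised classifier/regressor

  • Individual: a sequence of pipeline operators

  • Population: a set of individuals

  • Generations: repeated rounds of evaluation, selection, mutation and crossover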

Mutation and crossover

(a) insertion mutation

(b) deletion mutation

(c) swap mutation

(d) substitution mutation

(e) crossover
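The five operations above can be sketched on a simplified linear individual. This simplification is an assumption, since the example individual above is a tree. The operator pools and function names are hypothetical. "Swap" is read here as exchanging the positions of two operators, which may differ from the figure's exact definition. To keep every pipeline valid, the last element is always a classifier.

```python
import random
from typing import List

# Hypothetical operator pools; the last element of an individual is a classifier.
PREPROCESSORS = ["PCA", "PolynomialFeatures", "SelectKBest", "StandardScaler"]
CLASSIFIERS = ["LogisticRegression", "DecisionTree", "KNeighbors"]
Individual = List[str]


def insertion(ind: Individual, rng: random.Random) -> Individual:
    """(a) Insert a random preprocessor at some position before the classifier."""
    pos = rng.randint(0, len(ind) - 1)
    return ind[:pos] + [rng.choice(PREPROCESSORS)] + ind[pos:]


def deletion(ind: Individual, rng: random.Random) -> Individual:
    """(b) Remove one preprocessor; a lone classifier is returned unchanged."""
    if len(ind) < 2:
        return list(ind)
    pos = rng.randrange(len(ind) - 1)
    return ind[:pos] + ind[pos + 1:]


def swap(ind: Individual, rng: random.Random) -> Individual:
    """(c) Exchange the positions of two preprocessors."""
    if len(ind) < 3:
        return list(ind)
    i, j = rng.sample(range(len(ind) - 1), 2)
    out = list(ind)
    out[i], out[j] = out[j], out[i]
    return out


def substitution(ind: Individual, rng: random.Random) -> Individual:
    """(d) Replace one operator with a random operator of the same kind."""
    out = list(ind)
    pos = rng.randrange(len(out))
    out[pos] = rng.choice(CLASSIFIERS if pos == len(out) - 1 else PREPROCESSORS)
    return out


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """(e) One-point crossover: a prefix of a plus a suffix of b (keeps b's classifier)."""
    return a[:rng.randrange(len(a))] + b[rng.randrange(len(b)):]


if __name__ == "__main__":
    rng = random.Random(0)
    parent = ["PCA", "SelectKBest", "LogisticRegression"]
    other = ["PolynomialFeatures", "DecisionTree"]
    for op in (insertion, deletion, swap, substitution):
        print(op.__name__, op(parent, rng))
    print("crossover", crossover(parent, other, rng))
```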

primitives/operators

  • feature selectors

  • feature preprocessors

  • supervised classifiers/regressors

  • feature set selectors (FSS)

parameters

  • generations

  • population_size

  • offspring_size

  • mutation_rate

  • ...

  • template
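The following toy loop on bit strings shows what generations, population_size, offspring_size and mutation_rate control. It is a sketch only. The fitness function (count of ones), the bit-flip mutation and the uniform crossover stand in for pipeline scoring and pipeline operators. The scheme follows a (mu + lambda) loop in which each offspring comes from mutation with probability mutation_rate and from crossover otherwise. That reading of the parameters is my assumption, and the default values below are arbitrary.

```python
import random
from typing import List

Genome = List[int]


def fitness(genome: Genome) -> int:
    """Toy objective standing in for a pipeline's cross-validation score: count of ones."""
    return sum(genome)


def evolve(generations: int = 20, population_size: int = 20, offspring_size: int = 20,
           mutation_rate: float = 0.9, n_bits: int = 16, seed: int = 0) -> Genome:
    """(mu + lambda) loop; requires population_size >= 2 for crossover sampling."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(population_size)]
    for _ in range(generations):
        offspring: List[Genome] = []
        for _ in range(offspring_size):
            if rng.random() < mutation_rate:
                # Mutation: copy one parent and flip a single random bit.
                child = list(rng.choice(population))
                i = rng.randrange(n_bits)
                child[i] = 1 - child[i]
            else:
                # Crossover: take each bit from one of two parents sampled without replacement.
                a, b = rng.sample(population, 2)
                child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            offspring.append(child)
        # Survival: parents and offspring compete for population_size slots (elitist).
        population = sorted(population + offspring, key=fitness,
                            reverse=True)[:population_size]
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(generations=30, population_size=10, offspring_size=10)
    print(best, fitness(best))
```

Because parents survive alongside their offspring, the best fitness never decreases from one generation to the next. Raising offspring_size explores more candidates per generation at the cost of more evaluations.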

Template + feature set selector (FSS)
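A template fixes the pipeline's shape as a hyphen-separated string of slot names, for example "Selector-Transformer-Classifier". A feature set selector restricts the pipeline to one predefined, named subset of features. The following is a stdlib-only sketch of both ideas. The operator catalogue, the function names and the "pathway" subsets are hypothetical and made up for illustration. It is not TPOT's implementation.

```python
from typing import Dict, List, Sequence

# Hypothetical operator catalogue, grouped by the role each template slot names.
OPERATORS: Dict[str, List[str]] = {
    "FeatureSetSelector": ["FeatureSetSelector"],
    "Transformer": ["PCA", "PolynomialFeatures", "StandardScaler"],
    "Classifier": ["LogisticRegression", "DecisionTree"],
}


def parse_template(template: str) -> List[str]:
    """Split a hyphen-separated template into its ordered slot names."""
    slots = template.split("-")
    unknown = [s for s in slots if s not in OPERATORS]
    if unknown:
        raise ValueError(f"unknown template slot(s): {unknown}")
    return slots


def fits_template(pipeline: Sequence[str], slots: Sequence[str]) -> bool:
    """True if the pipeline has one operator per slot, each from that slot's group."""
    return len(pipeline) == len(slots) and all(
        op in OPERATORS[slot] for op, slot in zip(pipeline, slots))


def select_feature_set(row: Dict[str, float], subsets: Dict[str, List[str]],
                       name: str) -> Dict[str, float]:
    """Keep only the columns listed in the chosen named subset."""
    return {col: row[col] for col in subsets[name]}


if __name__ == "__main__":
    slots = parse_template("FeatureSetSelector-Transformer-Classifier")
    print(fits_template(["FeatureSetSelector", "PCA", "LogisticRegression"], slots))  # True
    print(fits_template(["PCA", "LogisticRegression"], slots))  # False
    subsets = {"pathway_A": ["gene1", "gene3"], "pathway_B": ["gene2"]}
    row = {"gene1": 0.5, "gene2": 1.2, "gene3": -0.3}
    print(select_feature_set(row, subsets, "pathway_A"))  # {'gene1': 0.5, 'gene3': -0.3}
```

With the selector in the first slot, the search only needs to decide which subset to use and which operators follow it. This keeps each candidate pipeline tied to an interpretable group of features.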

Future works

  • extend to select 2+ subsets
  • re-formulate complexity
    • each operator: flexibility (# parameters?), runtime
    • number of features used in pipeline
    • assess over-fitting
  • batch sampling, layered TPOT...
  • ideas...
  • FeatureSetSelector

Jason Moore

Weixuan Fu

Thanks!

Presentation on 2019-07-01, Moore lab Lunch&Learn
