What's new in TPOT

(Tree-based Pipeline Optimization Tool)

Trang Lê

@trang1618

0000-0003-3737-6565

doi.org/10.1093/bioinformatics/btz470

Typical pipeline

Raw data

Clean data

Select features

Preprocess features

Construct features

Select classifier

Optimize parameters

Validate model

Automate

Open source AutoML tools

Gijsbers et al. 2019

Randy Olson

Ryan J. Urbanowicz

Peter C. Andrews

Nicole A. Lavender

La Creis Kidd

Jason H. Moore

DEAP
Objective:
- maximize pipeline's cross-validation classification/regression performance
- minimize pipeline's complexity
Pareto front with NSGA-II
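
The two objectives above can be illustrated with a small non-dominated filter. This is a minimal sketch, not TPOT's or DEAP's implementation: it only computes the Pareto front itself, whereas NSGA-II additionally ranks successive fronts and uses crowding distance. The `Candidate` type, the pipeline names, the scores, and the use of operator count as the complexity measure are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical pipeline summary (illustrative, not a TPOT type)."""
    name: str
    cv_score: float    # objective 1: maximize
    n_operators: int   # objective 2: minimize (assumed proxy for complexity)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a.cv_score >= b.cv_score and a.n_operators <= b.n_operators
    strictly_better = a.cv_score > b.cv_score or a.n_operators < b.n_operators
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores, for illustration only.
candidates = [
    Candidate("A", 0.90, 1),
    Candidate("B", 0.95, 3),
    Candidate("C", 0.93, 4),  # dominated by B (lower score, more operators)
    Candidate("D", 0.97, 5),
    Candidate("E", 0.88, 2),  # dominated by A
]
front = pareto_front(candidates)
print([c.name for c in front])
```

Candidates on the front trade accuracy against complexity: none of them can be improved on one objective without getting worse on the other.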

TPOT

Weixuan Fu

Entire data set

Olson & Moore, 2016

Entire data set

PCA

Polynomial features

Combine features

Select k best features

Logistic regression

Multiple copies of the data set can enter the pipeline for analysis

Pipeline operators modify the features

Modified data set flows through the pipeline operators

Final classification is performed on the final feature set

An example "individual"

GP primitive: dataset selector, feature selector & preprocessor, supervised classifier/regressor

Individual: sequence of pipeline operators
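
The example pipeline (two copies of the data set, PCA and polynomial features, combine, select k best, logistic regression) can be written as a tree, which is one way to picture an individual. The sketch below is an assumption-laden illustration using nested tuples. It is not how TPOT or DEAP store individuals, and the labels `CombineDFs` and `input_matrix` are illustrative names for the combine step and the data set leaf.

```python
# Each node is (operator_name, child, child, ...); a plain string is a leaf.
individual = (
    "LogisticRegression",
    ("SelectKBest",
     ("CombineDFs",
      ("PCA", "input_matrix"),                  # first copy of the data set
      ("PolynomialFeatures", "input_matrix"))),  # second copy of the data set
)


def render(node) -> str:
    """Render the tree as a nested expression, innermost step first to run."""
    if isinstance(node, str):
        return node
    name, *children = node
    return f"{name}({', '.join(render(child) for child in children)})"


def count_operators(node) -> int:
    """Count operator nodes; data set leaves do not count."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_operators(child) for child in node[1:])


expression = render(individual)
print(expression)
print(count_operators(individual))
```

The operator count is one simple candidate for the "complexity" objective.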

Population

Generations

Mutation and crossover

(a) insertion mutation

(b) deletion mutation

~~(c) swap mutation~~

(d) substitution mutation

(e) crossover

Orzechowski et al. 2018
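
The variation operators listed above (insertion, deletion, substitution, crossover) can be sketched on a flat sequence of operator names. This is a simplified illustration under stated assumptions: TPOT applies variation to tree-shaped individuals and keeps pipelines valid (for example, ending in a classifier or regressor), which this sketch does not enforce. `OPERATOR_POOL` and all function names are hypothetical.

```python
import random

# Hypothetical pool of operator names to draw from.
OPERATOR_POOL = ["PCA", "PolynomialFeatures", "SelectKBest",
                 "StandardScaler", "LogisticRegression"]


def insertion_mutation(ind, rng):
    """Insert one random operator at a random position."""
    pos = rng.randrange(len(ind) + 1)
    return ind[:pos] + [rng.choice(OPERATOR_POOL)] + ind[pos:]


def deletion_mutation(ind, rng):
    """Remove one operator, never leaving the individual empty."""
    if len(ind) <= 1:
        return list(ind)
    pos = rng.randrange(len(ind))
    return ind[:pos] + ind[pos + 1:]


def substitution_mutation(ind, rng):
    """Replace one operator with a different one from the pool."""
    pos = rng.randrange(len(ind))
    choices = [op for op in OPERATOR_POOL if op != ind[pos]]
    return ind[:pos] + [rng.choice(choices)] + ind[pos + 1:]


def crossover(a, b, rng):
    """One-point crossover: each child keeps a non-empty head of one parent."""
    cut_a = rng.randrange(1, len(a) + 1)
    cut_b = rng.randrange(1, len(b) + 1)
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]


rng = random.Random(0)
parent = ["PCA", "SelectKBest", "LogisticRegression"]
other = ["StandardScaler", "LogisticRegression"]
print(insertion_mutation(parent, rng))
print(deletion_mutation(parent, rng))
print(substitution_mutation(parent, rng))
print(crossover(parent, other, rng))
```

Every function returns a new list, so the parents stay unchanged and can be reused across generations.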

https://epistasislab.github.io/tpot

primitives/operators

feature selectors
feature preprocessors
supervised classifiers/regressors
feature set selectors (FSS)

parameters

generations
population_size
offspring_size
mutation_rate
...
...
template
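
As a rough sketch of how these parameters fit together, the settings below assume the classic `TPOTClassifier` keyword arguments from the 0.10-era releases; newer TPOT versions may differ. The values, `crossover_rate`, `random_state`, and the helper names are assumptions added for illustration. The import is deferred so the snippet runs without TPOT installed.

```python
# Illustrative values only; tune them for a real run.
tpot_settings = {
    "generations": 5,
    "population_size": 20,
    "offspring_size": 20,
    "mutation_rate": 0.9,
    "crossover_rate": 0.1,  # not on the slide; assumed companion of mutation_rate
    "template": "Selector-Transformer-Classifier",
    "random_state": 42,     # assumption, for reproducibility
}


def n_pipelines_evaluated(settings: dict) -> int:
    """Initial population plus one batch of offspring per generation."""
    return (settings["population_size"]
            + settings["generations"] * settings["offspring_size"])


def rates_are_valid(settings: dict) -> bool:
    """Mutation and crossover rates should not sum to more than 1."""
    total = settings["mutation_rate"] + settings["crossover_rate"]
    return total <= 1.0 + 1e-9


def build_classifier(settings: dict):
    """Hypothetical helper: build a TPOTClassifier if TPOT is installed."""
    from tpot import TPOTClassifier  # deferred import; requires `pip install tpot`
    return TPOTClassifier(**settings)


n_evaluated = n_pipelines_evaluated(tpot_settings)
print(n_evaluated, rates_are_valid(tpot_settings))
```

With a classifier in hand, the usual scikit-learn style calls (`fit`, `score`) apply; the template string constrains every pipeline to the shape selector, then transformer, then classifier.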

Template + feature set Selector
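
What a feature set selector does can be sketched in a few lines: given named subsets of features, keep only the columns of the chosen subset before the rest of the pipeline runs. This is a plain-Python illustration, not TPOT's `FeatureSetSelector`; the feature names, subset names, and the function `select_feature_set` are hypothetical. In the TPOT 0.10-era documentation the selector is placed first through a template such as `FeatureSetSelector-Transformer-Classifier`, which is stated here as an assumption about that version.

```python
from typing import Dict, List, Sequence

# Hypothetical features and predefined subsets (for example, gene sets).
feature_names = ["g1", "g2", "g3", "g4", "g5"]
subsets: Dict[str, List[str]] = {
    "subset_a": ["g1", "g3"],
    "subset_b": ["g2", "g4", "g5"],
}


def select_feature_set(rows: Sequence[Sequence[float]],
                       names: List[str],
                       named_subsets: Dict[str, List[str]],
                       sel_subset: str) -> List[List[float]]:
    """Return the rows restricted to the columns of one named subset."""
    columns = [names.index(feature) for feature in named_subsets[sel_subset]]
    return [[row[col] for col in columns] for row in rows]


rows = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10]]
selected = select_feature_set(rows, feature_names, subsets, "subset_a")
print(selected)
```

Treating the subset choice as one more tunable parameter lets the search pick which predefined subset enters the pipeline.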

Future work

extend to select 2+ subsets
re-formulate complexity
- each operator: flexibility (# parameters?), runtime
- number of features used in pipeline
- assess overfitting
batch sampling, layered TPOT...
ideas...
FeatureSetSelector

Jason Moore

Weixuan Fu

https://epistasislab.github.io/tpot/

https://slides.com/trang1618/whats-new-in-tpot

Thanks!


Presentation on 2019-07-01, Moore lab Lunch&Learn

Trang Le

#math graduate. Postdoc fellow with Jason Moore.
