mathematician
postdoctoral researcher
@UPenn IBI
amateur runner
@trang1618
Clean data
Select features
Preprocess features
Construct features
Select classifier
Optimize parameters
Validate model
Raw data
Automate
Automate
Randy Olson
Weixuan Fu
Entire data set
Entire data set
PCA
Polynomial features
Combine features
Select k best features
Logistic regression
Multiple copies of the data set can enter the pipeline for analysis
Pipeline operators modify the features
Modified data set flows through the pipeline operators
Final classification is performed on the final feature set
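The example pipeline above can be sketched in scikit-learn. This is a minimal sketch, not the pipeline from the talk: the library choice, the iris data set, and every parameter value (`n_components=2`, `degree=2`, `k=5`, `max_iter=1000`) are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two copies of the entire data set enter the pipeline: one passes through PCA,
# the other through polynomial feature expansion. FeatureUnion combines the
# resulting feature sets side by side. (Parameter values are assumed.)
combined = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
])

pipeline = Pipeline([
    ("combine", combined),                               # combine features
    ("select", SelectKBest(score_func=f_classif, k=5)),  # select k best features
    ("classify", LogisticRegression(max_iter=1000)),     # final classification
])

# Iris is used only as a stand-in data set for this sketch.
X, y = load_iris(return_X_y=True)
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Each named step corresponds to one operator in the diagram, so the modified data set flows from `combine` to `select` to `classify`.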
GP primitives: dataset selectors, feature selectors & preprocessors, supervised classifiers
Population: sequences of pipeline operators
Generations
(a) insertion mutation
(b) deletion mutation
(c) swap mutation
(d) substitution mutation
(e) crossover
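The five variation operators (a) to (e) can be sketched on flat lists of operator names. This is one plausible reading, not the method from the talk: the slides do not define the operators precisely, and the `OPERATORS` pool, the function names, and the one-point form of crossover are all assumptions.

```python
import random

# Hypothetical operator pool; the names are placeholders for illustration.
OPERATORS = ["PCA", "PolynomialFeatures", "SelectKBest",
             "StandardScaler", "LogisticRegression"]

def insertion_mutation(seq, rng):
    """(a) Insert a randomly chosen operator at a random position."""
    pos = rng.randrange(len(seq) + 1)
    return seq[:pos] + [rng.choice(OPERATORS)] + seq[pos:]

def deletion_mutation(seq, rng):
    """(b) Remove one operator, keeping at least one in the sequence."""
    if len(seq) <= 1:
        return list(seq)
    pos = rng.randrange(len(seq))
    return seq[:pos] + seq[pos + 1:]

def swap_mutation(seq, rng):
    """(c) Exchange the positions of two operators."""
    if len(seq) < 2:
        return list(seq)
    i, j = rng.sample(range(len(seq)), 2)
    out = list(seq)
    out[i], out[j] = out[j], out[i]
    return out

def substitution_mutation(seq, rng):
    """(d) Replace one operator with a different one (seq must be non-empty)."""
    pos = rng.randrange(len(seq))
    out = list(seq)
    out[pos] = rng.choice([op for op in OPERATORS if op != seq[pos]])
    return out

def crossover(a, b, rng):
    """(e) One-point crossover: exchange the tails of two parent sequences."""
    cut_a = rng.randrange(len(a) + 1)
    cut_b = rng.randrange(len(b) + 1)
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]

rng = random.Random(0)  # seeded so the sketch is repeatable
parent = ["PCA", "SelectKBest", "LogisticRegression"]
other = ["StandardScaler", "LogisticRegression"]
child = substitution_mutation(parent, rng)
offspring_1, offspring_2 = crossover(parent, other, rng)
```

Every function returns a new list and leaves its parent untouched, so one parent can produce several offspring within a generation. The sketch places no restriction on where a classifier may appear in a sequence.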
Mutation restriction
Live demo!
Jason Moore
Weixuan Fu