Versatile models with decision trees
ADS second oral presentation
January 20, 2020
A typical ML model can solve this
Given
what's the value of ?
Disadvantage
Advantages
Given
what's the value of ?
A versatile ML model can solve this.
versatile
Bayesian network
[Figure: Bayesian networks over three attributes A, B and C. Example query: compute the distribution of C, given A and B.]
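The example query, computing the distribution of C given A and B, can be sketched as inference by enumeration. This is only an illustration: the network structure (A and B as parents of C), the binary attributes and every probability below are assumptions made up for the sketch, not values from the talk. The same `query` function also answers a query in another direction, which is what makes the model versatile.

```python
from itertools import product

# Hypothetical network A -> C <- B with binary attributes (all numbers assumed).
P_A = {0: 0.7, 1: 0.3}
P_B = {0: 0.6, 1: 0.4}
# P(C = 1 | A, B)
P_C1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}


def joint(a, b, c):
    """Joint probability P(A=a, B=b, C=c) under the assumed structure."""
    p_c1 = P_C1[(a, b)]
    return P_A[a] * P_B[b] * (p_c1 if c == 1 else 1.0 - p_c1)


def query(target, evidence):
    """Distribution of `target` given `evidence` (dict: name -> value)."""
    names = ("A", "B", "C")
    scores = {0: 0.0, 1: 0.0}
    for values in product((0, 1), repeat=3):
        world = dict(zip(names, values))
        # Only worlds consistent with the evidence contribute.
        if all(world[k] == v for k, v in evidence.items()):
            scores[world[target]] += joint(world["A"], world["B"], world["C"])
    total = sum(scores.values())
    return {value: score / total for value, score in scores.items()}


dist_c = query("C", {"A": 1, "B": 0})   # the query from the figure
dist_a = query("A", {"C": 1})           # a query in another direction
```

Enumeration visits every joint assignment, so its cost grows exponentially with the number of attributes; this is one face of the scalability problem.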
ADVANTAGES
Fully interpretable and completely versatile.
DISADVANTAGES
Scalability: structure learning and inference are hard problems.
Works best on nominal data; handling numeric data is non-trivial.
Bayesian network
PGM, kNN and NN compared on three criteria: nominal + numeric data, interpretable, scalable.
WHAT?
A multi-directional ensemble of decision trees could be a versatile model.
WHY?
DTs are fast, interpretable, and handle nominal and numeric attributes; ensembles are trivial to parallelize, ...
HOW?
A minor modification of the Random Forest paradigm: allow for randomness in the target attributes.
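A minimal sketch of this modification, assuming scikit-learn's `DecisionTreeRegressor` as the base learner and a purely numeric toy dataset. The ensemble size, the tree depth and the uniform random choice of targets are illustrative assumptions, not the actual MERCS implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy numeric dataset with 5 attributes; attribute 4 depends on 0 and 1.
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 0] + 0.5 * X[:, 1]

n_attributes = X.shape[1]
n_trees = 10     # hypothetical ensemble size
n_targets = 1    # targets per tree; a value > 1 gives multi-target trees

ensemble = []
for seed in range(n_trees):
    # The modification: the targets are drawn at random instead of being fixed.
    targets = rng.choice(n_attributes, size=n_targets, replace=False)
    # All remaining attributes serve as inputs of this tree.
    inputs = np.setdiff1d(np.arange(n_attributes), targets)
    tree = DecisionTreeRegressor(max_depth=4, random_state=seed)
    tree.fit(X[:, inputs], X[:, targets])
    ensemble.append({"inputs": inputs, "targets": targets, "tree": tree})
```

Each entry records which attributes a tree reads and which it predicts, so a query can later be matched against the trees that are able to answer it.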
Standard ML-model
MERCS model
Compact representation:
PGM, kNN, NN and MERCS compared on three criteria: nominal + numeric data, interpretable, scalable.
Learning
Prediction
Competitive predictive performance compared to PGMs.
Multi-target trees can reduce the training time whilst maintaining performance.
Inference is orders of magnitude faster in MERCS than in PGMs.
MERCS should handle any prediction task
MERCS MODEL
QUERIES
2 main ideas
Attribute importance
Chaining
PROBLEM
Which trees to use?
ASSUMPTION
Trees with many missing inputs are likely to be mistaken.
IDEA
Attribute importance can quantify this effect.
TRADEOFF
Many trees vs. good trees
IDEA
DEFINITION
CRITERION
Attribute importance is a way to quantify how appropriate a given tree is
Cf. Louppe et al., Understanding variable importances in forests of randomized trees, NeurIPS 2013
...or the original CART manual
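One way to turn attribute importance into a selection rule is sketched below. The trees, their importance values, the `appropriateness` score and the threshold are all hypothetical choices made for illustration; the talk's own definition and criterion are not reproduced here. The score is the share of a tree's total attribute importance that falls on attributes known in the query.

```python
# Hypothetical component trees: one target each, plus per-input importances
# that sum to 1 within a tree (as normalized importances do).
trees = [
    {"name": "T1", "target": "C", "importance": {"A": 0.7, "B": 0.3}},
    {"name": "T2", "target": "C", "importance": {"B": 0.2, "D": 0.8}},
    {"name": "T3", "target": "C", "importance": {"A": 0.5, "D": 0.5}},
    {"name": "T4", "target": "D", "importance": {"A": 0.6, "B": 0.4}},
]


def appropriateness(tree, known):
    """Share of the tree's attribute importance that lies on known attributes."""
    return sum(w for attr, w in tree["importance"].items() if attr in known)


def select_trees(trees, known, target, threshold):
    """Names of the trees that predict `target` and score at least `threshold`."""
    candidates = [t for t in trees if t["target"] == target]
    return [t["name"] for t in candidates
            if appropriateness(t, known) >= threshold]


known = {"A", "B"}   # query: predict C given A and B; D is missing
strict = select_trees(trees, known, "C", threshold=0.9)   # few, good trees
loose = select_trees(trees, known, "C", threshold=0.4)    # more, weaker trees
```

The threshold expresses the tradeoff between many trees and good trees: lowering it admits more trees, each relying more heavily on missing inputs.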
Baseline (RF)
Most-relevant attribute importance
Query:
PROBLEM
Which trees to use?
ASSUMPTION
Missing inputs are also predictable with MERCS itself.
IDEA
Chaining of component trees to answer a given query.
TRADEOFF
Bottom-up vs. top-down
Cf. Read et al., Classifier chains for multi-label classification, ECML PKDD 2009
Bottom-Up Chaining
Query:
Use most appropriate models, given
Top-Down Chaining
Query:
Use most appropriate models, for target
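The bottom-up direction can be sketched as follows. The component models are hypothetical stand-ins (plain functions instead of trained trees), and the rule used here, applying every model whose inputs are all available, is a simplification of "use the most appropriate models"; it is not the MERCS prediction algorithm itself.

```python
# Hypothetical stand-ins for component trees: each maps inputs to one target.
models = [
    {"inputs": {"A"}, "target": "B", "predict": lambda v: 2 * v["A"]},
    {"inputs": {"A", "B"}, "target": "C", "predict": lambda v: v["A"] + v["B"]},
    {"inputs": {"C", "D"}, "target": "E", "predict": lambda v: v["C"] - v["D"]},
]


def bottom_up_chain(models, known, target):
    """Start from the known attributes and keep applying usable models.

    Returns the predicted value of `target` (None if it is unreachable)
    and the indices of the models in the order they were applied.
    """
    values = dict(known)
    order = []
    progress = True
    while target not in values and progress:
        progress = False
        for index, model in enumerate(models):
            usable = model["inputs"] <= set(values)
            if usable and model["target"] not in values:
                # A predicted attribute becomes an input for later models.
                values[model["target"]] = model["predict"](values)
                order.append(index)
                progress = True
    return values.get(target), order


prediction, chain = bottom_up_chain(models, {"A": 3}, "C")
unreachable, _ = bottom_up_chain(models, {"A": 3}, "E")
```

A top-down variant would instead start from the models that predict the target and work backwards to the models that can supply their missing inputs.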
With a reasonable amount of missing values (<50%), chaining really works.
General approach for MD-ensembles
Obvious costs at prediction time
With too many attributes missing, every tree is flawed and no prediction algorithm can solve that.
WHAT?
An anomaly detector based on many predictive functions, i.e. a MERCS model.
WHY?
Detection of an anomaly is only step one; ideally, we can understand and ultimately prevent anomalies.
HOW?
A MERCS model splits the dataset into many 'contextual subpopulations'.
We then detect
A) anomalous subpopulations
B) anomalies within subpopulations.
'contextual subpopulations'
WHAT?
A subpopulation where all instances share a common context.
WHY?
Anomalies are context-dependent.
HOW?
Each node in a decision tree automatically represents such a subpopulation.
[Figure: decision tree with root P, split into B and R; B is further split into M and F.]
Imagine a database of people (P), which contains basketball players (B) and regular persons (R).
The basketball players are both male (M) and female (F).
This decision tree predicts the height of the persons in the database.
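Reading the nodes of such a tree as subpopulations can be sketched with scikit-learn, whose `decision_path` and `apply` methods report which nodes each instance passes through. The synthetic data, the two binary attributes and the height formula are assumptions made up for this illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical data: two binary attributes and a height in cm.
is_player = rng.integers(0, 2, size=300)
is_male = rng.integers(0, 2, size=300)
height = 165 + 10 * is_male + 25 * is_player + rng.normal(0, 5, size=300)
X = np.column_stack([is_player, is_male])

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, height)

# decision_path marks, for every instance, each node it passes through.
membership = tree.decision_path(X).toarray().astype(bool)  # (n_samples, n_nodes)
leaf_of = tree.apply(X)                                     # leaf id per instance

# One subpopulation (array of instance indices) per node of the tree.
subpopulations = {node: np.flatnonzero(membership[:, node])
                  for node in range(tree.tree_.node_count)}
```

The root node holds the whole database, inner nodes hold coarser contexts, and each leaf holds the most specific context the tree distinguishes.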
Example 01
Say, $$h = 210\,\mathrm{cm}$$
Whether or not this is anomalous depends on context.
In R, probably yes.
In M, probably not.
Example 02
Say, $$h = 165\,\mathrm{cm}$$
Whether or not this is anomalous depends on context.
In M, probably yes.
In R, probably not.
Anomaly Detection Mechanism 01
Model each contextual subpopulation with a density estimate to detect anomalies within that subpopulation.
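A minimal sketch of this mechanism, assuming a single Gaussian per subpopulation as the density estimate and a z-score cutoff as the anomaly rule. The heights, the choice of a Gaussian and the cutoff of 3 are illustrative assumptions; any other density estimator could take its place.

```python
from statistics import mean, stdev

# Hypothetical heights (cm) per contextual subpopulation.
heights = {
    "R": [160, 165, 170, 172, 175, 178, 180, 168, 174, 182],
    "M": [195, 200, 203, 205, 208, 210, 212, 198, 206, 215],
}

# One Gaussian (mean, standard deviation) per subpopulation.
density = {name: (mean(values), stdev(values))
           for name, values in heights.items()}


def is_anomalous(h, context, max_z=3.0):
    """Flag h when it lies more than max_z standard deviations from the mean."""
    mu, sigma = density[context]
    return abs(h - mu) / sigma > max_z


tall_in_r = is_anomalous(210, "R")    # 210 cm among regular persons
tall_in_m = is_anomalous(210, "M")    # 210 cm among male basketball players
short_in_m = is_anomalous(165, "M")   # 165 cm among male basketball players
short_in_r = is_anomalous(165, "R")   # 165 cm among regular persons
```

The same height is flagged in one context and accepted in the other, which is the context dependence the two examples describe.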
Example 03
Assume that $$|R| = 3$$.
Despite the fact that R is modelled well, it is the odd one out.
In a database of basketball players, these are 'anomalies'.
Anomaly Detection Mechanism 02
We define a distance metric between different subpopulations in order to detect anomalous subpopulations.
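The talk does not spell out the metric here, so the sketch below substitutes a deliberately simple one: the absolute difference between subpopulation means, in units of the global standard deviation. The heights, the `distance` function and the size-weighted `outlyingness` score are all hypothetical, chosen only to show how a small, distant subpopulation such as R can be singled out.

```python
from statistics import mean, pstdev

# Hypothetical heights (cm); R is the small subpopulation with |R| = 3.
subpops = {
    "M": [195, 200, 203, 205, 208, 210, 212, 198, 206, 215],
    "F": [178, 182, 185, 186, 188, 190, 192, 180, 187, 194],
    "R": [160, 165, 170],
}

all_heights = [h for values in subpops.values() for h in values]
scale = pstdev(all_heights)   # global spread, used to normalize distances


def distance(a, b):
    """Assumed distance: difference of means in units of the global std."""
    return abs(mean(subpops[a]) - mean(subpops[b])) / scale


def outlyingness(name):
    """Size-weighted mean distance from `name` to all other subpopulations."""
    others = [other for other in subpops if other != name]
    total = sum(len(subpops[other]) for other in others)
    return sum(len(subpops[other]) * distance(name, other)
               for other in others) / total


scores = {name: outlyingness(name) for name in subpops}
most_anomalous = max(scores, key=scores.get)
```

Weighting by size makes a small group that lies far from the large groups stand out, while a large group is barely affected by its distance to a small one.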
TEACHING
BRI (4 times)
DB (4 times)
THESIS
Supervised 7 theses so far, 2 in progress (+ one best thesis award).
ADS
2 ECTS in transferable skills: OK
3 ECTS in DTAI seminar
Big Data Winterschool
Teaching Assistant Training: OK
Scientific Integrity: OK
PUBLICATIONS