Missing value imputation with MERCS:
a faster alternative to MissForest
This research received funding from the Flemish Government (AI Research Program)
Elia Van Wolputte & Hendrik Blockeel
KU Leuven and Leuven.AI, Belgium
DISCOVERY SCIENCE 2020




Overview
- Problem Setting
- Q1: Extending MERCS Framework
- Q2: MERCS vs. MissForest
- Summary
1. Problem setting
A typical ML-model solves this problem:
Given the values of a fixed set of input attributes, what's the value of a fixed target attribute?
This is an idealized scenario:
if you do not know the exact task beforehand, this paradigm breaks down.
Disadvantage
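An ML-model which is 'robust to missing values' solves this problem:
Given whichever attribute values happen to be known, what's the value of a missing attribute?
Without knowing in advance which attributes will be missing!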
Examples
ML in industrial contexts:
- sensor data
- sometimes, sensors break
- the ML-pipeline should not break because of a faulty sensor
Autocomplete in webforms:
- the user fills in fields...
- ... in random order
- based on the fields filled in so far, suggestions need to be made
Possible approaches
Iterative Approaches
(e.g. MissForest)
Fix missing values. Retrain. Fix missing values again. Repeat until converged.
MERCS
Train once. The multi-directional model can be used to predict any column.
Naive Imputation
Guess the missing values, e.g. substitute the column mean or median.
Probabilistic Graphical Models
Use probabilistic inference to infer the most likely values for the missing entries, based on the known values.
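A minimal sketch of the naive and iterative approaches in plain Python: `mean_impute` is naive imputation, and `iterative_impute` follows the "fix missing values, retrain, fix again, repeat until converged" loop, starting from the naive guess. Assumptions: MissForest fits a random forest per column; to keep the example self-contained we swap in a k-nearest-neighbour mean as the per-column model, and we stop on a simple squared-change threshold rather than MissForest's own stopping rule. All function names are hypothetical.

```python
from statistics import mean


def mean_impute(rows):
    """Naive imputation: replace every None by the mean of its column."""
    n_cols = len(rows[0])
    means = [mean(r[j] for r in rows if r[j] is not None) for j in range(n_cols)]
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def knn_predict(train_x, train_y, x, k):
    """Stand-in per-column model (not a random forest): mean target of the k nearest rows."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_x, train_y)
    )
    return mean(ty for _, ty in scored[:k])


def iterative_impute(rows, k=3, max_iter=10, tol=1e-9):
    """Fix missing values, retrain, fix again; stop once the imputations stop changing."""
    n_cols = len(rows[0])
    missing = [(i, j) for i, r in enumerate(rows) for j in range(n_cols) if r[j] is None]
    current = mean_impute(rows)  # initial naive guess
    for _ in range(max_iter):
        previous = [r[:] for r in current]
        for j in range(n_cols):
            miss_rows = {i for i, c in missing if c == j}
            if not miss_rows:
                continue
            others = [c for c in range(n_cols) if c != j]
            observed = [i for i in range(len(rows)) if i not in miss_rows]
            # "Retrain": fit on rows where column j was observed, using current guesses elsewhere.
            train_x = [[current[i][c] for c in others] for i in observed]
            train_y = [current[i][j] for i in observed]
            for i in miss_rows:
                x = [current[i][c] for c in others]
                current[i][j] = knn_predict(train_x, train_y, x, k)
        change = sum((current[i][j] - previous[i][j]) ** 2 for i, j in missing)
        if change < tol:
            break
    return current
```

On `rows = [[1, 2], [2, 4], [3, 6], [4, 8], [5, None]]`, `mean_impute` fills in 5, while `iterative_impute(rows, k=1)` fills in 8, the value of the nearest neighbour. Note that the whole loop has to be re-run whenever a new incomplete row arrives, which is the retraining cost discussed in this talk.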
Gaps in knowledge
- Naive Imputation: fast, but inaccurate.
- Probabilistic Graphical Models: slow, but accurate.
- Iterative Approaches (e.g. MissForest): accurate, but you need to retrain for every new instance that comes in.
- MERCS: possible alternative for iterative approaches, without retraining.
Q1 = Extending MERCS
Can a MERCS model be made robust to missing values at prediction time?
Q2 = MERCS vs. MissForest
How does MERCS compare against MissForest, a well-established tree-based technique to deal with missing data?
Q1. Extending MERCS
Can a MERCS model be made robust to missing values at prediction time?
Uni-directional model
Compact representation:
Cf. Van Wolputte et al., MERCS: Multi-directional Ensembles of Regression and Classification Trees, AAAI-18
MERCS-model
Uni-directional model (e.g. decision tree)
Multi-directional model (e.g. MERCS-model)
Compact representation
MERCS should handle any query
Training time
- Learn a MERCS-model
- Queries unknown
Prediction time
- Use a MERCS-model on the incoming queries
- Often no "perfect" tree for the query!
MISMATCH!
Solution = better prediction strategies in MERCS, to overcome the mismatch
- 2 main ideas: attribute importance and chaining
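To make the mismatch concrete, here is a minimal sketch assuming a MERCS-like model is simply a list of component trees, each with a set of descriptive (input) and target (output) attributes; the trees themselves are omitted, and the component names and attribute sets are hypothetical. A query fixes which attributes are known and which one is asked for; a "perfect" tree predicts that target and needs only known inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One component tree of a multi-directional model (the tree itself is omitted)."""
    name: str
    desc: frozenset  # descriptive (input) attributes
    targ: frozenset  # target (output) attributes


def candidates(model, target):
    """All components that can predict the requested target."""
    return [c for c in model if target in c.targ]


def perfect_components(model, known, target):
    """Components that predict the target and whose inputs are all known in the query."""
    return [c for c in candidates(model, target) if c.desc <= known]


def missing_inputs(component, known):
    """Inputs this component needs but the query does not provide."""
    return component.desc - known


# Hypothetical model over attributes A1..A5, learned before any query is seen.
model = [
    Component("T1", frozenset({"A1", "A2", "A3"}), frozenset({"A5"})),
    Component("T2", frozenset({"A2", "A4"}), frozenset({"A5"})),
    Component("T3", frozenset({"A1", "A2"}), frozenset({"A4"})),
]

known = frozenset({"A1", "A2"})
print([c.name for c in candidates(model, "A5")])                 # ['T1', 'T2']
print([c.name for c in perfect_components(model, known, "A5")])  # []
```

Neither candidate for A5 has all of its inputs available (T1 lacks A3, T2 lacks A4), which is exactly the situation that attribute importance and chaining are meant to handle.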
Attribute importance
PROBLEM: Which trees to use?
ASSUMPTION: Trees with many missing inputs are likely to be mistaken.
IDEA: Attribute importance can quantify this effect.
TRADEOFF: Many trees vs. good trees.
Attribute importance
IDEA: Attribute importance is a way to quantify how appropriate a given tree is.
DEFINITION: How much does an attribute matter?
CRITERION: How much do the available attributes matter?
Cf. Louppe et al., Understanding variable importances in forests of randomized trees, NeurIPS 2013
...or the original CART manual
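A minimal sketch of the criterion, assuming each tree exposes normalized per-attribute importances (for instance the impurity-based importances of a fitted decision tree) and that a tree's suitability for a query is the sum of the importances of the attributes the query actually provides; attributes a tree does not use count as zero. Tree names and importance values are hypothetical.

```python
def criterion(importances, known):
    """How much do the available attributes matter? Sum of the importances of known attributes."""
    return sum(imp for attr, imp in importances.items() if attr in known)


def rank_trees(trees, known):
    """Rank tree names from most to least appropriate for the given known attributes."""
    return sorted(trees, key=lambda name: criterion(trees[name], known), reverse=True)


# Hypothetical importances (each tree's values sum to 1).
trees = {
    "T1": {"A1": 0.8, "A3": 0.2},
    "T2": {"A3": 1.0},
    "T3": {"A2": 0.5, "A4": 0.5},
}
known = {"A1", "A2"}
print(rank_trees(trees, known))  # ['T1', 'T3', 'T2']
```

A tree whose important attributes are all missing (T2 here) scores 0, in line with the assumption that such trees are likely to be mistaken.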

Baseline (RF) vs. most-relevant attribute importance
[Figure: worked example on a query; per-tree criterion values are sums over attribute importances, e.g. 0 + 0 + 0 = 0, 0.8 + 0 + 0 = 0.8 and 0 + 0 + 0.2 = 0.2]
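The two prediction strategies in that comparison can be sketched as follows, assuming each candidate tree has already produced a prediction and a criterion score: the baseline averages every candidate tree, as a random forest would, while the most-relevant strategy only averages the trees with the highest score. Keeping only the maximum (with a float tolerance) is our assumption; a threshold or top-k selection would be equally plausible variants. The tree names, predictions and scores are hypothetical.

```python
import math
from statistics import mean


def predict_baseline(preds):
    """Baseline (RF-style): average the predictions of all candidate trees."""
    return mean(preds.values())


def predict_most_relevant(preds, scores):
    """Average only the trees whose criterion equals the best score (up to float noise)."""
    best = max(scores.values())
    return mean(p for name, p in preds.items() if math.isclose(scores[name], best))


preds = {"T1": 10.0, "T2": 30.0, "T3": 12.0, "T4": 50.0}
scores = {"T1": 0.8, "T2": 0.0, "T3": 0.8, "T4": 0.2}
print(predict_baseline(preds))               # 25.5
print(predict_most_relevant(preds, scores))  # 11.0
```

Here T2 and T4 pull the baseline towards 25.5 even though they score low on the available attributes; the most-relevant strategy ignores them.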
Chaining
PROBLEM: Which trees to use?
ASSUMPTION: Missing inputs are also predictable with MERCS itself.
IDEA: Chaining of component trees to answer a given query.
TRADEOFF: Bottom-up vs. top-down.
Cf. Read et al., Classifier chains for multi-label classification, ECML PKDD 2009


[Figure panels: Bottom-Up Chaining, Top-Down Chaining]
Bottom-Up chaining, on an example query:
1. Use most relevant models, given \(\{A_1, A_2\}\)
2. Use most relevant models, given \(\{A_1, A_2, A_4\}\)
OK
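A minimal sketch of bottom-up chaining, assuming each component is a triple (inputs, output, predict-function) and, for simplicity, that a component may only be used once all of its inputs are available (known or already predicted). The slides select the most relevant models at each step; that ranking is left out here. Starting from the known attributes, each step predicts every intermediate attribute that has become predictable, until a component for the target can be applied, mirroring the step from \(\{A_1, A_2\}\) to \(\{A_1, A_2, A_4\}\). The function name, component functions and values are hypothetical.

```python
def bottom_up_chain(components, known, target, max_steps=10):
    """Answer `target` by chaining components, starting from the known attribute values.

    components: list of (inputs: frozenset, output: str, predict: callable(dict) -> value)
    known: dict mapping known attribute names to values
    Returns (prediction, all values known or predicted along the way).
    """
    values = dict(known)
    for _ in range(max_steps):
        available = set(values)
        # Can the query already be answered by a component whose inputs are all available?
        for inputs, output, predict in components:
            if output == target and inputs <= available:
                return predict(values), values
        # Otherwise, predict every intermediate attribute that is predictable at this level.
        progress = False
        for inputs, output, predict in components:
            if output != target and output not in values and inputs <= available:
                values[output] = predict(values)
                progress = True
        if not progress:
            break
    raise ValueError(f"cannot answer {target!r} by chaining these components")


components = [
    (frozenset({"A1", "A2"}), "A4", lambda v: v["A1"] + v["A2"]),
    (frozenset({"A2", "A4"}), "A5", lambda v: v["A2"] * v["A4"]),
    (frozenset({"A3"}), "A5", lambda v: 2 * v["A3"]),
]
prediction, values = bottom_up_chain(components, {"A1": 1, "A2": 2}, "A5")
print(prediction, values)  # 6 {'A1': 1, 'A2': 2, 'A4': 3}
```

Predicting level by level (using only attributes available at the start of each step) is what makes this bottom-up; a top-down variant would instead start from the target and recursively look for components that supply its missing inputs.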
Results
[Figure: baseline, most-relevant, chaining and PGM compared]
- Both attribute importance (most-relevant) and chaining improve robustness to missing inputs.
- These improvements come at an acceptable cost (i.e. still orders of magnitude faster than PGM).
Conclusions
Q1
Can a MERCS model be made robust to missing values at prediction time?
A1
Yes, if you use chaining and attribute importance.
Q2. MERCS vs. MissForest
How does MERCS compare against MissForest, a well-established tree-based technique to deal with missing data?
MissForest
Train model, fix missing values; train model again, fix missing values again.
Experimental Setup
[Figure: training data (train) and test data (= queries, test)]
- Queries come in one by one
- E.g. ML in industry: you monitor your equipment all the time
MissForest Setup
[Figure: for each query, MissForest is run on the training data + that single query]
MERCS Setup
[Figure: MERCS is trained once on the training data; the resulting MERCS model then answers the incoming queries]
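The difference between the two setups can be sketched as two evaluation loops over a stream of queries. Assumptions, for brevity: a column-mean imputer stands in for both MissForest and the MERCS model (so their answers coincide here), and the training data on the MERCS side is complete. The point is only where (re)training happens, so the sketch counts training runs: one per query for MissForest, one in total for MERCS. All function names are hypothetical.

```python
from statistics import mean


def impute_matrix(rows):
    """Stand-in for MissForest: column-mean imputation of the whole matrix."""
    means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[m if v is None else v for v, m in zip(r, means)] for r in rows]


def missforest_protocol(train, queries):
    """Re-run the imputer on the training data + the single incoming query."""
    answers, runs = [], 0
    for q in queries:
        imputed = impute_matrix(train + [q])  # retrain for every query
        runs += 1
        answers.append(imputed[-1])
    return answers, runs


def fit(train):
    """Stand-in for learning a MERCS model: here just the column means (complete data)."""
    return [mean(col) for col in zip(*train)]


def mercs_protocol(train, queries):
    """Train once, before any query arrives, then answer every query with the same model."""
    model = fit(train)
    answers = [[m if v is None else v for v, m in zip(q, model)] for q in queries]
    return answers, 1
```

With real models, each MissForest run is an iterative forest-fitting loop, so the per-query retraining dominates the cost; this is the source of the speedup reported below.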
Results
- Very comparable performance (many draws)
- No retraining in MERCS => ~2 orders of magnitude speedup
Conclusions
Q2
How does MERCS compare against MissForest, a well-established tree-based technique to deal with missing data?
A2
MERCS performs on par, for a fraction of the time cost.
4. Conclusions
Q1
Can a MERCS model be made robust to missing values at prediction time?
A1
Yes, if you use chaining and attribute importance.
Q2
How does MERCS compare against MissForest, a well-established tree-based technique to deal with missing data?
A2
MERCS performs on par, for a fraction of the time cost.
4. Future work
1. Here, we work with attribute importance, which is calculated on the full training dataset. Can we take a fully instance-based approach? (SHAP values are per-instance attribute importances.)
2. Application-wise, we are now looking into anomaly detection with MERCS.
Thank you for your attention!




By eliavw
Presentation for Discovery Science 2020