Missing value imputation with MERCS:

a faster alternative to MissForest

This research received funding from the Flemish Government (AI Research Program)

Elia Van Wolputte & Hendrik Blockeel

KU Leuven and Leuven.AI, Belgium

DISCOVERY SCIENCE 2020

Overview

Problem Setting
Q1: Extending MERCS Framework
Q2: MERCS vs. MissForest
Summary

1. Problem setting

A typical ML-model solves this problem:

Given X, what's Y?

Disadvantage: This is an idealized scenario. If you do not know the exact task beforehand, this paradigm breaks down.

An ML-model which is 'robust to missing values' solves this problem:

Given X, what's Y? Without knowing Z!


Examples

ML in industrial contexts:

- sensor data

- sometimes, sensors break

- the ML-pipeline should not break because of a faulty sensor

Autocomplete in web forms:

- the user fills in fields...

- ... in random order

- suggestions need to be made based on the fields filled in so far


Possible approaches

Iterative Approaches (e.g. MissForest): Fix missing values. Retrain. Fix missing values again. Repeat until converged.

MERCS: Train once. The multi-directional model can be used to predict any column.

Naive Imputation: Guess the missing values, e.g. substitute the mean or median.

Probabilistic Graphical Models: Use probabilistic inference to infer the most likely values for the missing entries, based on the known values.
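The naive imputation baseline fits in a few lines of Python. This is a minimal sketch. It assumes a numeric column stored as a list, with `None` marking missing entries. The function name `naive_impute` is hypothetical.

```python
from statistics import mean, median

def naive_impute(column, strategy="mean"):
    """Replace missing entries (None) by the mean or median of the observed values."""
    observed = [v for v in column if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Every missing entry gets the same fill value, whatever the rest of its row says.
    return [fill if v is None else v for v in column]

print(naive_impute([1.0, None, 3.0, 10.0], strategy="median"))  # [1.0, 3.0, 3.0, 10.0]
```

The fill value ignores the other attributes of the row entirely. This is why the approach is fast but inaccurate.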

Gaps in knowledge

- Naive Imputation: Fast, but inaccurate.

- Probabilistic Graphical Models: Slow, but accurate.

- Iterative Approaches (e.g. MissForest): Accurate, but you need to retrain for every new instance that comes in.

- MERCS: Possible alternative for iterative approaches, without retraining.

Q1 = Extending MERCS

Can a MERCS model be made robust to missing values at prediction time?

Q2 = MERCS vs. MissForest

How does MERCS compare against MissForest, a well-established tree-based technique to deal with missing data?

Q1. Extending MERCS

Can a MERCS model be made robust to missing values at prediction time?

Uni-directional model (e.g. a decision tree): a single function

f_1: X \rightarrow Y

Multi-directional model (e.g. a MERCS-model): a collection of component trees, each with its own input attributes X^i and target attributes Y^i:

f_1: X^1 \rightarrow Y^1

f_2: X^2 \rightarrow Y^2

f_3: X^3 \rightarrow Y^3

Compact representation: f_1, f_2, f_3

MERCS should handle any query.

Cf. Van Wolputte et al., MERCS: Multi-directional Ensembles of Regression and Classification Trees, AAAI-18
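The following Python sketch makes the uni- vs. multi-directional distinction concrete. It describes a model only by the input and target attribute sets of its component trees. The `Component` class, the attribute names and the three components are hypothetical illustrations, not the MERCS library API. A component may have more than one target in this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    """A component tree f_i: X^i -> Y^i, described only by its attribute sets (hypothetical)."""
    name: str
    inputs: frozenset   # X^i
    targets: frozenset  # Y^i

# Hypothetical multi-directional model over attributes A1..A4.
MODEL = [
    Component("f_1", frozenset({"A1", "A2"}), frozenset({"A3"})),
    Component("f_2", frozenset({"A3", "A4"}), frozenset({"A1", "A2"})),
    Component("f_3", frozenset({"A1", "A3"}), frozenset({"A4"})),
]

# A single decision tree (uni-directional) can only ever predict its own target.
SINGLE_TREE = [Component("f_1", frozenset({"A1", "A2"}), frozenset({"A3"}))]

def predictors_of(model, attribute):
    """Names of the components that have `attribute` among their targets."""
    return [c.name for c in model if attribute in c.targets]

def is_multi_directional(model, attributes):
    """True if every attribute can be predicted by at least one component."""
    return all(predictors_of(model, a) for a in attributes)

print(predictors_of(MODEL, "A1"))                                   # ['f_2']
print(is_multi_directional(MODEL, ["A1", "A2", "A3", "A4"]))        # True
print(is_multi_directional(SINGLE_TREE, ["A1", "A2", "A3", "A4"]))  # False
```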

Training time

- Learn a MERCS-model (component trees f_1, f_2, f_3)

- Queries unknown

Prediction time

- Use the MERCS-model to answer queries q_1, q_2, q_3

- Often no "perfect" tree for a query!

MISMATCH!

SOLUTION = BETTER PREDICTION STRATEGIES IN MERCS

- Overcome the mismatch

- 2 MAIN IDEAS: ATTRIBUTE IMPORTANCE AND CHAINING
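A small Python check illustrates the mismatch. It assumes each component tree is given by its input and target sets. A "perfect" tree for a query has all of its inputs available and the query target among its targets. The component sets and queries below are hypothetical.

```python
# Hypothetical component trees: name -> (input attributes X^i, target attributes Y^i).
COMPONENTS = {
    "f_1": ({"A1", "A2"}, {"A3"}),
    "f_2": ({"A3", "A4"}, {"A1"}),
    "f_3": ({"A1", "A3"}, {"A4"}),
}

def perfect_trees(components, known, target):
    """Trees that predict `target` and whose inputs are all known in the query."""
    return [name for name, (inputs, targets) in components.items()
            if target in targets and inputs <= known]

def usable_trees(components, known, target):
    """Trees that predict `target`, mapped to the inputs they would be missing."""
    return {name: inputs - known for name, (inputs, targets) in components.items()
            if target in targets}

# q_1: {A1, A2} -> {A3} has a perfect tree ...
print(perfect_trees(COMPONENTS, {"A1", "A2"}, "A3"))  # ['f_1']
# ... but q_2: {A1} -> {A3} does not: f_1 can only be used with input A2 missing.
print(perfect_trees(COMPONENTS, {"A1"}, "A3"))  # []
print(usable_trees(COMPONENTS, {"A1"}, "A3"))   # {'f_1': {'A2'}}
```

The two ideas address exactly this missing input. Attribute importance estimates how much it matters, and chaining tries to predict it first.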

Attribute importance

- PROBLEM: Which trees to use?

- ASSUMPTION: Trees with many missing inputs are likely to make mistakes.

- IDEA: Attribute importance can quantify this effect.

- TRADEOFF: Many trees vs. good trees.

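Attribute importance: definition and criterion

IDEA: Attribute importance is a way to quantify how appropriate a given tree is for a query.

DEFINITION (importance of attribute A_j in tree T^i):

I(A_j, T^i) \propto \sum_{\{node \,|\, node \in T^i,\ attr(node) = A_j\}} w(node) \cdot \Delta i(node, A_j)

where w(node) is the fraction of training samples that reach the node and \Delta i(node, A_j) is the impurity decrease of its split on A_j.

CRITERION (appropriateness of tree T^i for query q, with X^q the attributes available in the query):

C(T_{X^i \rightarrow Y^i}) \propto \sum_{\{A_j \,|\, A_j \in X^i \cap X^q\}} I(A_j, T^i)

Cf. Louppe et al., Understanding variable importances in forests of randomized trees, NeurIPS 2013

...or the original CART manual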
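Both formulas can be computed directly from a fitted tree. The sketch below rests on three assumptions. Each split node is summarised by its split attribute, its number of training samples and its impurity, plus the same two quantities for both children. w(node) is taken as the fraction of training samples reaching the node. The proportionality in I is resolved by normalising the importances to sum to 1. The `Split` record and the function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class Split(NamedTuple):
    """One internal node of a tree T^i (hypothetical summary format)."""
    attr: str              # attr(node): the attribute A_j the node splits on
    n: int                 # training samples reaching the node
    impurity: float        # i(node)
    n_left: int
    impurity_left: float
    n_right: int
    impurity_right: float

def attribute_importance(splits, n_total):
    """I(A_j, T^i): sum of w(node) * delta_i(node, A_j) over nodes splitting on A_j, normalised to 1."""
    raw = defaultdict(float)
    for s in splits:
        w = s.n / n_total  # w(node)
        delta = (s.impurity
                 - (s.n_left / s.n) * s.impurity_left
                 - (s.n_right / s.n) * s.impurity_right)  # impurity decrease of the split
        raw[s.attr] += w * delta
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}

def criterion(importances, tree_inputs, query_inputs):
    """C(T_{X^i -> Y^i}): summed importance of the tree inputs X^i that are available in X^q."""
    return sum(importances.get(a, 0.0) for a in tree_inputs & query_inputs)

# Hypothetical tree: the root splits on A1, its left child splits on A3.
tree = [Split("A1", 100, 0.5, 50, 0.2, 50, 0.3),
        Split("A3", 50, 0.2, 25, 0.0, 25, 0.0)]
imp = attribute_importance(tree, n_total=100)
print(imp)  # A1 receives 0.25 / 0.35 and A3 receives 0.10 / 0.35 of the importance
print(criterion(imp, {"A1", "A3"}, {"A1", "A2"}))  # only A1 is available to this query
```

scikit-learn's fitted decision trees expose normalised impurity-based importances of this kind through their `feature_importances_` attribute. A real implementation would more likely read those than re-derive them.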

How much does an attribute matter? (attribute importance f)

How much do the available attributes matter? (criterion c)

Baseline (RF) vs. most-relevant attribute importance

[Figure: worked example for query q: \{A_1, A_3\} \rightarrow \{A_4\}. For each candidate tree, the importances f of the attributes available in q are summed into the criterion c: 0 + 0 + 0 = 0, 0.8 + 0 + 0 = 0.8, 0 + 0 + 0.2 = 0.2 and 0.8 + 0 + 0 = 0.8.]
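The worked example can be reproduced in a few lines of Python. The per-tree importance values below are hypothetical. They were chosen only so that the criteria match the sums shown for the query \{A_1, A_3\} \rightarrow \{A_4\}. "Most-relevant" is assumed here to mean keeping the tree(s) with the highest criterion. The random-forest-style baseline would instead use every tree that predicts A_4.

```python
# Hypothetical importances f of the inputs of four trees that all predict A4.
IMPORTANCES = {
    "T1": {"A2": 1.0},
    "T2": {"A1": 0.8, "A2": 0.2},
    "T3": {"A2": 0.8, "A3": 0.2},
    "T4": {"A1": 0.8, "A2": 0.2},
}

def criteria(importances, available):
    """c for every tree: summed importance of the attributes the query provides."""
    return {tree: sum(v for a, v in imp.items() if a in available)
            for tree, imp in importances.items()}

def most_relevant(importances, available):
    """Trees with the highest criterion (ties are all kept)."""
    c = criteria(importances, available)
    best = max(c.values())
    return sorted(tree for tree, score in c.items() if score == best)

available = {"A1", "A3"}  # query {A1, A3} -> {A4}
print(criteria(IMPORTANCES, available))       # {'T1': 0, 'T2': 0.8, 'T3': 0.2, 'T4': 0.8}
print(most_relevant(IMPORTANCES, available))  # ['T2', 'T4']
```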

Chaining

- PROBLEM: Which trees to use?

- ASSUMPTION: Missing inputs are also predictable with MERCS itself.

- IDEA: Chain component trees to answer a given query.

- TRADEOFF: Bottom-up vs. top-down.

Cf. Read et al., Classifier chains for multi-label classification, ECMLPKDD 2009

Two variants: Bottom-Up Chaining and Top-Down Chaining.

Bottom-Up chaining, for the query \{A_1, A_2\} \rightarrow \{A_3\}:

Step 1: Use the most relevant models, given \{A_1, A_2\}, to predict the missing attribute A_4.

Step 2: Use the most relevant models, given \{A_1, A_2, A_4\}, to predict the target A_3.

Result: the query is answered.
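Bottom-up chaining can be sketched as a loop that keeps predicting attributes from what is already available until the target is reached. This is a simplified illustration, not the MERCS implementation. As assumptions, "most relevant" is reduced to "all inputs available", and the first matching component wins. The three component models and their toy prediction functions are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Values = Dict[str, float]
# Hypothetical component: (input attributes, predicted attribute, prediction function).
Component = Tuple[FrozenSet[str], str, Callable[[Values], float]]

COMPONENTS: List[Component] = [
    (frozenset({"A1", "A2"}), "A4", lambda x: x["A1"] + x["A2"]),
    (frozenset({"A1", "A2", "A4"}), "A3", lambda x: x["A4"] - x["A1"]),
    (frozenset({"A3", "A4"}), "A1", lambda x: x["A4"] - x["A3"]),
]

def bottom_up_chain(components: List[Component], known: Values, target: str,
                    max_steps: int = 10) -> Tuple[float, Values]:
    """Predict intermediate attributes step by step until `target` can be predicted."""
    values = dict(known)
    for _ in range(max_steps):
        if target in values:
            break
        # One chaining step: predict every still-missing attribute whose inputs are available.
        new: Values = {}
        for inputs, out, predict in components:
            if out not in values and out not in new and inputs.issubset(values):
                new[out] = predict(values)
        if not new:
            break  # nothing more can be predicted from the available attributes
        values.update(new)
    if target not in values:
        raise ValueError(f"cannot reach {target!r} from {sorted(known)}")
    return values[target], values

# Query {A1, A2} -> {A3}: step 1 predicts A4 given {A1, A2}, step 2 predicts A3 given {A1, A2, A4}.
answer, trace = bottom_up_chain(COMPONENTS, {"A1": 1.0, "A2": 2.0}, "A3")
print(answer, sorted(trace))  # 2.0 ['A1', 'A2', 'A3', 'A4']
```

Top-down chaining, the other variant on the slide, is not covered by this sketch.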

Results

[Figure: results for baseline, most-relevant, chaining and PGM]

- Both attribute importance (most-relevant) and chaining improve robustness to missing inputs.

- These improvements come at an acceptable cost (i.e. still orders of magnitude faster than PGM).


Conclusions

Q1

Can a MERCS model be made robust to missing values at prediction time?

A1

Yes, if you use chaining and attribute importance.

Q2. MERCS vs. MissForest

How does MERCS compare against MissForest, a well-established tree-based technique to deal with missing data?

MissForest

- Train model, fix missing values.

- Train model again, fix missing values again.

- ... and so on, until converged.
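The train/fix/retrain loop can be sketched in plain Python. This is an assumption-laden stand-in rather than MissForest itself. MissForest fits random forests and uses its own stopping criterion. Here, to stay dependency-free, a hypothetical 1-nearest-neighbour regressor plays the role of the forest. The loop stops once a full round no longer changes the imputed values, or after `max_iter` rounds.

```python
import math
from typing import List, Optional

def nn_predict(train_X: List[List[float]], train_y: List[float], x: List[float]) -> float:
    """1-nearest-neighbour regression: a simple stand-in for a random forest."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def other_columns(row: List[float], j: int) -> List[float]:
    """All values of `row` except column j (the features used to predict column j)."""
    return [v for k, v in enumerate(row) if k != j]

def iterative_impute(rows: List[List[Optional[float]]], max_iter: int = 10,
                     tol: float = 1e-9) -> List[List[float]]:
    """Iterative imputation in the style of MissForest; None marks a missing value."""
    n_cols = len(rows[0])
    data = [list(r) for r in rows]  # work on a copy, the input stays untouched
    # Initial guess: naive mean imputation per column.
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_mean = sum(observed) / len(observed)
        for r in data:
            if r[j] is None:
                r[j] = col_mean
    for _ in range(max_iter):
        change = 0.0
        for j in range(n_cols):
            observed_rows = [i for i, r in enumerate(rows) if r[j] is not None]
            missing_rows = [i for i, r in enumerate(rows) if r[j] is None]
            if not missing_rows:
                continue
            # (Re)train on rows where column j is observed, using current imputations elsewhere.
            X = [other_columns(data[i], j) for i in observed_rows]
            y = [data[i][j] for i in observed_rows]
            # Fix the missing values of column j with the retrained model.
            for i in missing_rows:
                new_value = nn_predict(X, y, other_columns(data[i], j))
                change += abs(new_value - data[i][j])
                data[i][j] = new_value
        if change <= tol:
            break  # converged: a full round changed nothing
    return data

rows = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [None, 40.0]]
print(iterative_impute(rows))  # [[1.0, 10.0], [2.0, 20.0], [3.0, 40.0], [3.0, 40.0]]
```

Every call retrains from scratch. In the setting of this talk, that whole loop would have to be rerun for each new incoming instance.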

Experimental Setup

Training data vs. test data (= queries). Queries come in one by one, e.g. ML in industry: you monitor your equipment all the time.

MissForest Setup: for every single query, MissForest is run on the training data + that single query, so training and testing happen together each time.

MERCS Setup: a MERCS model is trained once on the training data; all queries are then answered by that model.
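The two setups differ in how often a model is trained, and a short Python sketch makes this explicit. The function names and the `impute`, `fit` and `predict` callables are hypothetical placeholders. For MissForest, `impute` would be the whole iterative procedure including its retraining. The counters in the usage part only count calls; they are not measured results.

```python
from typing import Any, Callable, List, Optional, Sequence

Row = List[Optional[float]]

def missforest_setup(train: Sequence[Row], queries: Sequence[Row],
                     impute: Callable[[List[Row]], List[Row]]) -> List[Row]:
    """Every incoming query is imputed by re-running the full procedure on training data + that query."""
    answers = []
    for query in queries:
        completed = impute(list(train) + [query])  # all retraining happens inside `impute`
        answers.append(completed[-1])              # the last row is the completed query
    return answers

def mercs_setup(train: Sequence[Row], queries: Sequence[Row],
                fit: Callable[[Sequence[Row]], Any],
                predict: Callable[[Any, Row], Row]) -> List[Row]:
    """The model is trained once; every incoming query is answered by that same model."""
    model = fit(train)
    return [predict(model, query) for query in queries]

# Usage with trivial placeholder callables that only count how often training happens.
calls = {"impute": 0, "fit": 0}

def fill_zero(row: Row) -> Row:
    return [0.0 if v is None else v for v in row]

def impute(data: List[Row]) -> List[Row]:
    calls["impute"] += 1
    return [fill_zero(r) for r in data]

def fit(data: Sequence[Row]) -> str:
    calls["fit"] += 1
    return "model"

train = [[1.0, 2.0], [3.0, 4.0]]
queries = [[None, 1.0], [2.0, None], [None, None]]
missforest_setup(train, queries, impute)
mercs_setup(train, queries, fit, lambda model, q: fill_zero(q))
print(calls)  # {'impute': 3, 'fit': 1}
```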

Results

- Very comparable performance (many draws).

- No retraining in MERCS => ~2 orders of magnitude speedup.

Conclusions

Q2

How does MERCS compare against MissForest, a well-established tree-based technique to deal with missing data?

A2

MERCS performs on par with MissForest for a fraction of the time cost.

4. Conclusions

Q1

Can a MERCS model be made robust to missing values at prediction time?

A1

Yes, if you use chaining and attribute importance.

Q2

How does MERCS compare against MissForest, a well-established tree-based technique to deal with missing data?

A2

MERCS performs on par with MissForest for a fraction of the time cost.

4. Future work

1. Here, we work with attribute importance, which is calculated on the full training dataset. Can we take a fully instance-based approach? (SHAP values are per-instance attribute importances.)

2. Application-wise, we are now looking into anomaly detection with MERCS.

Thank you for your attention!
