Team meeting

May 7th

Dennis Collaris

PhD Visualization

Predictive model interpretability

MY RESEARCH

Black box model

75% risk!

Black box model

Domain expert

Explanation

Aha!

But why?

Data

Explainer

Topics

  • Random Forest explanations (sick leave)

  • LIME & LEMON

  • ExplainExplore (debtor management)

Explanations for fraud detection

GRADUATION

Real world scenario

Data

  • Lots of missing values

Model

  • 100 Random Forests
  • 500 trees each
  • ~25 decisions per tree
  • 1,312,471 models total!


OOB error: 27.7%
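For reference, a minimal sketch of how such an out-of-bag error estimate is obtained with scikit-learn. The dataset here is synthetic, standing in for the (confidential) insurance data:

```python
# Out-of-bag (OOB) error of a random forest on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)
# Each tree is evaluated on the ~37% of samples it never saw during bagging.
print(f"OOB error: {1 - rf.oob_score_:.1%}")
```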

My solution

Feature contribution

[1] Palczewska, Anna, et al. Interpreting random forest classification models using a feature contribution method. In Integration of Reusable Systems, pp. 193–218. Springer, 2014.

[Figure: a decision tree whose root splits on X < 2.5, drawn over a scatter plot of x vs. y. The root node contains 7 positive and 7 negative samples (\(Y_{mean}\) = 0.5); one child node contains 6 positive and 2 negative (\(Y_{mean}\) = 0.75), giving a local increment \(LI_{X}\) = 0.75 − 0.5 = 0.25.]

Contribution per Decision Tree:

\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)

Contribution per Random Forest: 

\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)
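The per-tree sum \(FC_{i,t}^f\) can be sketched in code: a hedged Python sketch (helper names like `feature_contributions` are my own, and the data is synthetic) that walks an instance's decision path through one sklearn tree and accumulates local increments per feature:

```python
# Sketch of the feature-contribution method of Palczewska et al. [1]
# for a single decision tree; not their reference implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree = clf.tree_

def p_positive(node):
    """Y_mean of a node: fraction of positive-class samples it contains."""
    counts = tree.value[node][0]
    return counts[1] / counts.sum()

def feature_contributions(x):
    """Sum local increments LI_f = Y_mean(child) - Y_mean(parent)
    per feature f along the decision path of instance x."""
    contrib = np.zeros(X.shape[1])
    node = 0
    while tree.children_left[node] != -1:          # stop at a leaf
        f = tree.feature[node]
        if x[f] <= tree.threshold[node]:
            child = tree.children_left[node]
        else:
            child = tree.children_right[node]
        contrib[f] += p_positive(child) - p_positive(node)
        node = child
    return contrib

fc = feature_contributions(X[0])
# Sanity check: root mean + summed increments telescopes to the leaf
# prediction, i.e. the model's own predicted probability.
print(np.isclose(p_positive(0) + fc.sum(), clf.predict_proba([X[0]])[0, 1]))
```

Averaging these per-tree vectors over all trees gives the per-forest contribution \(FC_i^f\) from the formula above.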


Partial dependence

[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): pp. 1189–1232, 2001. 

[Figure: partial dependence plot of the fraud prediction (0%–100%) against Duration illness, shown above the feature's value distribution, for an example instance (Company ABC Inc, 5 employees, Duration illness left blank). Across the feature range the prediction varies between Non-fraud (25%, 35%, 40%, 45%) and Fraud (55%, 65%, 90%).]
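Friedman's partial dependence [2] can be computed with a few lines: fix one feature at a grid value for every row, average the model's predicted probability, and repeat along the grid. A sketch with a placeholder model and synthetic data, not the fraud model from the slides:

```python
# One-feature partial dependence (Friedman [2]) from scratch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
model = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """For each grid value v: overwrite X[:, feature] with v for ALL rows
    and average the predicted probability -> marginal effect of feature."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd_vals.append(model.predict_proba(Xv)[:, 1].mean())
    return np.array(pd_vals)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
curve = partial_dependence(model, X, feature=0, grid=grid)
print(curve.shape)  # one averaged probability per grid point
```

scikit-learn also ships this as `sklearn.inspection.partial_dependence`; the manual loop above just makes the averaging explicit.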

Local rule extraction

[3] Ribeiro, Marco Tulio, et al. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.

[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.

[Figure: the same scatter plot of x vs. y, now with a locally extracted rule highlighted.]

Fraud team happy! 🎉

Paper accepted for:
Workshop on Human Interpretability in Machine Learning


Available from the

DEC team (Leon Vink)

LIME

[Figure: LIME samples perturbations across the feature space (x vs. y) and weights them by their distance d to the instance.]
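The core LIME idea [3] fits in a few lines: sample around the instance, weight samples by a proximity kernel on the distance d, and fit a weighted linear surrogate whose coefficients serve as local importances. A sketch with a toy black box (names like `lime_weights` are my own; the real library is linked below):

```python
# Minimal tabular LIME [3]: perturb, kernel-weight, fit linear surrogate.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def lime_weights(black_box, x, n_samples=500, kernel_width=1.0):
    Z = x + rng.normal(scale=1.0, size=(n_samples, x.size))  # perturb x
    d = np.linalg.norm(Z - x, axis=1)                        # distances d
    w = np.exp(-(d ** 2) / kernel_width ** 2)                # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(Z, black_box(Z), sample_weight=w)
    return surrogate.coef_                                   # local importances

# Toy black box in which feature 0 dominates the score near x.
f = lambda Z: 1 / (1 + np.exp(-(3 * Z[:, 0] + 0.2 * Z[:, 1])))
coefs = lime_weights(f, np.array([0.5, -0.2]))
print(abs(coefs[0]) > abs(coefs[1]))  # feature 0 matters more locally
```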

LEMON

[Figure: two scatter plots of x vs. y illustrating LEMON's sampling.]

Available:

  • LIME      github.com/marcotcr/lime
  • LEMON coming soon!

EXPLAIN-EXPLORE

  • What is a good region of interest?
  • What is a good explanation?

Video

Available:

If accepted: end of October!

Benefits

  • Data scientist
    Conceptual framework for evaluating explanation techniques.
     
  • Decision maker / Regulator / User
    Can explore the explanations most relevant to their task.

What factors play a role?

  • A simple, accurate, and generic model does not exist.
     
  • Any explanation is a simplified model.
     
  • There is a difference between desiderata for the model and for the explanation.
     
  • ML explanations always follow the "common effect" causal pattern (Keil, 2006).

Assumptions

What factors play a role?

Word cloud

What factors play a role?

Spreadsheet

What factors play a role?

High-level interaction

Next steps...

  • Explore sub categories of high-level desiderata
    (+ relations)
     
  • Quantifiable metrics to approximate these desiderata.
     
  • Paper?

Fraud detection model

75% Fraud

Fraud

Detection

Model

Fraud team

Company: ABC Inc
Employees: 5
Illness duration: 14 days
Premium rate: 5%
...

Insurance policy

Explanation

Aha!

But why?

Global vs Local


In general: Duration of illness is important


For this employer: Report date of sickness is important

Global vs Local

Solution

Copy of Deck May 7th

By iamdecode
