Team meeting

May 7th

Dennis Collaris

PhD Visualization

Predictive model interpretability

MY RESEARCH

Black box model

75% risk!

Black box model

Domain expert

Explanation

Aha!

But why?

Data

Explainer

Topics

  • Random Forest explanations (sick leave)

  • LIME & LEMON

  • ExplainExplore (debtor management)

Explanations for fraud detection

GRADUATION

Real world scenario

Data

  • Lots of missing values

Model

  • 100 Random Forests
  • 500 trees each
  • ~25 decisions per tree
  • 1,312,471 models total!


OOB error: 27.7%
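For reference, a minimal sketch of how such an out-of-bag error estimate is obtained with scikit-learn. The dataset here is synthetic, standing in for the (confidential) insurance data:

```python
# Out-of-bag (OOB) error of a random forest on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)
# Each tree is evaluated on the ~37% of samples it never saw during bagging.
print(f"OOB error: {1 - rf.oob_score_:.1%}")
```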

My solution

Feature contribution

[1] Palczewska, Anna, et al. Interpreting random forest classification models using a feature contribution method. In Integration of Reusable Systems, pp. 193–218. Springer, 2014.

[Figure: a decision tree whose root splits on X < 2.5, drawn over a scatter plot of x vs. y. The root node contains 7 positive and 7 negative samples (\(Y_{mean}\) = 0.5); one child node contains 6 positive and 2 negative (\(Y_{mean}\) = 0.75), giving a local increment \(LI_{X}\) = 0.75 − 0.5 = 0.25.]

Contribution per Decision Tree:

\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)

Contribution per Random Forest: 

\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)
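The per-tree sum \(FC_{i,t}^f\) can be sketched in code: a hedged Python sketch (helper names like `feature_contributions` are my own, and the data is synthetic) that walks an instance's decision path through one sklearn tree and accumulates local increments per feature:

```python
# Sketch of the feature-contribution method of Palczewska et al. [1]
# for a single decision tree; not their reference implementation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
tree = clf.tree_

def p_positive(node):
    """Y_mean of a node: fraction of positive-class samples it contains."""
    counts = tree.value[node][0]
    return counts[1] / counts.sum()

def feature_contributions(x):
    """Sum local increments LI_f = Y_mean(child) - Y_mean(parent)
    per feature f along the decision path of instance x."""
    contrib = np.zeros(X.shape[1])
    node = 0
    while tree.children_left[node] != -1:          # stop at a leaf
        f = tree.feature[node]
        if x[f] <= tree.threshold[node]:
            child = tree.children_left[node]
        else:
            child = tree.children_right[node]
        contrib[f] += p_positive(child) - p_positive(node)
        node = child
    return contrib

fc = feature_contributions(X[0])
# Sanity check: root mean + summed increments telescopes to the leaf
# prediction, i.e. the model's own predicted probability.
print(np.isclose(p_positive(0) + fc.sum(), clf.predict_proba([X[0]])[0, 1]))
```

Averaging these per-tree vectors over all trees gives the per-forest contribution \(FC_i^f\) from the formula above.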


Partial dependence

[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): pp. 1189–1232, 2001. 

[Figure: partial dependence plot of the fraud prediction (0%–100%) against Duration illness, shown above the feature's value distribution, for an example instance (Company ABC Inc, 5 employees, Duration illness left blank). Across the feature range the prediction varies between Non-fraud (25%, 35%, 40%, 45%) and Fraud (55%, 65%, 90%).]
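Friedman's partial dependence [2] can be computed with a few lines: fix one feature at a grid value for every row, average the model's predicted probability, and repeat along the grid. A sketch with a placeholder model and synthetic data, not the fraud model from the slides:

```python
# One-feature partial dependence (Friedman [2]) from scratch.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
model = RandomForestClassifier(n_estimators=50, random_state=1).fit(X, y)

def partial_dependence(model, X, feature, grid):
    """For each grid value v: overwrite X[:, feature] with v for ALL rows
    and average the predicted probability -> marginal effect of feature."""
    pd_vals = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        pd_vals.append(model.predict_proba(Xv)[:, 1].mean())
    return np.array(pd_vals)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 20)
curve = partial_dependence(model, X, feature=0, grid=grid)
print(curve.shape)  # one averaged probability per grid point
```

scikit-learn also ships this as `sklearn.inspection.partial_dependence`; the manual loop above just makes the averaging explicit.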

Local rule extraction

[3] Ribeiro, Marco Tulio, et al. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.

[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.

[Figure: the same scatter plot of x vs. y, now with a locally extracted rule highlighted.]

Fraud team happy! 🎉

Paper accepted for:
Workshop on Human Interpretability in Machine Learning


Available from the

DEC team (Leon Vink)

LIME

[Figure: LIME samples perturbations across the feature space (x vs. y) and weights them by their distance d to the instance.]
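The core LIME idea [3] fits in a few lines: sample around the instance, weight samples by a proximity kernel on the distance d, and fit a weighted linear surrogate whose coefficients serve as local importances. A sketch with a toy black box (names like `lime_weights` are my own; the real library is linked below):

```python
# Minimal tabular LIME [3]: perturb, kernel-weight, fit linear surrogate.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def lime_weights(black_box, x, n_samples=500, kernel_width=1.0):
    Z = x + rng.normal(scale=1.0, size=(n_samples, x.size))  # perturb x
    d = np.linalg.norm(Z - x, axis=1)                        # distances d
    w = np.exp(-(d ** 2) / kernel_width ** 2)                # proximity kernel
    surrogate = Ridge(alpha=1.0).fit(Z, black_box(Z), sample_weight=w)
    return surrogate.coef_                                   # local importances

# Toy black box in which feature 0 dominates the score near x.
f = lambda Z: 1 / (1 + np.exp(-(3 * Z[:, 0] + 0.2 * Z[:, 1])))
coefs = lime_weights(f, np.array([0.5, -0.2]))
print(abs(coefs[0]) > abs(coefs[1]))  # feature 0 matters more locally
```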

LEMON

[Figure: two scatter plots of x vs. y illustrating LEMON's sampling.]

Available:

  • LIME      github.com/marcotcr/lime
  • LEMON coming soon!

EXPLAIN-EXPLORE

  • What is a good region of interest?
  • What is a good explanation?

Video

Available:

If accepted: end of October!

Benefits

  • Data scientist
    Conceptual framework for evaluating explanation techniques.
     
  • Decision maker / Regulator / User
    Can explore the explanations most relevant to their task.

What factors play a role?

  • A simple, accurate, and generic model does not exist.
     
  • Any explanation is a simplified model.
     
  • There is a difference between desiderata for the model and for the explanation.
     
  • ML explanations always follow the "common effect" causal pattern (Keil, 2006).

Assumptions

What factors play a role?

Word cloud

What factors play a role?

Spreadsheet

What factors play a role?

High-level interaction

Next steps...

  • Explore sub categories of high-level desiderata
    (+ relations)
     
  • Quantifiable metrics to approximate these desiderata.
     
  • Paper?

Fraud detection model

75% Fraud

Fraud

Detection

Model

Fraud team

Company: ABC Inc
Employees: 5
Illness duration: 14 days
Premium rate: 5%
...

Insurance policy

Explanation

Aha!

But why?

Global vs Local


In general: Duration of illness is important


For this employer: Report date of sickness is important

Global vs Local

Solution

Copy of Deck May 7th

By iamdecode
