Machine Learning

75% risk!

Black box model

Domain expert

But why?

Data

Trending issue

DECISION SUPPORT

Husky vs. Wolf problem

https://arxiv.org/abs/1602.04938

DIAGNOSTICS

CycleGAN

https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-from-its-creators-to-cheat-at-its-appointed-task

DIAGNOSTICS

75% risk!

Black box model

Domain expert

Explanation

Aha!

But why?

Data

Explainer

Machine Learning

Fraud detection explanations
sick-leave insurances

FRAUD DETECTION EXPLANATIONS

Real world scenario

Data

Missing/incorrect values

Model

100 Random Forests
500 trees each
~25 decisions per tree
1,312,471 decisions in total!

OOB error: 27.7%
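As a rough illustration of the two numbers on this slide, the sketch below trains a single Random Forest with scikit-learn, reads off its out-of-bag (OOB) error, and counts the decisions (internal tree nodes) it contains. The data set is synthetic and the forest size is an assumption chosen to keep the example fast; neither reproduces the model from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): the insurance data is not public.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)

# oob_score=True evaluates every sample only on trees whose bootstrap
# sample did not contain it, giving an error estimate without a test set.
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)
oob_error = 1.0 - forest.oob_score_

# A "decision" is an internal (non-leaf) node of a tree.
n_decisions = sum(est.tree_.node_count - est.tree_.n_leaves
                  for est in forest.estimators_)

print(f"OOB error: {oob_error:.1%}, decisions in forest: {n_decisions}")
```

Even this small forest holds thousands of decisions, which is why reading the trees directly does not work as an explanation.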

FRAUD DETECTION EXPLANATIONS

My solution

FRAUD DETECTION EXPLANATIONS

Feature contribution

[1] Palczewska, Anna et al. Interpreting random forest classification models using a feature contribution method. In Integration of Reusable Systems, pp. 193–218. Springer, 2014.

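[Figure: y plotted against x (x from 0 to 3, y from 1 to 2) with a decision tree split at X < 2.5. The parent node holds 7 : 7 samples, so \(Y_{mean}\) = 0.5; the child node holds 6 : 2 samples, so \(Y_{mean}\) = 0.75. The local increment for X is \(LI_{X}\) = 0.75 − 0.5 = 0.25.]

Contribution per Decision Tree:

\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)

Contribution per Random Forest:

\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)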
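The two formulas can be sketched on top of scikit-learn's tree internals. This is a minimal sketch for binary classification, not the implementation used in the talk: the function names, the synthetic data and the forest size are all assumptions. For each node on the path of an instance, the local increment is the change in the class-1 fraction between parent and child, credited to the feature the parent splits on.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_contributions(tree, x):
    """FC_{i,t}^f: sum of local increments along the path of x in one tree."""
    t = tree.tree_
    counts = t.value[:, 0, :]
    # Y_mean per node: fraction of class 1 (works for counts or fractions).
    y_mean = counts[:, 1] / counts.sum(axis=1)
    x = np.asarray(x, dtype=np.float32)  # scikit-learn splits on float32
    contrib = np.zeros(x.shape[0])
    node = 0
    while t.children_left[node] != -1:  # -1 marks a leaf
        f = t.feature[node]
        if x[f] <= t.threshold[node]:
            child = t.children_left[node]
        else:
            child = t.children_right[node]
        contrib[f] += y_mean[child] - y_mean[node]  # local increment LI_f^N
        node = child
    return y_mean[0], contrib


def forest_contributions(forest, x):
    """FC_i^f: average of the per-tree contributions over all T trees."""
    results = [tree_contributions(est, x) for est in forest.estimators_]
    bias = np.mean([root for root, _ in results])
    contrib = np.mean([c for _, c in results], axis=0)
    return bias, contrib


# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

bias, contributions = forest_contributions(forest, X[0])
print("average root Y_mean:", round(bias, 3))
print("feature contributions:", np.round(contributions, 3))
```

Because the increments telescope along each path, the average root value plus the sum of all contributions equals the forest's predicted class-1 probability for that instance, which is a convenient sanity check.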

FRAUD DETECTION EXPLANATIONS

Partial dependence

[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): pp. 1189–1232, 2001.

[Figure: partial dependence plot of the prediction ("Fraud?", 0% to 100%) against "Duration illness" for the example claim below. Labels shown in the plot: Fraud (55%), Fraud (65%), Fraud (90%), Non-fraud (45%), Non-fraud (40%), Non-fraud (35%), Non-fraud (25%).]

Company	ABC Inc
Employees	5
Duration illness	days
...	...
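A partial dependence curve in the sense of [2] can be computed by hand: fix one feature to a grid value for every row, average the model's predictions, and repeat for each grid value. The sketch below assumes a synthetic data set and a hypothetical helper name; feature 0 merely stands in for a variable such as the illness duration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def partial_dependence_curve(model, X, feature, grid):
    """Average class-1 probability when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature for all rows
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)


# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 10)
curve = partial_dependence_curve(model, X, feature=0, grid=grid)
for value, prob in zip(grid, curve):
    print(f"feature 0 = {value:6.2f} -> mean P(class 1) = {prob:.2f}")
```

Passing a single row, for example `X[:1]`, instead of the full data set gives the curve for one specific case, which is closer to the what-if view of one claim.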

FRAUD DETECTION EXPLANATIONS

Local rule extraction

[3] Ribeiro, Marco Tulio et al. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.

[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.

[Figure: y plotted against x (x from 0 to 3, y from 1 to 2).]
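One way to obtain a local rule is sketched below: sample points around the instance, label them with the black box model, fit a shallow decision tree to those labels, and read the rule off the path the instance takes. This is a simplified sketch in the spirit of [3], not the method from the talk and not inTrees [4], which extracts rules from the ensemble itself. The Gaussian sampling, the tree depth and all names are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def local_rule(model, X, x, n_samples=1000, max_depth=3, seed=0):
    """Return (conditions, predicted class, fidelity) of a local tree surrogate."""
    rng = np.random.default_rng(seed)
    # Perturb the instance with noise scaled to each feature's spread.
    Z = x + rng.normal(size=(n_samples, X.shape[1])) * X.std(axis=0)
    labels = model.predict(Z)  # the black box labels the neighbourhood
    surrogate = DecisionTreeClassifier(max_depth=max_depth,
                                       random_state=seed).fit(Z, labels)
    t = surrogate.tree_
    x32 = x.astype(np.float32)  # scikit-learn splits on float32
    node, conditions = 0, []
    while t.children_left[node] != -1:  # walk down until a leaf is reached
        f, thr = t.feature[node], t.threshold[node]
        if x32[f] <= thr:
            conditions.append(f"x[{f}] <= {thr:.2f}")
            node = t.children_left[node]
        else:
            conditions.append(f"x[{f}] > {thr:.2f}")
            node = t.children_right[node]
    predicted = surrogate.classes_[np.argmax(t.value[node, 0])]
    fidelity = surrogate.score(Z, labels)  # agreement with the black box
    return conditions, predicted, fidelity


# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

conditions, predicted, fidelity = local_rule(model, X, X[0])
print("IF", " AND ".join(conditions) or "TRUE", "THEN class", predicted)
print(f"local fidelity: {fidelity:.2f}")
```

The fidelity score indicates how far the rule can be trusted: a low value means the shallow tree does not mimic the black box well around this instance.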

FRAUD DETECTION EXPLANATIONS

Anywhere Random Forests are used!

Fraud detection sick-leave
Automatic acceptance ORV (Cees Willemsen)
Health insurance churn predictions (Lizan Onderwater-Kops)
Debtor management (Team Randy Soet)

FRAUD DETECTION EXPLANATIONS

Applications

Fraud team happy! 🎉

FRAUD DETECTION EXPLANATIONS

Paper presented at:
Workshop on Human Interpretability in Machine Learning

Stockholm, Sweden

GENERAL EXPLAINER TECHNIQUES

Applicable to any machine learning model

LIME and LEMON

[Figure: two plots of y against x (x from 0 to 3, y from 1 to 2), one labeled LIME and one labeled LEMON.]
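The core idea of a LIME-style explainer [3] can be sketched in a few lines: sample points around the instance, weight them by proximity, and fit a weighted linear model whose coefficients act as the explanation. This is a simplified sketch, not the `lime` package; the Gaussian sampling, the kernel width and the function name are assumptions. The slides do not spell out how LEMON differs, so only the LIME-style variant is shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge


def lime_style_explanation(model, X, x, n_samples=2000, seed=0):
    """Fit a proximity-weighted linear surrogate around x."""
    rng = np.random.default_rng(seed)
    scale = X.std(axis=0)
    Z = x + rng.normal(size=(n_samples, X.shape[1])) * scale
    target = model.predict_proba(Z)[:, 1]  # black box output to imitate
    # Exponential kernel on the scaled distance to x (width is an assumption).
    dist = np.linalg.norm((Z - x) / scale, axis=1)
    kernel_width = 0.75 * np.sqrt(X.shape[1])
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0).fit(Z, target, sample_weight=weights)
    r2 = surrogate.score(Z, target, sample_weight=weights)
    return surrogate.coef_, r2


# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

coefficients, r2 = lime_style_explanation(model, X, X[0])
print("local feature weights:", np.round(coefficients, 3))
print(f"weighted R² of the surrogate: {r2:.2f}")
```

Only `predict_proba` is called on the model, which is what makes this kind of explainer applicable to any classifier.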

GENERAL EXPLAINER TECHNIQUES

Applications

Can be used for any Python model...

sklearn-pmml-model

Can be used for any model...

DEBTOR MANAGEMENT

Effectiveness of debt collection strategies

DEBTOR MANAGEMENT

Problem

Surrogate learning

[Figure: y plotted against x (x from 0 to 3, y from 1 to 2), with three panels each listing Feature 1, Feature 2 and Feature 3.]

DEBTOR MANAGEMENT

Help data scientists to create and tune explanatory surrogate models.
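Tuning a surrogate comes down to trading simplicity against fidelity. The sketch below trains a black box, fits decision tree surrogates of increasing depth to the black box's predicted probabilities, and reports R² on held-out data as the fidelity measure. It is a minimal sketch under assumed settings (synthetic data, a tree as surrogate, the depth values), not the tool shown in the talk.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=600, n_features=6, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

black_box = RandomForestClassifier(n_estimators=100,
                                   random_state=0).fit(X_train, y_train)

# The surrogate learns the black box output, not the original labels.
p_train = black_box.predict_proba(X_train)[:, 1]
p_test = black_box.predict_proba(X_test)[:, 1]

fidelity = {}
for depth in (1, 2, 4, 8):
    surrogate = DecisionTreeRegressor(max_depth=depth,
                                      random_state=0).fit(X_train, p_train)
    # R² close to 1 means the surrogate closely mimics the black box.
    fidelity[depth] = r2_score(p_test, surrogate.predict(X_test))

for depth, score in fidelity.items():
    print(f"max_depth={depth}: R² = {score:.2f}")
```

A deeper surrogate usually follows the black box more closely but is harder to read, so the depth at which R² stops improving noticeably is a reasonable starting point.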

DEBTOR MANAGEMENT

Configuration view

← Any tabular data set
← Any Python classifier, or PMML
← Different surrogate models

DEBTOR MANAGEMENT

Global overview

R²

Every line = an explanation
More information than traditional feature importance
Selecting subgroups

DEBTOR MANAGEMENT

Local explanation view

← quality

← explanation

DEBTOR MANAGEMENT

Context view

Demo

DEBTOR MANAGEMENT

Paper submitted to:
IEEE Visual Analytics Science and Technology (VAST)

Vancouver, Canada

DEBTOR MANAGEMENT

Applications

Anywhere tabular data is used.

Any model in Python or PMML.

Debtor management (Team Randy Soet)
Team Data Science / Wheel of Knowledge
More soon!

NEXT STEPS

Opportunities

Achmea
- Term life insurance (Overlijdensrisicoverzekeringen, ORV) (Senna van Iersel)
- Health insurance churn prediction (Lizan Kops)
- Pricing GLMs (Joost van Bruggen)
- Recruitment analytics (Silke Lhoëst)
- Team Data Science (Schade Particulier) (Wouter Slot)
Academic
- RATE colleagues
- Stef van der Elzen (SynerScope)
- Lorentz Grant Workshop @ Leiden
- IBM & Harvard @ Boston, USA

NEXT STEPS

Topic

Global

Instance-level

RATE Analytics

Deck July 1st

iamdecode
