RATE Analytics

3rd June

Dennis Collaris

PhD Visualization

Machine Learning

[Diagram: Data → Black box model → "75% risk!" → Domain expert: "But why?"]

Trending issue

DECISION SUPPORT

Husky vs. Wolf problem

DIAGNOSTICS

CycleGAN

DIAGNOSTICS

[Diagram: Data → Black box model → "75% risk!" → Domain expert: "But why?" → Explainer → Explanation → Domain expert: "Aha!"]

Overview

  • Fraud detection explanations (sick leave)

  • General explanation techniques

  • Strategy analysis (debtor management)

  • Next steps

Fraud detection explanations
sick-leave insurance

FRAUD DETECTION EXPLANATIONS

Real world scenario

Data

  • Missing/incorrect values

Model

  • 100 Random Forests
  • 500 trees each
  • ~25 decisions per tree
  • 1,312,471 decisions total!

OOB error: 27.7%

My solution

Feature contribution

[1] Palczewska, Anna et al. Interpreting random forest classification models using a feature contribution method. In Integration of reusable systems, pp. 193–218. Springer, 2014.

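[Figure: scatter plot (axes x and y) with one decision tree split, X < 2.5. Parent node: class counts 7 : 7, \(Y_{mean}\) = 0.5. Child node: class counts 6 : 2, \(Y_{mean}\) = 0.75. Local increment (change in \(Y_{mean}\) from parent to child): \(LI_{X}\) = 0.25.]

Contribution per Decision Tree:

\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)

Contribution per Random Forest:

\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)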
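The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the project: the nested-dict tree format, the second tree, and every node value other than the root split (X < 2.5, \(Y_{mean}\) 0.5 → 0.75) are hypothetical.

```python
# Hypothetical toy trees; each node stores the mean training label (y_mean).
# Only the root split of tree_a (X < 2.5, Y_mean 0.5 -> 0.75) is from the slide.
tree_a = {
    "feature": "x", "threshold": 2.5, "y_mean": 0.5,
    "left": {
        "feature": "y", "threshold": 1.5, "y_mean": 0.75,
        "left": {"y_mean": 1.0},
        "right": {"y_mean": 0.5},
    },
    "right": {"y_mean": 1 / 6},  # assumed value for illustration
}
tree_b = {
    "feature": "y", "threshold": 1.5, "y_mean": 0.5,
    "left": {"y_mean": 0.9},
    "right": {"y_mean": 0.1},
}


def tree_feature_contributions(tree, instance):
    """FC_{i,t}^f: sum the local increments LI_f^N along the instance's path."""
    contributions = {}
    node = tree
    while "feature" in node:  # leaves carry no split
        feature = node["feature"]
        branch = "left" if instance[feature] < node["threshold"] else "right"
        child = node[branch]
        # Local increment: change in mean label from parent to child.
        increment = child["y_mean"] - node["y_mean"]
        contributions[feature] = contributions.get(feature, 0.0) + increment
        node = child
    return contributions


def forest_feature_contributions(trees, instance):
    """FC_i^f: average the per-tree contributions over all T trees."""
    totals = {}
    for tree in trees:
        for feature, value in tree_feature_contributions(tree, instance).items():
            totals[feature] = totals.get(feature, 0.0) + value
    return {feature: value / len(trees) for feature, value in totals.items()}


instance = {"x": 1.0, "y": 1.0}
per_tree = tree_feature_contributions(tree_a, instance)
per_forest = forest_feature_contributions([tree_a, tree_b], instance)
print(per_tree)
print(per_forest)
```

For a single tree, the root mean plus all contributions equals the prediction at the leaf the instance ends up in (here 0.5 + 0.25 + 0.25 = 1.0 for `tree_a`), which is what makes the contributions readable as an explanation.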

Partial dependence

[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): pp. 1189–1232, 2001. 

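[Figure: partial dependence plot for "Duration illness" (axis ticks 1 to 300) against "Fraud?" (0% to 100%), with a further axis running from 0 to 200. Example case: Company ABC Inc, Employees 5, Duration illness ... days. Prediction labels shown: Fraud (55%), Non-fraud (35%), Fraud (65%), Fraud (90%), Non-fraud (45%), Non-fraud (40%), Non-fraud (25%).]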
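Partial dependence [2] can be sketched as follows: force one feature to each value on a grid, keep all other features as they are, and average the model's predictions. The model `toy_fraud_model`, its coefficients, and the two example cases are hypothetical stand-ins for the real Random Forest and data; only the feature names come from the slide.

```python
def toy_fraud_model(case):
    """Hypothetical stand-in for the black box; returns a fraud probability."""
    score = 0.2 + 0.002 * case["duration_illness"] + 0.01 * case["employees"]
    return max(0.0, min(1.0, score))


def partial_dependence(predict, cases, feature, grid):
    """Average prediction over all cases with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        predictions = [predict({**case, feature: value}) for case in cases]
        curve.append(sum(predictions) / len(predictions))
    return curve


# Assumed example data, not taken from the real data set.
cases = [
    {"employees": 5, "duration_illness": 30},
    {"employees": 15, "duration_illness": 120},
]
grid = [1, 50, 100, 150, 200, 250, 300]
curve = partial_dependence(toy_fraud_model, cases, "duration_illness", grid)
for days, probability in zip(grid, curve):
    print(f"{days:>3} days -> {probability:.0%}")
```

Passing a single case instead of the whole data set gives the what-if curve for one claim, which is closer to what a domain expert sees when changing the duration of one case.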

Local rule extraction

[3] Ribeiro, Marco Tulio et al. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.

[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.


Applications

Anywhere Random Forests are used!

  • Sick-leave fraud detection
  • Automatic acceptance for ORV (term life insurance) (Cees Willemsen)
  • Health insurance churn predictions (Lizan Onderwater-Kops)
  • Debtor management (Team Randy Soet)

Fraud team happy! 🎉

Paper presented at:
Workshop on Human Interpretability in Machine Learning

Stockholm, Sweden

GENERAL EXPLAINER TECHNIQUES

Applicable to any machine learning model

[Figures: LIME and LEMON, each illustrated on a scatter plot (axes x, y).]
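The idea behind LIME [3] can be sketched without any library: sample points around the instance, weight them by proximity, and fit a weighted linear model whose coefficients serve as the explanation. This is a simplified sketch, not the `lime` package; the Gaussian sampling, the exponential kernel, all parameter defaults, and the `black_box` function are assumptions made for illustration.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def local_linear_explanation(predict, instance, n_samples=2000,
                             sigma=0.5, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around `instance`.

    Returns (intercept, weights); the intercept approximates the prediction
    at the instance, the weights approximate the local effect per feature.
    """
    rng = random.Random(seed)
    d = len(instance)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        offset = [rng.gauss(0.0, sigma) for _ in range(d)]
        sample = [v + o for v, o in zip(instance, offset)]
        distance_sq = sum(o * o for o in offset)
        weight = math.exp(-distance_sq / kernel_width ** 2)
        row = [1.0] + offset  # features centred on the instance
        y = predict(sample)
        for i in range(d + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    coefficients = solve_linear_system(xtwx, xtwy)
    return coefficients[0], coefficients[1:]


def black_box(point):
    """Hypothetical non-linear model: rises with x, falls with y."""
    return 1.0 / (1.0 + math.exp(-(2.0 * point[0] - 1.0 * point[1])))


intercept, weights = local_linear_explanation(black_box, [1.0, 2.0])
print(intercept, weights)
```

Because the sketch only calls `predict`, it works for any model that maps a feature vector to a number, which is the sense in which such explainers are model-agnostic.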

Applications

Can be used for any Python model...

sklearn-pmml-model

Can be used for any model...

DEBTOR MANAGEMENT

Effectiveness of debt collection strategies

Problem

Surrogate learning

[Figure: scatter plot (axes x, y) with three panels, each listing Feature 1, Feature 2 and Feature 3.]
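Surrogate learning can be sketched as: label a data set with the black box, train a simple interpretable model on those labels, and measure how often the two agree (fidelity). The example below uses a one-split decision stump as the surrogate; the `black_box` function, the grid of rows, and the stump itself are hypothetical choices for illustration, not the surrogate models offered by the tool.

```python
def fit_stump(rows, labels):
    """Search for the single-threshold rule that best reproduces `labels`."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for below in (0, 1):  # class predicted when row[f] < threshold
                predictions = [below if row[f] < threshold else 1 - below
                               for row in rows]
                agree = sum(p == y for p, y in zip(predictions, labels))
                fidelity = agree / len(labels)
                if best is None or fidelity > best["fidelity"]:
                    best = {"feature": f, "threshold": threshold,
                            "below": below, "fidelity": fidelity}
    return best


def black_box(row):
    """Hypothetical opaque classifier on two features."""
    return int(row[0] + row[1] >= 3)


rows = [(x, y) for x in (0, 1, 2, 3) for y in (0, 1, 2)]
labels = [black_box(row) for row in rows]  # the surrogate learns these
surrogate = fit_stump(rows, labels)
print(surrogate)
```

The fidelity score is the natural quality measure for a surrogate: an explanation read off the stump is only as trustworthy as its agreement with the black box on the data of interest.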

Help data scientists to create and tune explanatory surrogate models.

Configuration view

  • Any tabular data set

  • Any Python classifier, or PMML

  • Different surrogate models

Global overview

  • Every line = an explanation

  • More information than traditional feature importance

  • Selecting subgroups

Local explanation view

  • Quality

  • Explanation

Context view

Demo

Paper submitted to:
IEEE Visual Analytics Science and Technology (VAST)

Vancouver, Canada

Applications

Anywhere tabular data is used.

Any model in Python or PMML.

  • Debtor management (Team Randy Soet)
  • Team Data Science / Wheel of Knowledge
  • More soon!

NEXT STEPS

Opportunities

  • Achmea

    • Term life insurance (Overlijdens Risico Verzekeringen, ORV) (Senna van Iersel)

    • Health insurance churn prediction (Lizan Kops)

    • Pricing GLMs (Joost van Bruggen)

    • Recruitment analytics (Silke Lhoëst)

    • Team Data Science (Schade Particulier) (Wouter Slot)

  • Academic

    • RATE colleagues

    • Stef van der Elzen (SynerScope)

    • Lorentz Grant Workshop @ Leiden

    • IBM & Harvard @ Boston, USA

Topic

Global

Instance-level

Deck July 1st

By iamdecode
