RATE Analytics
3rd June

Dennis Collaris
PhD Visualization

Machine Learning
75% risk!
Black box model
Domain expert
But why?
Data
Trending issue



DECISION SUPPORT






Husky vs. Wolf problem
DIAGNOSTICS
CycleGAN

DIAGNOSTICS

75% risk!
Black box model
Domain expert
Explanation
Aha!
But why?
Data
Explainer
Machine Learning
Overview
- Fraud detection explanations (sick leave)
- General explanation techniques
- Strategy analysis (debtor management)
- Next steps
Fraud detection explanations
sick-leave insurances
FRAUD DETECTION EXPLANATIONS

Real world scenario
Data
- Missing/incorrect values
Model
- 100 Random Forests
- 500 trees each
- ~25 decisions per tree
- 1,312,471 decisions total!


OOB error: 27.7%
FRAUD DETECTION EXPLANATIONS





My solution
FRAUD DETECTION EXPLANATIONS
Feature contribution
[1] Palczewska, Anna et al. Interpreting random forest classification models using a feature contribution method. In Integration of reusable systems, pp. 193–218. Springer, 2014.
[Figure: plot with axes x and y, with node labels 7 : 7, 6 : 2, ...]
\(Y_{mean}\) = 0.5
\(Y_{mean}\) = 0.75
\(LI_{X}\) = 0.25
Contribution per Decision Tree:
\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)
Contribution per Random Forest:
\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)
X < 2.5
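The two contribution formulas can be illustrated with a minimal sketch. It assumes scikit-learn, a synthetic binary data set, and that the local increment \(LI_f^N\) is the change in the class-1 fraction (\(Y_{mean}\)) between a node and the child the instance moves to. The helper names `tree_contributions` and `forest_contributions` are hypothetical and are not taken from [1].

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def tree_contributions(tree, x):
    """Return (root Y_mean, per-feature contribution) for one instance x."""
    t = tree.tree_
    x = np.asarray(x, dtype=np.float32)          # scikit-learn trees split on float32
    counts = t.value[:, 0, :]                    # per-node class weights
    y_mean = counts[:, 1] / counts.sum(axis=1)   # class-1 fraction in every node
    fc = np.zeros(t.n_features)
    node = 0
    while t.children_left[node] != -1:           # -1 marks a leaf
        f = t.feature[node]
        if x[f] <= t.threshold[node]:
            child = t.children_left[node]
        else:
            child = t.children_right[node]
        fc[f] += y_mean[child] - y_mean[node]    # local increment for feature f
        node = child
    return y_mean[0], fc


def forest_contributions(forest, x):
    """Average the per-tree contributions over all T trees."""
    results = [tree_contributions(est, x) for est in forest.estimators_]
    bias = np.mean([r[0] for r in results])
    fc = np.mean([r[1] for r in results], axis=0)
    return bias, fc


# Synthetic stand-in data (assumption; not the sick-leave data set).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

bias, fc = forest_contributions(forest, X[0])
prediction = forest.predict_proba(X[:1])[0, 1]
print("contributions:", np.round(fc, 3))
print("bias + sum of contributions:", bias + fc.sum(), "prediction:", prediction)
```

Under these assumptions, the averaged root \(Y_{mean}\) plus the summed contributions reproduces the forest's predicted probability for the instance, which is a useful sanity check.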
FRAUD DETECTION EXPLANATIONS
Partial dependence
[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): pp. 1189–1232, 2001.
[Figure: partial dependence plot of Fraud? (0% to 100%) against Duration illness]
Fraud (55%)
Non-fraud (35%)
Company | ABC Inc |
Employees | 5 |
Duration illness | days |
... | ... |
Fraud (65%)
Fraud (90%)
Non-fraud (45%)
Non-fraud (40%)
Non-fraud (25%)
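How a prediction shifts when a single feature is varied can be sketched as follows. This is a hand-rolled illustration on synthetic data, assuming scikit-learn; feature 0 merely stands in for a feature such as "Duration illness", and the function names `ice_curve` and `partial_dependence_curve` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def ice_curve(model, x, feature, grid):
    """Predicted class-1 probability for one instance as `feature` sweeps the grid."""
    X_mod = np.tile(x, (len(grid), 1))
    X_mod[:, feature] = grid
    return model.predict_proba(X_mod)[:, 1]


def partial_dependence_curve(model, X, feature, grid):
    """Average class-1 probability over the data set with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value
        curve.append(model.predict_proba(X_mod)[:, 1].mean())
    return np.array(curve)


# Synthetic stand-in data (assumption; not the sick-leave data set).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

grid = np.linspace(X[:, 0].min(), X[:, 0].max(), 10)
single_case = ice_curve(model, X[0], 0, grid)            # one case, one feature varied
pd_curve = partial_dependence_curve(model, X, 0, grid)   # average over all cases
print(np.round(single_case, 2))
print(np.round(pd_curve, 2))
```

The single-instance curve corresponds to changing one field of a case and watching the prediction move; the partial dependence curve is the average of those curves over all cases.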
FRAUD DETECTION EXPLANATIONS
Local rule extraction
[3] Ribeiro, Marco Tulio et al. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.
[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.
[Figure: plot with axes x and y]
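One way to picture local rule extraction is to fit a shallow decision tree to the black-box predictions in a neighbourhood of a single case and read off its rules. The sketch below is only in the spirit of [3] and [4] and implements neither: the Gaussian sampling, the depth limit, and the helper name `local_rules` are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text


def local_rules(model, x, scale, n_samples=1000, max_depth=2, seed=0):
    """Fit a shallow tree to the model's labels on samples drawn around x."""
    rng = np.random.default_rng(seed)
    neighbours = x + rng.normal(0.0, 1.0, size=(n_samples, x.shape[0])) * scale
    labels = model.predict(neighbours)            # black-box labels, not ground truth
    surrogate = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    surrogate.fit(neighbours, labels)
    fidelity = surrogate.score(neighbours, labels)  # agreement with the black box
    return surrogate, fidelity


# Synthetic stand-in data and black box (assumptions).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

surrogate, fidelity = local_rules(black_box, X[0], scale=X.std(axis=0))
rules = export_text(surrogate,
                    feature_names=[f"feature_{i}" for i in range(X.shape[1])])
print(rules)
print("local fidelity:", round(fidelity, 3))
```

The fidelity score indicates how well the extracted rules mimic the black box near the case; a low value suggests the rules should not be trusted as an explanation.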

FRAUD DETECTION EXPLANATIONS
Applications
Anywhere Random Forests are used!
- Fraud detection sick-leave
- Automatic acceptance ORV (Cees Willemsen)
- Health insurance churn predictions (Lizan Onderwater-Kops)
- Debtor management (Team Randy Soet)
Fraud team happy! 🎉
FRAUD DETECTION EXPLANATIONS

Paper presented at:
Workshop on Human Interpretability in Machine Learning
Stockholm, Sweden

GENERAL EXPLAINER TECHNIQUES
Applicable to any machine learning model


[Figure: two plots with axes x and y, one labelled LIME and one labelled LEMON]
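The idea behind a LIME-style explanation [3] can be sketched as a weighted linear surrogate fitted around one instance. This is a simplified illustration, not the `lime` package and not LEMON: the Gaussian sampling, the exponential kernel, the kernel width of 0.75·sqrt(d), and the function name `lime_style_explanation` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge


def lime_style_explanation(model, x, scale, n_samples=2000, seed=0):
    """Fit a distance-weighted linear model to the black box around x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    Z = x + rng.normal(0.0, 1.0, size=(n_samples, d)) * scale
    target = model.predict_proba(Z)[:, 1]               # black-box output to mimic
    dist = np.linalg.norm((Z - x) / scale, axis=1)      # distance in scaled units
    kernel_width = 0.75 * np.sqrt(d)                    # assumed width
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)  # nearby samples count more
    surrogate = Ridge(alpha=1.0).fit(Z, target, sample_weight=weights)
    fidelity = surrogate.score(Z, target, sample_weight=weights)
    return surrogate.coef_, fidelity


# Synthetic stand-in data and black box (assumptions).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

coefs, fidelity = lime_style_explanation(black_box, X[0], scale=X.std(axis=0))
print("local feature weights:", np.round(coefs, 3))
print("weighted R^2 of the surrogate:", round(fidelity, 3))
```

The coefficients act as the local explanation, while the weighted R² gives a rough sense of how faithfully the linear surrogate follows the model in that neighbourhood.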
GENERAL EXPLAINER TECHNIQUES
Applications
Can be used for any Python model...

sklearn-pmml-model
Can be used for any model...
DEBTOR MANAGEMENT
Effectiveness of debt collection strategies

DEBTOR MANAGEMENT
Problem
Surrogate learning
[Figure: plot with axes x and y, followed by three panels each listing Feature 1, Feature 2, Feature 3]
DEBTOR MANAGEMENT
Help data scientists to create and tune explanatory surrogate models.
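What "creating and tuning" a surrogate might involve can be sketched by comparing candidate surrogate models and neighbourhood sizes on held-out fidelity. This is an illustrative sketch only, not the tool shown in the deck: the candidate models, the radii, the synthetic data, and the helper name `surrogate_fidelity` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def surrogate_fidelity(model, x, surrogate, radius, n_samples=1000, seed=0):
    """Held-out R^2 of a surrogate trained on samples drawn around x."""
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(0.0, radius, size=(n_samples, x.shape[0]))
    target = model.predict_proba(Z)[:, 1]
    half = n_samples // 2
    surrogate.fit(Z[:half], target[:half])          # train on the first half
    return surrogate.score(Z[half:], target[half:])  # evaluate on the second half


# Synthetic stand-in data and black box (assumptions).
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
black_box = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

candidates = {
    "linear": lambda: LinearRegression(),
    "tree (depth 3)": lambda: DecisionTreeRegressor(max_depth=3, random_state=0),
}
results = {
    (name, radius): surrogate_fidelity(black_box, X[0], make(), radius)
    for name, make in candidates.items()
    for radius in (0.5, 1.0, 2.0)
}
for (name, radius), score in results.items():
    print(f"{name:15s} radius={radius:<4} held-out R^2={score:.3f}")
```

Comparing the scores shows the trade-off a data scientist would tune: simpler surrogates are easier to read, but may only be faithful in a small neighbourhood.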





DEBTOR MANAGEMENT
Configuration view

- Any tabular data set
- Any Python classifier, or PMML
- Different surrogate models


DEBTOR MANAGEMENT
Global overview

- Every line = an explanation
- More information than traditional feature importance
- Selecting subgroups


DEBTOR MANAGEMENT
Local explanation view
← quality
← explanation


DEBTOR MANAGEMENT
Context view





Demo
DEBTOR MANAGEMENT
Paper submitted to:
IEEE Visual Analytics Science and Technology (VAST)
Vancouver, Canada

DEBTOR MANAGEMENT
Applications
Anywhere tabular data is used.
Any model in Python or PMML.
- Debtor management (Team Randy Soet)
- Team Data Science / Wheel of Knowledge
- More soon!
NEXT STEPS
Opportunities
- Achmea
  - Term life insurance (Overlijdensrisicoverzekeringen, ORV) (Senna van Iersel)
  - Health insurance churn prediction (Lizan Kops)
  - Pricing GLMs (Joost van Bruggen)
  - Recruitment analytics (Silke Lhoëst)
  - Team Data Science (Schade Particulier) (Wouter Slot)
- Academic
  - RATE colleagues
  - Stef van der Elzen (synerscope)
  - Lorentz Grant Workshop @ Leiden
  - IBM & Harvard @ Boston, USA
NEXT STEPS
Topic
Global
Instance-level
Deck July 1st
By iamdecode