3rd June
Dennis Collaris
PhD Visualization
[Diagram: Data feeds a Black box model, which reports "75% risk!" to a Domain expert, who asks "But why?"]
DECISION SUPPORT
DIAGNOSTICS
[Diagram: Data feeds a Black box model, which reports "75% risk!" to a Domain expert, who asks "But why?". An Explainer adds an Explanation, and the Domain expert responds "Aha!"]
Fraud detection explanations (sick leave)
General explanation techniques
Strategy analysis (debtor management)
Next steps
FRAUD DETECTION EXPLANATIONS
OOB error: 27.7%
[1] Palczewska, Anna et al. Interpreting random forest classification models using a feature contribution method. In Integration of Reusable Systems, pp. 193–218. Springer, 2014.
[Figure: scatter plot over axes x and y, next to a decision tree node that splits on X < 2.5. Parent node: 7 : 7, \(Y_{mean}\) = 0.5. Child node: 6 : 2, \(Y_{mean}\) = 0.75. Local increment: \(LI_{X}\) = 0.25.]
Contribution per Decision Tree:
\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)
Contribution per Random Forest:
\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)
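A minimal sketch of the two contribution formulas above, in plain Python. The `Node` class and the function names are hypothetical. The 7 : 7 parent, the 6 : 2 child, and the X < 2.5 split come from the slide; placing the 6 : 2 node on the "X < 2.5" branch and giving the other branch 1 : 5 are assumptions made to complete the toy tree.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    pos: int                            # training samples of the positive class in this node
    neg: int                            # training samples of the negative class in this node
    feature: Optional[str] = None       # split feature (None for a leaf)
    threshold: Optional[float] = None   # split threshold
    left: Optional["Node"] = None       # branch taken when x[feature] < threshold
    right: Optional["Node"] = None      # branch taken otherwise

    @property
    def y_mean(self) -> float:
        """Fraction of positive samples in the node (Y_mean on the slide)."""
        return self.pos / (self.pos + self.neg)


def tree_contributions(root: Node, x: Dict[str, float]) -> Dict[str, float]:
    """FC_{i,t}^f: sum of local increments LI_f^N along the path of instance x."""
    fc: Dict[str, float] = {}
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] < node.threshold else node.right
        local_increment = child.y_mean - node.y_mean
        fc[node.feature] = fc.get(node.feature, 0.0) + local_increment
        node = child
    return fc


def forest_contributions(trees: List[Node], x: Dict[str, float]) -> Dict[str, float]:
    """FC_i^f: per-tree contributions averaged over all T trees."""
    total: Dict[str, float] = {}
    for tree in trees:
        for feature, value in tree_contributions(tree, x).items():
            total[feature] = total.get(feature, 0.0) + value
    return {feature: value / len(trees) for feature, value in total.items()}


# Toy tree: parent 7 : 7 (Y_mean = 0.5), split on X < 2.5, child 6 : 2 (Y_mean = 0.75).
toy_tree = Node(7, 7, feature="X", threshold=2.5, left=Node(6, 2), right=Node(1, 5))
print(tree_contributions(toy_tree, {"X": 1.0}))  # local increment for X is 0.75 - 0.5
```

By construction, the root's \(Y_{mean}\) plus the sum of all contributions equals the \(Y_{mean}\) of the leaf the instance ends up in, so the contributions account for the full shift from the prior to the prediction.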
[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): pp. 1189–1232, 2001.
[Figure: predicted fraud probability (0% to 100%) plotted against Duration illness (1 to 300). Predictions shown alongside the example case: Fraud (55%), Non-fraud (35%), Fraud (65%), Fraud (90%), Non-fraud (45%), Non-fraud (40%), Non-fraud (25%).]
Company | ABC Inc |
Employees | 5 |
Duration illness | days |
... | ... |
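A partial dependence curve in the sense of [2] can be sketched as follows: fix one feature at each value of a grid, leave all other features as they are, and average the model's predictions. The `toy_fraud_model` below, its coefficients, and the field names are purely hypothetical stand-ins for the real black box model.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over `rows` with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_fraud_model(row: Row) -> float:
    """Hypothetical black box: fraud score grows with the duration of illness."""
    score = 0.2 + 0.002 * row["duration_illness"]
    if row["employees"] < 10:
        score += 0.1
    return min(1.0, score)


rows = [
    {"employees": 5, "duration_illness": 30},
    {"employees": 50, "duration_illness": 120},
]
grid = [1, 100, 200, 300]

# Averaged over all rows: the partial dependence curve.
curve = partial_dependence(toy_fraud_model, rows, "duration_illness", grid)

# Restricted to one row: how the prediction for that single case reacts.
single_case = partial_dependence(toy_fraud_model, rows[:1], "duration_illness", grid)
```

Passing a single row gives the per-case variant, which matches the idea of showing a domain expert how the prediction for one company would change with a different duration.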
[3] Ribeiro, Marco Tulio et al. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.
[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.
[Figure: scatter plot over axes x and y]
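The local surrogate idea behind LIME [3] can be sketched in plain Python: sample points around the instance, weight them by proximity, and fit a weighted linear model whose coefficients serve as the explanation. This is a simplified illustration, not the reference implementation: the actual method also uses an interpretable representation and feature selection. The function names, the Gaussian sampling, and the default `scale` and `kernel_width` values are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def lime_explain(
    predict: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    n_samples: int = 500,
    scale: float = 0.5,
    kernel_width: float = 0.75,
    seed: int = 0,
) -> Tuple[float, List[float]]:
    """Fit a proximity-weighted linear model around `instance`.

    Returns (intercept, coefficients); the coefficients are the local explanation.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]   # perturbed sample
        distance = math.dist(z, instance)
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
        rows.append([1.0] + z)                              # leading 1.0 for the intercept
        targets.append(predict(z))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    p = len(instance) + 1
    A = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(p)]
         for i in range(p)]
    b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets)) for i in range(p)]
    beta = solve(A, b)
    return beta[0], beta[1:]


# Sanity check on a model that is already linear: the surrogate should recover it.
intercept, coefs = lime_explain(lambda z: 2.0 * z[0] - z[1] + 0.5, [1.0, 2.0])
```

For a non-linear black box the coefficients only describe the model's behaviour near the chosen instance, and they depend on the sampling `scale` and `kernel_width`.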
Applicable anywhere Random Forests are used!
Paper presented at:
Workshop on Human Interpretability in Machine Learning
Stockholm, Sweden
GENERAL EXPLAINER TECHNIQUES
Applicable to any machine learning model
[Figure: two scatter plots over axes x and y, comparing LIME and LEMON]
Applicable anywhere Python can be used.
sklearn-pmml-model
DEBTOR MANAGEMENT
Surrogate learning
[Figure: scatter plot over axes x and y, with three panels each listing Feature 1, Feature 2, Feature 3]
Help data scientists to create and tune explanatory surrogate models.
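Surrogate learning can be illustrated with a very small example: a simple, readable model is trained on the black box model's own predictions rather than on the ground truth, and its fidelity (the share of cases where both models agree) indicates how far the explanation can be trusted. The decision stump used as surrogate, the function names, and the toy black box are all assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Stump = Tuple[float, int, float, int]  # (fidelity, feature index, threshold, label above)


def fit_stump(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> Stump:
    """Find the single-threshold rule that best reproduces `labels`."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for above in (0, 1):
                # Predict `above` when the feature exceeds the threshold, else the other class.
                preds = [above if row[feature] > threshold else 1 - above for row in rows]
                agreement = sum(p == y for p, y in zip(preds, labels)) / len(rows)
                if best is None or agreement > best[0]:
                    best = (agreement, feature, threshold, above)
    return best


def explain_with_surrogate(
    black_box: Callable[[Sequence[float]], int],
    rows: Sequence[Sequence[float]],
) -> Stump:
    """Train the surrogate on the black box's outputs, not on the true labels."""
    model_outputs = [black_box(row) for row in rows]
    return fit_stump(rows, model_outputs)


# Hypothetical black box that only looks at the second feature.
def toy_black_box(row: Sequence[float]) -> int:
    return int(row[1] > 1.5)


rows: List[List[float]] = [[a, b] for a in range(4) for b in range(4)]
fidelity, feature, threshold, above = explain_with_surrogate(toy_black_box, rows)
print(f"if feature {feature} > {threshold} then {above} (fidelity {fidelity:.0%})")
```

A low fidelity signals that the surrogate is too simple for the model it explains, which is the kind of trade-off a data scientist would tune when building explanatory surrogate models.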
Paper submitted to:
IEEE Visual Analytics Science and Technology
Vancouver, Canada
Applicable anywhere tabular data is used.
Any model in Python or PMML.
NEXT STEPS
Achmea
Term life insurance (Overlijdens Risico Verzekeringen, ORV) (Senna van Iersel)
Health insurance churn prediction (Lizan Kops)
Pricing GLMs (Joost van Bruggen)
Recruitment analytics (Silke Lhoëst)
Team Data Science (Schade Particulier) (Wouter Slot)
Collaboration
RATE colleagues
Stef van der Elzen (SynerScope)
Lorentz Grant Workshop @ Leiden
IBM & Harvard @ Boston
Global
Local