3rd June
Dennis Collaris
PhD Visualization
75% risk!
Black box model
Domain expert
But why?
Data
DECISION SUPPORT
DIAGNOSTICS
75% risk!
Black box model
Domain expert
Explanation
Aha!
But why?
Data
Explainer
Fraud detection explanations (sickness absence)
General explanation techniques
Strategy analysis (debtor management)
Next steps
FRAUD DETECTION EXPLANATIONS
Out-of-bag (OOB) error: 27.7%
[1] Palczewska, Anna et al. Interpreting random forest classification models using a feature contribution method. In Integration of Reusable Systems, pp. 193–218. Springer, 2014.
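[Figure: plot of y against x next to a decision tree; node labels show class counts]
Parent node (7 : 7): \(Y_{mean}\) = 0.5
Child node (6 : 2): \(Y_{mean}\) = 0.75
...
Local increment: \(LI_{X}\) = 0.75 − 0.5 = 0.25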
Contribution per Decision Tree:
\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)
Contribution per Random Forest:
\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)
Split condition in the example tree: X < 2.5
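The two formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation from [1]: the nested-dict tree format, the helper names (`y_mean`, `tree_contributions`, `forest_contributions`), the second child's class counts (1 : 5), and the entire second tree are assumptions made up for the example. Only the 7 : 7 parent, the 6 : 2 child, and the X < 2.5 split come from the slide, and placing the 6 : 2 child on the "X < 2.5" branch is also an assumption.

```python
from collections import defaultdict


def y_mean(node):
    """Fraction of positive-class training samples in a node."""
    pos, neg = node["counts"]
    return pos / (pos + neg)


def tree_contributions(tree, instance):
    """FC_{i,t}^f: sum the local increments LI_f^N along the instance's path."""
    contrib = defaultdict(float)
    node = tree
    while "feature" in node:  # internal nodes carry a split; leaves do not
        branch = "left" if instance[node["feature"]] < node["threshold"] else "right"
        child = node[branch]
        # Local increment: change in Y_mean from parent to child,
        # credited to the feature used for the split.
        contrib[node["feature"]] += y_mean(child) - y_mean(node)
        node = child
    return dict(contrib), y_mean(node)


def forest_contributions(trees, instance):
    """FC_i^f: average the per-tree contributions over all T trees."""
    total = defaultdict(float)
    for tree in trees:
        contrib, _ = tree_contributions(tree, instance)
        for feature, value in contrib.items():
            total[feature] += value
    return {feature: value / len(trees) for feature, value in total.items()}


# Tree modelled on the slide: 7 : 7 at the root, split on X < 2.5.
# The right child's counts (1 : 5) are hypothetical.
tree_a = {
    "counts": (7, 7), "feature": "X", "threshold": 2.5,
    "left": {"counts": (6, 2)},
    "right": {"counts": (1, 5)},
}
# Entirely hypothetical second tree whose split does not change Y_mean.
tree_b = {
    "counts": (7, 7), "feature": "X", "threshold": 0.5,
    "left": {"counts": (3, 3)},
    "right": {"counts": (4, 4)},
}

instance = {"X": 1.0}
per_tree, leaf_value = tree_contributions(tree_a, instance)
per_forest = forest_contributions([tree_a, tree_b], instance)
```

A useful property of this decomposition is that the root's Y_mean plus the summed contributions equals the leaf's Y_mean, so the explanation accounts for the full prediction of the tree.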
[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): pp. 1189–1232, 2001.
[Figure: plot of "Fraud?" (0% to 100%) against "Duration illness"]
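Reference [2] is the paper that introduced partial dependence, so the plot above is presumably of that kind; that reading is an assumption. Under it, a minimal sketch of how such a curve is computed follows. The `toy_model` function, the feature names, and the four data rows are hypothetical stand-ins, not the fraud model or data from the talk.

```python
def partial_dependence(model, rows, feature, grid):
    """For each grid value, force `feature` to that value in every row
    and average the model output over the whole data set."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a black-box fraud score in [0, 1]."""
    score = 0.2
    if row["duration_illness"] > 100:
        score += 0.5
    if row["employees"] < 10:
        score += 0.2
    return score


rows = [
    {"duration_illness": 30, "employees": 5},
    {"duration_illness": 150, "employees": 50},
    {"duration_illness": 250, "employees": 8},
    {"duration_illness": 10, "employees": 200},
]
grid = [1, 50, 100, 150, 200, 250, 300]
curve = partial_dependence(toy_model, rows, "duration_illness", grid)
```

Each point on the curve is an average over all rows, so the result describes the model's global response to one feature rather than the prediction for a single case.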
Fraud (55%)
Non-fraud (35%)
Company | ABC Inc |
Employees | 5 |
Duration illness | days |
... | ... |
Fraud (65%)
Fraud (90%)
Non-fraud (45%)
Non-fraud (40%)
Non-fraud (25%)
[3] Ribeiro, Marco Tulio et al. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.
[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.
[Figure: plot of y against x]
Applicable anywhere Random Forests are used!
Paper presented at:
Workshop on Human Interpretability in Machine Learning
Stockholm, Sweden
GENERAL EXPLAINER TECHNIQUES
Applicable to any machine learning model
LIME
[Figure: two plots of y against x]
LEMON
LIME
LEMON
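LIME [3] explains one prediction by sampling points around the instance, weighting them by proximity, and fitting a simple model to the black box's outputs on those samples. The sketch below shows that idea for a single numeric feature using only the standard library. It is not the `lime` package and not LEMON: the `black_box` function, the Gaussian sampling, the kernel, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def black_box(x):
    """Hypothetical black-box model: a smooth step in one feature."""
    return 1.0 / (1.0 + math.exp(-4.0 * (x - 1.5)))


def local_linear_surrogate(model, x0, n_samples=500, sigma=1.0,
                           kernel_width=0.5, seed=0):
    """Fit a proximity-weighted straight line to `model` around x0."""
    rng = random.Random(seed)
    # 1. Sample perturbed points around the instance being explained.
    xs = [x0 + rng.gauss(0.0, sigma) for _ in range(n_samples)]
    # 2. Query the black box on every sample.
    ys = [model(x) for x in xs]
    # 3. Weight samples so that points near x0 count more.
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    # 4. Weighted least squares in closed form (one feature).
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = local_linear_surrogate(black_box, x0=1.5)
```

The slope is the explanation: its sign and size say how the prediction changes with the feature near the instance. The result depends on how the neighbourhood is sampled and weighted, which is the part a different sampling strategy would change.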
Can be used for any Python model...
sklearn-pmml-model
Can be used for any model...
DEBTOR MANAGEMENT
Surrogate learning
[Figure: plot of y against x, with three panels each listing Feature 1, Feature 2, Feature 3]
Help data scientists to create and tune explanatory surrogate models.
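Surrogate learning means training an interpretable model on the black box's predictions instead of the true labels, then checking how often the two agree (fidelity). The sketch below illustrates this with a one-split decision stump as the surrogate. The `black_box` function, the `fit_stump` helper, and the grid data are hypothetical; the tool presented in the talk is not reproduced here.

```python
def black_box(row):
    """Hypothetical black-box classifier on two numeric features."""
    return 1 if 0.6 * row[0] + 0.4 * row[1] > 1.5 else 0


def fit_stump(rows, labels):
    """Search all one-split stumps and return the one that best
    reproduces `labels`, as (fidelity, feature, threshold, above)."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for above in (0, 1):  # class predicted when value > threshold
                preds = [above if row[feature] > threshold else 1 - above
                         for row in rows]
                agreement = sum(p == y for p, y in zip(preds, labels)) / len(rows)
                if best is None or agreement > best[0]:
                    best = (agreement, feature, threshold, above)
    return best


# Tabular data: a grid over x in 0..3 and y in 0..2.
rows = [(x / 2, y / 2) for x in range(7) for y in range(5)]
# The surrogate is trained on the black box's outputs, not on ground truth.
labels = [black_box(row) for row in rows]
fidelity, feature, threshold, above = fit_stump(rows, labels)
```

Fidelity is the quality measure for the explanation: a surrogate that is easy to read but disagrees with the black box on many rows explains little, which is why tuning the surrogate matters.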
← Any tabular data set
← Any Python classifier, or PMML
← Different surrogate models
Every line = an explanation
More information than traditional feature importance
Selecting subgroups
← quality
← explanation
Paper submitted to:
IEEE Visual Analytics Systems and Technology
Vancouver, Canada
Applicable anywhere tabular data is used.
Any model in Python or PMML.
NEXT STEPS
Achmea
Term life insurance (Overlijdens Risico Verzekeringen, ORV) (Senna van Iersel)
Health insurance churn prediction (Lizan Kops)
Pricing GLMs (Joost van Bruggen)
Recruitment analytics (Silke Lhoëst)
Team Data Science (Schade Particulier) (Wouter Slot)
Academic
RATE colleagues
Stef van der Elzen (SynerScope)
Lorentz Grant Workshop @ Leiden
IBM & Harvard @ Boston, USA
Global
Instance-level