Visualization for Explainable AI

November 26th

Dennis Collaris

PhD Visualization

Machine Learning

[Diagram: Data → Black box model → "75% risk!" → Domain expert: "But why?"]

Trending issue

DECISION SUPPORT

Husky vs. Wolf problem

https://arxiv.org/abs/1602.04938

DIAGNOSTICS

CycleGAN

https://techcrunch.com/2018/12/31/this-clever-ai-hid-data-from-its-creators-to-cheat-at-its-appointed-task

DIAGNOSTICS

Machine Learning

[Diagram: Data → Black box model → "75% risk!" → Domain expert: "But why?" An Explainer adds an Explanation to the prediction, and the domain expert responds "Aha!"]

Overview

Fraud detection explanations (sick leave)
LEMON
sklearn-pmml-model
ExplainExplore (debtor management)
New projects
- Contribution-Value Plots
- StrategyAtlas

Fraud detection explanations
sick-leave insurances

FRAUD DETECTION EXPLANATIONS

Real world scenario

Data

Missing/incorrect values

Model

100 Random Forests
500 trees each
~25 decisions per tree
1,312,471 decisions total!
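As a quick sanity check, the numbers on this slide are consistent with each other. The sketch below only redoes the arithmetic; the variable names are illustrative and not taken from the project.

```python
# Figures quoted on the slide.
forests = 100
trees_per_forest = 500
approx_decisions_per_tree = 25
reported_total_decisions = 1_312_471

# Rough estimate from the rounded per-tree figure.
estimated_total = forests * trees_per_forest * approx_decisions_per_tree

# Average number of decisions per tree implied by the reported total.
implied_per_tree = reported_total_decisions / (forests * trees_per_forest)

print(estimated_total)             # 1250000
print(round(implied_per_tree, 2))  # 26.25
```

The estimate of 1,250,000 is close to the reported total, which works out to roughly 26 decisions per tree.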

OOB error: 27.7%

FRAUD DETECTION EXPLANATIONS

My solution

FRAUD DETECTION EXPLANATIONS

Feature contribution

[1] Palczewska, Anna et al. Interpreting random forest classification models using a feature contribution method. In Integration of reusable systems, pp. 193–218. Springer, 2014.

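[Figure: example decision tree split on X < 2.5, with the x axis running from 0 to 3. The parent node holds 7 : 7 samples, so \(Y_{mean}\) = 0.5. The child node holds 6 : 2 samples, so \(Y_{mean}\) = 0.75. The local increment is \(LI_{X} = 0.75 - 0.5 = 0.25\).]

Contribution per Decision Tree:

\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)

Contribution per Random Forest:

\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)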
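The two formulas can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the project. The nested-dict tree format and the helper names are assumptions made for this example. The toy tree reuses the slide's numbers (7 : 7 at the root, 6 : 2 in one child). Which side of the split holds the 6 : 2 child, and the 1 : 5 sibling, are assumptions derived by subtraction.

```python
from collections import defaultdict

def leaf(y_mean):
    """Leaf node: y_mean is the fraction of positive training samples."""
    return {"y_mean": y_mean}

def split(feature, threshold, y_mean, left, right):
    """Internal node: go left when instance[feature] < threshold."""
    return {"feature": feature, "threshold": threshold,
            "y_mean": y_mean, "left": left, "right": right}

def tree_contributions(tree, instance):
    """FC_{i,t}^f: sum of local increments LI_f^N along the instance's path."""
    contributions = defaultdict(float)
    node = tree
    while "feature" in node:
        f = node["feature"]
        child = node["left"] if instance[f] < node["threshold"] else node["right"]
        # Local increment: change in Y_mean caused by this split on feature f.
        contributions[f] += child["y_mean"] - node["y_mean"]
        node = child
    return dict(contributions)

def forest_contributions(trees, instance):
    """FC_i^f: average of the per-tree contributions over all T trees."""
    totals = defaultdict(float)
    for tree in trees:
        for f, value in tree_contributions(tree, instance).items():
            totals[f] += value
    return {f: value / len(trees) for f, value in totals.items()}

# Toy tree: root 7 : 7 (Y_mean 0.5); X < 2.5 leads to the 6 : 2 child (0.75).
# The other child is assumed to hold the remaining 1 : 5 samples (1/6).
toy_tree = split("X", 2.5, 0.5, left=leaf(0.75), right=leaf(1 / 6))

print(tree_contributions(toy_tree, {"X": 1.0}))  # {'X': 0.25}
```

For an instance with X = 1.0 the path crosses one split on X, so the contribution of X equals the local increment of 0.25 from the slide.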

FRAUD DETECTION EXPLANATIONS

Partial dependence

[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5): pp. 1189–1232, 2001.

[Figure: partial dependence plot of the prediction (Fraud?) against Duration illness in days, for an example instance with Company: ABC Inc and Employees: 5. As the duration is varied, the predictions shown range from Non-fraud (25%) to Fraud (90%).]
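A minimal sketch of partial dependence in the sense of [2]: force one feature to each value on a grid, keep all other features as they are, and average the model's predictions. The toy model, its thresholds, and the feature names below are hypothetical and only serve to make the example runnable.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each value in `grid`."""
    curve = []
    for value in grid:
        # Copy every row with the feature overridden, then average the predictions.
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve

def toy_fraud_model(row):
    """Hypothetical stand-in for the black box: returns a fraud probability."""
    if row["duration_illness"] > 150 and row["employees"] < 10:
        return 0.9
    return 0.3

rows = [
    {"employees": 5, "duration_illness": 120},
    {"employees": 50, "duration_illness": 220},
]
grid = [100, 200, 300]

# Averaged over the data set (partial dependence).
global_curve = partial_dependence(toy_fraud_model, rows, "duration_illness", grid)

# Restricted to one row: how this single prediction reacts to the feature.
single_curve = partial_dependence(toy_fraud_model, rows[:1], "duration_illness", grid)
```

Passing a single row gives the per-instance view, where only the selected case is varied.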

FRAUD DETECTION EXPLANATIONS

Local rule extraction

[3] Ribeiro, Marco Tulio et al. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.

[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.

[Figure: one-dimensional example; x axis from 0 to 3]
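The slides do not spell out the extraction procedure, so the sketch below shows only the simplest reading of a local rule: the conjunction of conditions on the path a single instance takes through one decision tree. The methods in [3] and [4] are more involved. The tree format, feature names, and thresholds are hypothetical.

```python
def path_rule(tree, instance):
    """Return (rule, prediction) for the path `instance` follows through `tree`."""
    conditions = []
    node = tree
    while "feature" in node:
        feature, threshold = node["feature"], node["threshold"]
        if instance[feature] < threshold:
            conditions.append(f"{feature} < {threshold}")
            node = node["left"]
        else:
            conditions.append(f"{feature} >= {threshold}")
            node = node["right"]
    return " AND ".join(conditions), node["prediction"]

# Hypothetical toy tree; not taken from the fraud model.
toy_tree = {
    "feature": "duration_illness", "threshold": 150,
    "left": {"prediction": "non-fraud"},
    "right": {
        "feature": "employees", "threshold": 10,
        "left": {"prediction": "fraud"},
        "right": {"prediction": "non-fraud"},
    },
}

rule, prediction = path_rule(toy_tree, {"duration_illness": 200, "employees": 5})
print(f"IF {rule} THEN {prediction}")
# IF duration_illness >= 150 AND employees < 10 THEN fraud
```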

FRAUD DETECTION EXPLANATIONS

Any project using a Random Forest in R!

Gave a workshop for data science teams
Dashboard code available from team Leon

FRAUD DETECTION EXPLANATIONS

Applications

Fraud team happy! 🎉

FRAUD DETECTION EXPLANATIONS

Paper presented at:
Workshop on Human Interpretability in Machine Learning

Stockholm, Sweden

GENERAL EXPLAINER TECHNIQUES

Applicable to any machine learning model

LIME

[Figure: one-dimensional example; x axis from 0 to 3]

LEMON

[Figure: panels labeled LIME and LEMON]
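To make the idea behind LIME [3] concrete, here is a simplified one-dimensional sketch: sample points around the instance, weight them by proximity, and fit a weighted linear model to the black box's outputs. It is not the LIME library and it does not cover LEMON. The function name, the Gaussian sampling, the kernel, and all default values are assumptions chosen for this example.

```python
import math
import random

def local_linear_surrogate(predict, x0, n_samples=2000, sample_width=1.0,
                           kernel_width=0.5, seed=0):
    """Fit y = slope * x + intercept to `predict` in the neighborhood of x0."""
    rng = random.Random(seed)
    # 1. Perturb: draw samples around the instance being explained.
    xs = [rng.gauss(x0, sample_width) for _ in range(n_samples)]
    # 2. Query the black box for each sample.
    ys = [predict(x) for x in xs]
    # 3. Weight samples by proximity to x0 (exponential kernel).
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    # 4. Weighted least squares in closed form (single feature).
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return slope, y_bar - slope * x_bar

# A linear black box is recovered up to floating-point error.
linear_slope, linear_intercept = local_linear_surrogate(lambda x: 3 * x + 1, x0=1.5)

# For a curved black box the slope approximates the local trend near x0.
curved_slope, _ = local_linear_surrogate(lambda x: x ** 2, x0=1.5)
```

The slope plays the role of the feature's local contribution: for x² around x0 = 1.5 it lands near the derivative, 3.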

GENERAL EXPLAINER TECHNIQUES

Applications

Can be used for any Python model...

sklearn-pmml-model

Can be used for any model...

DEBTOR MANAGEMENT

Effectiveness of debt collection strategies

DEBTOR MANAGEMENT

Problem

Surrogate learning

[Figure: one-dimensional example (x axis from 0 to 3), followed by three panels that each list Feature 1, Feature 2, and Feature 3]

DEBTOR MANAGEMENT

Help data scientists to create and tune explanatory surrogate models.

DEBTOR MANAGEMENT

Configuration view

← Any tabular data set
← Any Python classifier, or PMML
← Different surrogate models

DEBTOR MANAGEMENT

Feature view

← Surrogate fidelity: R²
← Prediction
← Feature contribution

Local columns

Global columns

Shows values or contribution →
Line color = predicted class →
Compare selected instance with data →
Clusters indicate ‘strategies’ →
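Surrogate fidelity as R² can be read as: how much of the variation in the black box's outputs does the surrogate reproduce? The sketch below assumes both models have been evaluated on the same sample points; the function name and the handling of the constant-output case are choices made for this example, not taken from ExplainExplore.

```python
def surrogate_fidelity(black_box_outputs, surrogate_outputs):
    """R² of the surrogate's outputs with the black box's outputs as the target."""
    n = len(black_box_outputs)
    mean = sum(black_box_outputs) / n
    ss_res = sum((y - s) ** 2
                 for y, s in zip(black_box_outputs, surrogate_outputs))
    ss_tot = sum((y - mean) ** 2 for y in black_box_outputs)
    if ss_tot == 0:
        # Constant black box: perfect only if the surrogate matches exactly.
        return 1.0 if ss_res == 0 else 0.0
    return 1.0 - ss_res / ss_tot

black_box = [0.0, 0.5, 1.0]

print(surrogate_fidelity(black_box, [0.0, 0.5, 1.0]))  # 1.0 (perfect surrogate)
print(surrogate_fidelity(black_box, [0.5, 0.5, 0.5]))  # 0.0 (no better than the mean)
```

A value near 1 means the explanation can be trusted to reflect the model locally; a low value suggests trying a different surrogate or neighborhood.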

DEBTOR MANAGEMENT

Local explanation view

DEBTOR MANAGEMENT

Context view

DEBTOR MANAGEMENT

Context view

Paper accepted at:
IEEE Pacific Visualization 2020

Tianjin, China

😢

DEBTOR MANAGEMENT

Applications

Anywhere tabular data is used.

Any model in Python or PMML.

Debtor management (Team Randy Soet)
Team Data Science / Wheel of Knowledge
More soon!

NEXT STEPS

New projects

NEXT STEPS

Topic

Global

Instance-level

19 November (Robotization steering group meeting)

By iamdecode
