Visualization for Explainable AI

November 26th

Dennis Collaris

PhD Visualization

Machine Learning

[Diagram: Data feeds a Black box model, which outputs "75% risk!"; the Domain expert asks "But why?"]

Trending issue

DECISION SUPPORT

Husky vs. Wolf problem

DIAGNOSTICS

CycleGAN


[Diagram: Data feeds a Black box model, which outputs "75% risk!"; the Domain expert asks "But why?"; an Explainer produces an Explanation, and the expert responds "Aha!"]

Machine Learning

Overview

  • Fraud detection explanations (sick leave)

  • LEMON

  • sklearn-pmml-model

  • ExplainExplore (debtor management)

  • New projects

    • Contribution-Value Plots

    • StrategyAtlas

Fraud detection explanations
Sick-leave insurance

FRAUD DETECTION EXPLANATIONS

Real world scenario

Data

  • Missing/incorrect values

Model

  • 100 Random Forests
  • 500 trees each
  • ~25 decisions per tree
  • 1,312,471 decisions total!

OOB error: 27.7%


My solution


Feature contribution

[1] Palczewska, Anna et al. Interpreting random forest classification models using a feature contribution method. In Integration of reusable systems, pp. 193–218. Springer, 2014.

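[Figure: scatter plot with axes x (0 to 3) and y (1 to 2), split by the decision X < 2.5. One node holds 7 : 7 instances, so \(Y_{mean}\) = 0.5; the node after the split holds 6 : 2, so \(Y_{mean}\) = 0.75. The local increment is \(LI_{X}\) = 0.75 − 0.5 = 0.25.]

Contribution per Decision Tree:

\(FC_{i,t}^f = \sum_{N \in R_{i,t}} LI_f^N\)

Contribution per Random Forest:

\(FC_i^f = \frac{1}{T}\sum_{t=1}^T FC_{i,t}^f\)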
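The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the project: the nested-dict tree layout, the helper names (`split`, `leaf`, `tree_contributions`, `forest_contributions`) and the second toy tree are assumptions made for the example. The first tree reuses the numbers from the figure (split X < 2.5, \(Y_{mean}\) going from 0.5 to 0.75); that the x < 2.5 branch is the 0.75 child, and the 1/6 mean of the other child, are also assumed.

```python
from collections import defaultdict


def leaf(y_mean):
    """Leaf node: y_mean is the fraction of positive training instances in it."""
    return {"y_mean": y_mean}


def split(feature, threshold, y_mean, left, right):
    """Internal node: instances with feature < threshold go left, others right."""
    return {"feature": feature, "threshold": threshold, "y_mean": y_mean,
            "left": left, "right": right}


def tree_contributions(tree, instance):
    """FC_{i,t}^f: sum of local increments LI_f^N along the instance's path."""
    contributions = defaultdict(float)
    node = tree
    while "feature" in node:
        feature = node["feature"]
        child = node["left"] if instance[feature] < node["threshold"] else node["right"]
        # Local increment: change in Y_mean caused by splitting on this feature.
        contributions[feature] += child["y_mean"] - node["y_mean"]
        node = child
    return dict(contributions)


def forest_contributions(trees, instance):
    """FC_i^f: average of the per-tree contributions over all T trees."""
    totals = defaultdict(float)
    for tree in trees:
        for feature, value in tree_contributions(tree, instance).items():
            totals[feature] += value
    return {feature: value / len(trees) for feature, value in totals.items()}


# Tree from the figure (assumed orientation): root 7:7 -> 0.5, x < 2.5 -> 6:2 -> 0.75.
toy_tree = split("x", 2.5, 0.5, left=leaf(0.75), right=leaf(1 / 6))
# Hypothetical second tree, only there to show the averaging step.
second_tree = split("y", 1.5, 0.5,
                    left=leaf(0.25),
                    right=split("x", 1.0, 0.75, left=leaf(1.0), right=leaf(0.5)))

instance = {"x": 2.0, "y": 1.8}
print(tree_contributions(toy_tree, instance))
print(forest_contributions([toy_tree, second_tree], instance))
```

Within one tree, the root's \(Y_{mean}\) plus the sum of all contributions equals the prediction at the leaf the instance reaches, which is what makes the contributions readable as an explanation.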


Partial dependence

[2] Friedman, Jerome H. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), pp. 1189–1232, 2001.

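[Figure: partial dependence plot of the prediction "Fraud?" (0% to 100%) against Duration illness (1 to 300 days) for one example instance.]

Company ABC Inc
Employees 5
Duration illness (varied) days
... ...

Prediction labels shown in the figure: Fraud (55%), Non-fraud (35%), Fraud (65%), Fraud (90%), Non-fraud (45%), Non-fraud (40%), Non-fraud (25%).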
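The idea behind the plot can be sketched as follows: keep all other feature values fixed, sweep one feature over a grid, and record the model's prediction at each grid value. Averaging those curves over a data set gives the partial dependence in the sense of [2]. Everything named here is an assumption for illustration: `toy_fraud_model` is a hypothetical stand-in for the black box, and the feature names, rows and grid are made up (the grid follows the axis ticks of the figure).

```python
def toy_fraud_model(row):
    """Hypothetical stand-in for the black box: returns a fraud probability."""
    score = 0.2
    if row["duration_illness"] > 100:
        score += 0.4
    if row["employees"] < 10:
        score += 0.2
    return score


def individual_curve(model, row, feature, grid):
    """Predictions for one instance while only `feature` is varied over `grid`."""
    return [model({**row, feature: value}) for value in grid]


def partial_dependence(model, rows, feature, grid):
    """Average of the individual curves over all rows, one value per grid point."""
    curves = [individual_curve(model, row, feature, grid) for row in rows]
    return [sum(column) / len(rows) for column in zip(*curves)]


# Made-up data; the first row mimics the example instance (5 employees).
rows = [
    {"employees": 5, "duration_illness": 30},
    {"employees": 50, "duration_illness": 200},
]
grid = [1, 50, 100, 150, 200, 250, 300]

curve = individual_curve(toy_fraud_model, rows[0], "duration_illness", grid)
pd_values = partial_dependence(toy_fraud_model, rows, "duration_illness", grid)
for days, single, average in zip(grid, curve, pd_values):
    print(f"{days:>3} days  instance: {single:.0%}  average: {average:.0%}")
```

The per-instance curve answers "what would the model say for this case if only the duration changed?", which is the question a domain expert typically asks about a single prediction.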


Local rule extraction

[3] Ribeiro, Marco Tulio et al. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD, pp. 1135–1144. ACM, 2016.

[4] Deng, Houtao. Interpreting tree ensembles with inTrees. arXiv preprint arXiv:1408.5456, pp. 1–18, 2014.

[Figure: scatter plot with axes x (0 to 3) and y (1 to 2).]

Applications

Any project using a Random Forest in R!

  • Gave a workshop for data science teams
  • Dashboard code available from team Leon

Fraud team happy! 🎉


Paper presented at:
Workshop on Human Interpretability in Machine Learning, Stockholm, Sweden

GENERAL EXPLAINER TECHNIQUES

Applicable to any machine learning model

LIME

[Figures: two scatter plots with axes x (0 to 3) and y (1 to 2), comparing LIME with LEMON.]


Applications

Can be used for any Python model...

sklearn-pmml-model

Can be used for any model...

DEBTOR MANAGEMENT

Effectiveness of debt collection strategies


Problem

Surrogate learning

[Figure: scatter plot with axes x (0 to 3) and y (1 to 2), alongside three panels each listing Feature 1, Feature 2 and Feature 3.]


Help data scientists to create and tune explanatory surrogate models.


Configuration view

  • Any tabular data set
  • Any Python classifier, or PMML
  • Different surrogate models


Feature view

  • Surrogate fidelity: \(R^2\)
  • Prediction
  • Feature contribution
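The three quantities in this list can be illustrated with a small local surrogate. The sketch below samples points around one instance, fits a linear model to the black box's predictions on those points, and reports the fit quality as \(R^2\). It is an assumption-laden toy, not ExplainExplore's code: `black_box` is a hypothetical two-feature model, the grid sampling and the `radius`/`steps` parameters are illustrative choices, and the surrogate's coefficients are used here as the feature contributions.

```python
import math


def solve(a, b):
    """Solve the small linear system a.x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(samples, targets):
    """Ordinary least squares with intercept, via the normal equations."""
    design = [[1.0] + list(s) for s in samples]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict_linear(coefs, sample):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], sample))


def r_squared(targets, predictions):
    """Fidelity: share of the black box's variance reproduced by the surrogate."""
    mean = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean) ** 2 for t in targets)
    return 1.0 - ss_res / ss_tot


def local_surrogate(model, instance, radius=0.2, steps=5):
    """Fit a linear surrogate on a small grid of points around `instance`."""
    offsets = [radius * (2 * i / (steps - 1) - 1) for i in range(steps)]
    samples = [(instance[0] + dx, instance[1] + dy) for dx in offsets for dy in offsets]
    targets = [model(s) for s in samples]
    coefs = fit_linear(samples, targets)
    fidelity = r_squared(targets, [predict_linear(coefs, s) for s in samples])
    return coefs, fidelity


def black_box(sample):
    """Hypothetical non-linear model standing in for the real classifier."""
    x1, x2 = sample
    return 1.0 / (1.0 + math.exp(-(2.0 * x1 - x2 ** 2)))


instance = (0.5, 1.0)
coefs, fidelity = local_surrogate(black_box, instance)
print("prediction:", black_box(instance))
print("feature contributions (coefficients):", coefs[1:])
print("surrogate fidelity R2:", fidelity)
```

A fidelity close to 1 means the surrogate tracks the black box well in that neighbourhood, so its coefficients are a reasonable explanation there; a low value is a warning that the explanation should not be trusted.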

Local columns

Global columns

  • Shows values or contribution
  • Line color = predicted class
  • Compare selected instance with data
  • Clusters indicate ‘strategies’


Local explanation view


Context view


Paper accepted at:
IEEE Pacific Visualization 2020 @ Tianjin, China 😢


Applications

Anywhere tabular data is used.

Any model in Python or PMML.

 

  • Debtor management (Team Randy Soet)
  • Team Data Science / Wheel of Knowledge
  • More soon!

NEXT STEPS

New projects


Topic

Global

Instance-level

19 November (meeting of the Robotisering steering group)

By iamdecode
