Dennis Collaris

Computer Science & Engineering

Explainability of
Machine Learning models

Graduation project

Machine Learning

 

Fraud detection
for sick leave insurances

Fraud detection model

75% Fraud

Fraud team

Company           ABC Inc
Employees         5
Illness duration  14 days
Premium rate      5%
...               ...

Insurance policy

But why?

Explanations

Explanation

 

Aha!

Duration illness < 14 days?
  Yes: Non-fraud
  No: Premium percentage < 5%?
    Yes: Non-fraud
    No: Fraud
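
As a rough illustration, the small tree on this slide can be written as nested conditions. This is a minimal sketch: the branch layout (Yes leading to Non-fraud at both splits) is read from the order of the labels on the slide and is an assumption, and the function and parameter names are hypothetical.

```python
def classify_policy(illness_duration_days, premium_rate_pct):
    """Toy two-split tree; returns the label and the decisions taken.

    Assumption: "Yes" leads to Non-fraud at both splits.
    """
    path = []
    if illness_duration_days < 14:
        path.append("Duration illness < 14 days: Yes")
        return "Non-fraud", path
    path.append("Duration illness < 14 days: No")
    if premium_rate_pct < 5:
        path.append("Premium percentage < 5%: Yes")
        return "Non-fraud", path
    path.append("Premium percentage < 5%: No")
    return "Fraud", path


# Example policy from the slides: 14 days of illness, 5% premium rate.
label, path = classify_policy(14, 5)
print(label)
for step in path:
    print(" ", step)
```

With only two decisions, the path itself is the explanation. The same readout stops being practical once a model contains thousands of decisions.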

Models

  • Decision Tree
  • Random Forest
  • 100 Random Forests (ensemble)

Decisions

2

23

69

12,704

1,312,471

Difficult problem

Global vs Local

Global (in general): the duration of illness is important

Local (for this employer): the report date of sickness is important

My solution

Dashboards

Feature importance

Technique 1

Technique 2

Technique 3

1. Feature importance

"Disagreement"

Demo

Sensitivity analysis

2. Sensitivity analysis

[Figure: "Fraud?" (0% to 100%) plotted against "Duration illness" (0 to 300 days)]

Fraud (55%)

Non-fraud (35%)

Company ABC Inc
Employees 5
Illness duration         days
Premium rate 5%
... ...

Fraud (65%)

Fraud (90%)

Non-fraud (45%)

Non-fraud (40%)

Non-fraud (25%)
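
The idea behind this slide can be sketched in a few lines: keep every field of one policy fixed, vary a single feature, and record how the prediction changes. This is a minimal sketch, assuming the model is available as a function from a policy to a fraud probability. `toy_fraud_probability` is an invented stand-in, not the fraud detection model from the project, and its numbers are made up.

```python
def toy_fraud_probability(policy):
    """Hypothetical stand-in for the real model; the numbers are invented."""
    score = 0.25
    if policy["illness_duration"] >= 14:
        score += 0.5
    return score


def sensitivity_curve(predict, instance, feature, values):
    """Vary one feature of a single instance and record each prediction."""
    curve = []
    for value in values:
        perturbed = dict(instance)  # copy, so the original policy is untouched
        perturbed[feature] = value
        curve.append((value, predict(perturbed)))
    return curve


policy = {"employees": 5, "illness_duration": 14, "premium_rate": 5}
curve = sensitivity_curve(
    toy_fraud_probability, policy, "illness_duration", [1, 14, 300]
)
for days, probability in curve:
    verdict = "Fraud" if probability >= 0.5 else "Non-fraud"
    print(f"{days:>3} days: {verdict} ({probability:.0%})")
```

The 50% cut-off between Fraud and Non-fraud is also an assumption made for this example.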

Demo

Model simplification

3. Model simplification

Complex model

Insurance policy

Simple model
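
One common way to obtain a simple model is to train it on the predictions of the complex model rather than on the original labels, and then check how often the two agree. The slides do not state how the simple model is built, so this is a sketch of that general idea only. `complex_model` is a made-up stand-in, and the single-threshold rule (a "stump") is chosen purely for brevity.

```python
import random


def complex_model(sample):
    """Hypothetical black box: 1 means fraud, 0 means non-fraud."""
    duration, premium = sample
    return 1 if (duration >= 14 and premium >= 5) or duration > 200 else 0


def fit_stump(samples, labels):
    """Find the one-feature threshold rule that best reproduces the labels."""
    best = None  # (agreement, feature index, threshold)
    for feature in range(len(samples[0])):
        for threshold in sorted({s[feature] for s in samples}):
            preds = [1 if s[feature] >= threshold else 0 for s in samples]
            agreement = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or agreement > best[0]:
                best = (agreement, feature, threshold)
    return best


# Sample invented policies and label them with the complex model's output.
rng = random.Random(0)
samples = [(rng.randint(1, 300), rng.randint(1, 10)) for _ in range(200)]
labels = [complex_model(s) for s in samples]

fidelity, feature, threshold = fit_stump(samples, labels)


def simple_model(sample):
    """The fitted single-threshold rule."""
    return 1 if sample[feature] >= threshold else 0


names = ["illness duration", "premium rate"]
print(f"Simple model: fraud if {names[feature]} >= {threshold}")
print(f"Agreement with complex model: {fidelity:.0%}")
```

The agreement figure indicates how far an explanation drawn from the simple model can be trusted as a description of the complex one.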

Policy 1: Fraud (88%)

Policy 2: Non-fraud (25%)

Demo

Evaluation

Conclusion

Questions?

Thesis presentation

By iamdecode
