Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking
Outline
- Background
- Problem Formulation
- The solution
- Case Study at Yahoo
Background



Machine learning is usually considered a black box




I'd really like to know which treatment my patient should undergo to reduce the risk of a heart attack

According to our records, we expect to lower your risk if you exercise more







How do we provide such insights?


We focus on tree-based models
- Widely used
- Easy to open the black box
Problem Formulation
- First, let $\mathcal{X} \subseteq \mathbb{R}^n$ be the $n$-dimensional feature vector space.
- A feature vector is represented as $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T \in \mathcal{X}$.
- WLOG, the problem is simplified to binary classification, i.e. $\mathcal{Y} = \{-1, +1\}$.
- There is a true function $f: \mathcal{X} \mapsto \mathcal{Y}$ (the classifier), and $\hat{f} \approx f$ is the classifier learned from data.
- Here we assume $\hat{f}$ to be an ensemble of $K$ tree-based classifiers, i.e. $\hat{f} = \phi(\hat{h}_1, \ldots, \hat{h}_K)$.
- WLOG, we assume $\phi$ to be the majority vote (see the sketch below).
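A minimal sketch of this setup, assuming scikit-learn with a small random forest standing in for the ensemble; the toy dataset, the value of $K$, and the helper name `f_hat` are illustrative, not from the paper:

```python
# Illustrative setup: K decision trees combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
y = np.where(y == 0, -1, +1)               # binary labels in {-1, +1}

K = 7                                      # number of base trees h_1 ... h_K
forest = RandomForestClassifier(n_estimators=K, max_depth=4,
                                random_state=0).fit(X, y)

def f_hat(x):
    """phi = majority vote over the K base trees."""
    # Sub-estimators predict class *indices*; map them back to {-1, +1}.
    votes = [forest.classes_[int(t.predict(x.reshape(1, -1))[0])]
             for t in forest.estimators_]
    return +1 if sum(votes) > 0 else -1
```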






Goal
$\mathbf{x}' = \arg\min_{\mathbf{x}^*} \{\, 1 \mid \hat{f}(\mathbf{x}) = -1 \wedge \hat{f}(\mathbf{x}^*) = +1 \,\}$






This objective is ill-posed, though: with a constant cost of 1, every positively classified $\mathbf{x}^*$ is a minimizer, however far it is from $\mathbf{x}$.
- Given a distance function $\delta: \mathcal{X} \times \mathcal{X} \mapsto \mathbb{R}$,
- we want to find an $\mathbf{x}'$ such that
Goal
$\mathbf{x}' = \arg\min_{\mathbf{x}^*} \{\, \delta(\mathbf{x}, \mathbf{x}^*) \mid \hat{f}(\mathbf{x}) = -1 \wedge \hat{f}(\mathbf{x}^*) = +1 \,\}$
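The case study later instantiates $\delta$ as the cosine distance; a minimal sketch of that choice, assuming SciPy (the helper name `delta` is ours):

```python
from scipy.spatial.distance import cosine  # cosine distance = 1 - cosine similarity

def delta(x, x_star):
    """Distance delta: X x X -> R scoring how far a candidate tweak is from x."""
    return cosine(x, x_star)
```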

The solution
Observation
Each instance follows a single root-to-leaf path in each tree

(Figure: the root-to-leaf path of a true negative instance)







Key Idea


Perturbation
Perturbation: the diff between the original and the perturbed instance, i.e. a potential suggestion



Step 1
Select one tree that outputs the negative label
Step 2



Find all the paths in that tree that output a positive label (see the sketch below)
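A sketch of this step under the scikit-learn setup above; `positive_paths` is an illustrative helper that walks the fitted tree's internal arrays (`tree_.children_left`, `tree_.feature`, `tree_.threshold`), not an API from the paper. `positive_index` is the index of class $+1$ in `forest.classes_`:

```python
def positive_paths(tree, positive_index):
    """Yield root-to-leaf paths ending in a positive leaf, each path a list
    of (feature, threshold, direction) with direction '<=' or '>'."""
    t = tree.tree_
    def walk(node, path):
        if t.children_left[node] == -1:    # leaf node
            if t.value[node][0].argmax() == positive_index:
                yield list(path)
            return
        feat, thr = t.feature[node], t.threshold[node]
        yield from walk(t.children_left[node],  path + [(feat, thr, "<=")])
        yield from walk(t.children_right[node], path + [(feat, thr, ">")])
    yield from walk(0, [])
```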

Step 3
Generate an instance satisfying each path


(Figure: original instance + perturbation = positive instance)

Formally
- Let $p^+_{k,j}$ be the $j$-th positive path in tree $T_k$.
- For all $T_k \in \mathcal{T}^-$ (the trees voting $-1$), we calculate the perturbed feature vector $\mathbf{x}^+_{j(\epsilon)}$ by:
- $\mathbf{x}^+_{j(\epsilon)}[i] = \theta_i - \epsilon$ if the $i$-th condition is $(x_i \le \theta_i)$, or
- $\mathbf{x}^+_{j(\epsilon)}[i] = \theta_i + \epsilon$ if the $i$-th condition is $(x_i > \theta_i)$.
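A sketch of this rule; `build_positive_inst` mirrors the slides' buildPositiveInst under the path representation used above. A feature hit by several conditions on one path would need interval handling, which is omitted here for brevity:

```python
import numpy as np

def build_positive_inst(x, path, epsilon):
    """Return x_j(eps)+ : a copy of x nudged just past each threshold on `path`."""
    x_new = np.array(x, dtype=float)
    for feat, thr, direction in path:
        if direction == "<=":              # condition (x_i <= theta_i)
            x_new[feat] = thr - epsilon
        else:                              # condition (x_i > theta_i)
            x_new[feat] = thr + epsilon
    return x_new
```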


Example: for a blood-pressure condition with threshold 140/90, the suggested value is $140/90 + \epsilon$.
Problem
Do we need a different $\epsilon$ for every feature?
Solution
No: a single global $\epsilon$ is enough if all features are standardized to z-scores, since every split threshold then lives on the same scale, i.e. $\theta_i = \frac{t_i - \mu_i}{\sigma_i}$ (see the sketch below).
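Continuing the earlier sketch, the standardization step could look as follows, assuming scikit-learn's StandardScaler (reusing the illustrative `X`, `y`, and `K` from above):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_std = scaler.fit_transform(X)    # each column now has mean 0 and std 1
# Train the forest on z-scores, so one global epsilon (0.05 in the case
# study below) is meaningful across all split thresholds.
forest_std = RandomForestClassifier(n_estimators=K, max_depth=4,
                                    random_state=0).fit(X_std, y)
```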
Problem
Perturbing the input for one tree may invalidate the votes of other trees.
Solution
Validate each candidate with a round of checks on the whole forest.
Recap
- For all trees, get the positive paths $p^+_k$.
- Get the perturbed instance $\mathbf{x}^+_{j(\epsilon)}$ via buildPositiveInst($\mathbf{x}$, $p^+_{k,j}$, $\epsilon$).
- Check whether $\hat{f}(\mathbf{x}^+_{j(\epsilon)}) = +1$; if so, add the instance to a set $S$.
- Among the instances in $S$, select the one with the smallest $\delta(\mathbf{x}, \mathbf{x}^+_{j(\epsilon)})$ as $\mathbf{x}'$ (see the sketch below).
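Wiring the illustrative helpers above together (`positive_paths`, `build_positive_inst`, `delta`), a sketch of the whole procedure could look like this:

```python
def classify(forest, x):
    """Majority vote of `forest` on one instance x, with labels in {-1, +1}."""
    votes = [forest.classes_[int(t.predict(x.reshape(1, -1))[0])]
             for t in forest.estimators_]
    return +1 if sum(votes) > 0 else -1

def feature_tweaking(x, forest, epsilon, positive_index):
    """Return x': the cheapest perturbation of x that the whole forest
    classifies as +1, or None if no candidate survives validation."""
    best, best_cost = None, float("inf")
    for tree in forest.estimators_:
        # The paper restricts this loop to trees currently voting -1;
        # iterating over all trees is a harmless superset, since the
        # whole-forest check below filters candidates anyway.
        for path in positive_paths(tree, positive_index):
            candidate = build_positive_inst(x, path, epsilon)
            if classify(forest, candidate) == +1:      # whole-forest check
                cost = delta(x, candidate)
                if cost < best_cost:
                    best, best_cost = candidate, cost
    return best

# Usage on one instance, with the standardized forest from above:
# x_prime = feature_tweaking(X_std[0], forest_std, epsilon=0.05,
#                            positive_index=list(forest_std.classes_).index(1))
```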
Case Study at Yahoo
Ad quality
Motivation
- Ad quality varies.
- Not serving low-quality ads is not an option.
- Serving them hurts the user experience.
Model settings
The impact of hyperparameters


Assessing the recommendation quality
($\epsilon = 0.05$ and $\delta$ = cosine distance)

1. Feature ranking.
2. Take the top-$K$ recommendations by cost and rank them by feature importance.

3. A user study tests the recommendations' performance; results:
- helpful: 57.3%
- not helpful: 42.3% (25% neutral)
- not actionable: 0.4%

$\mathrm{helpfulness}(i) = \dfrac{\mathrm{helpful}(i)}{\mathrm{helpful}(i) + \neg\mathrm{helpful}(i)}$
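As a sanity check of the formula, plugging in the aggregate study numbers above (illustrative only, since the paper computes helpfulness per feature $i$):

$$\mathrm{helpfulness} = \frac{\mathrm{helpful}}{\mathrm{helpful} + \neg\mathrm{helpful}} = \frac{57.3}{57.3 + 42.3} \approx 0.575$$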
Showcase of helpfulness for top features
Conclusion
- The technique itself is not very hard.
- The real-world use case is interesting.
Thanks
Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking
By Weiyüen Wu