Machine learning models are usually treated as black boxes
I'd really like to know which treatment my patient should undergo to reduce the risk of a heart attack
According to our records, we expect to lower your risk if you exercise more
\[x' = \arg \min_{x^*} \{\,1 \mid \hat{f}(x) = -1 \wedge \hat{f}(x^*) = +1 \,\}\]
Unbounded: with a constant cost of 1, any positively labeled instance minimizes the objective, so the nearest one is not pinned down. Adding a distance \(\delta\) fixes this:
\[x' = \arg \min_{x^*} \{\,\delta(x, x^*) \mid \hat{f}(x) = -1 \wedge \hat{f}(x^*) = +1 \,\}\]
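The objective can be read as a brute-force search: among all positively labeled candidates, keep the one closest to \(x\). A minimal sketch, where the classifier, the candidate pool, and the Euclidean \(\delta\) are all illustrative stand-ins:

```python
# Hypothetical sketch: brute-force search for the nearest positive-labeled
# counterfactual x' among a candidate pool, under a distance delta.
# f_hat, delta, and the pool below are made-up stand-ins.

def nearest_counterfactual(x, candidates, f_hat, delta):
    """Return argmin over candidates of delta(x, c) s.t. f_hat(c) == +1."""
    positives = [c for c in candidates if f_hat(c) == +1]
    if not positives:
        return None
    return min(positives, key=lambda c: delta(x, c))

# Toy classifier: positive when the coordinate sum exceeds 1.
f_hat = lambda v: +1 if sum(v) > 1 else -1
delta = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

x = (0.0, 0.0)                      # f_hat(x) == -1
pool = [(2.0, 2.0), (0.6, 0.6), (0.0, 0.9)]
print(nearest_counterfactual(x, pool, f_hat, delta))  # → (0.6, 0.6)
```

The real method avoids enumerating all candidates by generating them from the tree paths, as described next.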
Each instance follows a root-to-leaf path in each tree
True negative instance
Perturbation: the diff between the original and the tweaked instance, i.e. a potential suggestion
Select one tree that outputs the negative label
Find all the paths in that tree that output the positive label
Generate an instance satisfying each path
e.g. for a split at the 140/90 blood-pressure threshold, set the feature to \(140/90 + \epsilon\), just past the split
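The tweaking step above can be sketched on a single tree: enumerate the root-to-leaf paths that end in a positive leaf, then copy the instance and nudge each constrained feature just across its threshold by \(\epsilon\). The nested-tuple tree encoding and the numbers are illustrative stand-ins, not the paper's actual data structures:

```python
# Hypothetical sketch of one feature-tweaking step. A node is either
# ('leaf', label) or (feature_index, threshold, left_child, right_child),
# where the left branch is taken when x[feature] <= threshold.

EPS = 0.1

TREE = (0, 140.0,
        ('leaf', +1),
        (1, 90.0, ('leaf', +1), ('leaf', -1)))

def positive_paths(node, path=()):
    """Yield lists of (feature, threshold, branch) conditions ending at +1 leaves."""
    if node[0] == 'leaf':
        if node[1] == +1:
            yield list(path)
        return
    feat, thr, left, right = node
    yield from positive_paths(left, path + ((feat, thr, '<='),))
    yield from positive_paths(right, path + ((feat, thr, '>'),))

def tweak(x, path, eps=EPS):
    """Copy x and nudge each violated condition just across its threshold."""
    x = list(x)
    for feat, thr, branch in path:
        if branch == '<=' and not x[feat] <= thr:
            x[feat] = thr - eps
        elif branch == '>' and not x[feat] > thr:
            x[feat] = thr + eps
    return x

x = [150.0, 95.0]                            # negatively labeled instance
candidates = [tweak(x, p) for p in positive_paths(TREE)]
print(candidates)  # one candidate per positive path, e.g. [[139.9, 95.0], [150.0, 89.9]]
```

Each positive path yields one candidate instance; the cheapest valid one becomes the suggestion.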
Do we need a different \(\epsilon\) for every case?
No: a single global \(\epsilon\) works.
If all the features are standardized to z-scores, then a single \(\epsilon\) is enough,
i.e. \(\theta_i = \frac{t_i - \mu_i}{\sigma_i}\)
Perturbing to satisfy one tree may invalidate other trees
So run a final check of the perturbed instance against the whole forest
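The final check can be sketched as a majority vote over the whole ensemble: a candidate that flips only the one tweaked tree is rejected unless it also flips the forest's overall prediction. The hard-coded single-feature "trees" are illustrative stand-ins:

```python
# Hypothetical sketch of the whole-forest check. Each toy "tree" is a
# function returning +1 / -1; real trees would be the ensemble's members.

def forest_predict(trees, x):
    """Majority vote of per-tree labels (+1 / -1)."""
    votes = sum(t(x) for t in trees)
    return +1 if votes > 0 else -1

trees = [
    lambda x: +1 if x[0] > 1.0 else -1,
    lambda x: +1 if x[0] > 2.0 else -1,
    lambda x: +1 if x[0] > 5.0 else -1,
]

# Candidate flips tree 0 only: the forest still says -1, so reject it.
print(forest_predict(trees, [1.5]))   # → -1
# Candidate flips trees 0 and 1: the forest now says +1, so keep it.
print(forest_predict(trees, [2.5]))   # → +1
```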
Use case: ad quality
Ad quality varies; for low-quality ads:
Not serving them is not an option
Serving them hurts the user experience
The approach:
Settings: \(\epsilon = 0.05\) and \(\delta\) = cosine distance
1. Feature ranking
2. Top-K recommendations based on cost, ranked by feature importance
3. User study to test the performance; results show:
\( \text{helpfulness}(i) = \frac{\text{helpful}(i)}{\text{helpful}(i) + \neg\text{helpful}(i)} \)
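The metric is just the fraction of study participants who judged recommendation \(i\) helpful. A worked example with made-up counts:

```python
# Worked example of the helpfulness metric above; the counts are invented.

def helpfulness(helpful, not_helpful):
    return helpful / (helpful + not_helpful)

print(helpfulness(18, 6))   # → 0.75
```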
Showcase of helpfulness for top features