Shen Shen
April 12, 2024
(many slides adapted from Tamara Broderick)
Enduring principles:
Lessons from CNNs
CNN
命運我操縱 ("I control my own fate")
Transformers
Interpretability
Decision trees and nearest-neighbor methods are the classical examples of non-parametric models
features:
\(x_1\): date
\(x_2\): age
\(x_3\): height
\(x_4\): weight
\(x_5\): sinus tachycardia?
\(x_6\): min systolic bp, 24h
\(x_7\): latest diastolic bp
labels:
1: high risk
-1: low risk
Root node
Internal (decision) node
Leaf (terminal) node
Split dimension
Split value
A node can be specified by
Node(split dim, split value, left child, right child)
A leaf can be specified by
Leaf(leaf value)
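As a concrete rendering of these two constructors, here is a minimal Python sketch; the class names mirror the slides, while the predict helper is an added illustration rather than course reference code:

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    value: float                       # Leaf(leaf value)

@dataclass
class Node:
    split_dim: int                     # split dimension j
    split_value: float                 # split value s
    left: Union["Node", Leaf]          # child for x[split_dim] < split_value
    right: Union["Node", Leaf]         # child for x[split_dim] >= split_value

def predict(tree, x):
    """Walk from the root down to a leaf and return that leaf's value."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.split_dim] < tree.split_value else tree.right
    return tree.value
```

Since every internal node thresholds a single coordinate, all the points routed to a given leaf form an axis-aligned box, which is exactly the partition described below.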
features: (as in the example above)
labels:
\(y\): km run (real-valued, so this is the regression version)
Tree defines an axis-aligned “partition” of the feature space:
How to learn a tree?
Recall the familiar "recipe": choose a hypothesis class, choose a loss, and choose a training algorithm that (approximately) minimizes the loss.
Here, we need: trees as the hypothesis class, a per-split error as the loss, and a greedy training algorithm:
\(\operatorname{BuildTree}(I;k)\)
Suppose line 8 sets \((j^*,s^*) = (1, 1.7)\); then line 12 recurses on the two resulting subsets (see the sketch below).
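Since the numbered pseudocode itself isn't reproduced above, here is a hedged Python sketch of the greedy builder; comments mark the steps the slides call line 8 and line 12, and both the stopping rule (emit a leaf once \(|I| \le k\)) and the variance-based subset error are assumptions, not the slides' literal code. (Feature indices here are 0-based, whereas the example \((j^*,s^*) = (1, 1.7)\) presumably numbers features from 1.)

```python
import numpy as np
# Reuses the Node and Leaf classes from the sketch above.

def build_tree(X, y, k):
    # Assumed stopping rule: emit a leaf once the subset has at most k
    # points; the leaf value is the average label.
    if len(y) <= k:
        return Leaf(float(np.mean(y)))
    best = None
    for j in range(X.shape[1]):                  # every split dimension j
        for s in np.unique(X[:, j]):             # every candidate split value s
            left = X[:, j] < s                   # I^-_{j,s}; complement is I^+
            if not left.any() or left.all():
                continue                         # degenerate split, skip
            # Weighted per-split error, with variance as the subset error:
            err = (left.sum() * np.var(y[left])
                   + (~left).sum() * np.var(y[~left])) / len(y)
            if best is None or err < best[0]:
                best = (err, j, s, left)
    if best is None:                             # no usable split remains
        return Leaf(float(np.mean(y)))
    _, j_star, s_star, left = best               # the slides' "line 8"
    return Node(j_star, s_star,                  # the slides' "line 12":
                build_tree(X[left], y[left], k),     # recurse on I^-
                build_tree(X[~left], y[~left], k))   # recurse on I^+
```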
\(\operatorname{BuildTree}(I;k)\) then repeats the same pattern at each child: line 8 picks the best split \((j^*,s^*)\) for that node's subset, and line 12 recurses again, until a subset is small enough to become a leaf.
The only change from regression to classification:
\(E_{j, s} = \frac{\left|I_{j, s}^{-}\right|}{|I|} \cdot H\left(I_{j, s}^{-}\right)+\frac{\left|I_{j, s}^{+}\right|}{|I|} \cdot H\left(I_{j, s}^{+}\right)\)
\(H\left(I_{j, s}^{-}\right) = -[\frac{3}{6} \log _2\left(\frac{3}{6}\right)+\frac{2}{6} \log _2\left(\frac{2}{6}\right)+\frac{1}{6} \log _2\left(\frac{1}{6}\right)]\)
\(H\left(I_{j, s}^{+}\right) = -[\frac{1}{3} \log_2\left(\frac{1}{3}\right)+\frac{0}{3} \log_2\left(\frac{0}{3}\right)+\frac{2}{3} \log_2\left(\frac{2}{3}\right)]\) (with the convention \(0 \log_2 0 = 0\))
\(H=-\sum_{\text{class } c} \hat{P}_c \log_2 \hat{P}_c\)
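As a sanity check on the arithmetic, here is the same computation in a few lines of Python; the class counts 3/2/1 and 1/0/2 are read off the example above, and \(0 \log_2 0\) is treated as 0:

```python
import numpy as np

def entropy(counts):
    """H = -sum_c P_c * log2(P_c), with the convention 0 * log2(0) = 0."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                                 # drop empty classes
    return -(p * np.log2(p)).sum()

H_minus = entropy([3, 2, 1])                     # the 6 points in I^-
H_plus = entropy([1, 0, 2])                      # the 3 points in I^+
E = (6 / 9) * H_minus + (3 / 9) * H_plus         # weighted error E_{j,s}
print(H_minus, H_plus, E)                        # ~1.459, ~0.918, ~1.279
```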
Bagging
Bootstrap aggregating: train many trees, each on a bootstrap resample of the training data, then aggregate their predictions (average for regression, majority vote for classification), as in the sketch below.
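A minimal sketch of bagging, using scikit-learn's DecisionTreeClassifier as the base learner; the dataset X, y and all constants here are made-up illustrations:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                    # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy binary labels

trees = []
for _ in range(25):                              # 25 bootstrap rounds
    idx = rng.integers(0, len(X), size=len(X))   # resample with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.stack([t.predict(X) for t in trees])  # each tree votes
y_hat = (votes.mean(axis=0) > 0.5).astype(int)   # majority vote
print("training accuracy:", (y_hat == y).mean())
```

Averaging over trees trained on different resamples reduces the variance of a single, highly data-sensitive tree.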
Nearest neighbors
Training: none (or rather: memorize the entire training data).
Predicting/testing: find the training point(s) closest to the query and return their average (regression) or majority (classification) label, as sketched below.
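A minimal nearest-neighbor predictor, assuming Euclidean distance and majority vote; the function name and the default k are illustrative choices:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=1):
    """Label a query point by majority vote among its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                    # indices of k closest
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # most common label

# Example: classify the origin against three labeled points.
X = np.array([[0.0, 1.0], [2.0, 2.0], [0.5, -0.5]])
y = np.array([1, -1, 1])
print(knn_predict(X, y, np.array([0.0, 0.0]), k=3))    # -> 1
```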
We'd love for you to share some lecture feedback.