Intro to Machine Learning
Lecture 9: Non-parametric Models
Shen Shen
April 12, 2024
(many slides adapted from Tamara Broderick)
Outline
- Recap (transformers)
- Non-parametric models
  - interpretability
  - ease of use/simplicity
- Decision tree
  - Terminology
  - Learn via the BuildTree algorithm
  - Regression
  - Classification
- Nearest neighbor
Lessons from CNNs
Enduring principles:
- Chop up the signal into patches (divide and conquer)
- Process each patch identically (and in parallel)
Transformers
- Importantly, all the learned projection weights W are shared along the token sequence.
- The same "operation" is repeated for every token (the slide illustrates this with the five tokens 命, 運, 我, 操, 縱).
Interpretability
Non-parametric models
- "Non-parametric" does not mean "no parameters".
- There are still parameters to be learned to build a hypothesis/model.
- Rather, the model/hypothesis does not have a fixed parameterization (e.g., even the number of parameters can change).
- Decision trees and nearest neighbor are the classical examples of non-parametric models.
features:
x1: date
x2: age
x3: height
x4: weight
x5: sinus tachycardia?
x6: min systolic bp, 24h
x7: latest diastolic bp
labels:
1: high risk
-1: low risk
Root node
Internal (decision) node
Leaf (terminal) node
Split dimension
Split value
A node can be specified by Node(split dim, split value, left child, right child).
A leaf can be specified by Leaf(leaf value).
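The Node and Leaf specifications can be sketched as small Python data classes, together with a prediction routine that walks from the root to a leaf. This is only an illustrative sketch: the field names, the `predict` helper, the 0-based feature indexing, and the example tree's numbers are assumptions, not part of the lecture. The left child is taken to hold points with feature value below the split value, matching the order of the children in BuildTree.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    leaf_value: float  # prediction stored at this terminal node


@dataclass
class Node:
    split_dim: int      # index of the feature to test (0-based here)
    split_value: float  # threshold for that feature
    left: "Tree"        # subtree for x[split_dim] < split_value
    right: "Tree"       # subtree for x[split_dim] >= split_value


Tree = Union[Node, Leaf]


def predict(tree: Tree, x: Sequence[float]) -> float:
    """Follow the splits from the root until a leaf is reached."""
    while isinstance(tree, Node):
        if x[tree.split_dim] < tree.split_value:
            tree = tree.left
        else:
            tree = tree.right
    return tree.leaf_value


# Hypothetical tree over (temperature, precipitation); all numbers are made up.
example_tree = Node(0, 25.0, Leaf(5.0), Node(1, 1.0, Leaf(2.0), Leaf(0.0)))
print(predict(example_tree, [20.0, 0.5]))  # 5.0
print(predict(example_tree, [30.0, 0.2]))  # 2.0
```

Each prediction costs one comparison per level of the tree, which is part of why trees are easy to inspect by hand.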
features:
- x1: temperature (deg C)
- x2: precipitation (cm/hr)
labels:
y: km run
Tree defines an axis-aligned “partition” of the feature space:
How to learn a tree?
Recall: familiar "recipe"
- Choose how to predict label (given features & parameters)
- Choose a loss (between guess & actual label)
- Choose parameters by trying to minimize the training loss
Here, we need:
- For each internal node:
  - split dimension
  - split value
  - child nodes
- For each leaf node:
  - label
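- Input $I$: set of indices of training data points
- $k$: hyperparameter, the maximum leaf "size", i.e. the largest number of training data points allowed to end up in a leaf node
- $j$: split dimension
- $s$: split value
- $\hat{y}$: (intermediate) prediction
BuildTree($I$; $k$)
- if $|I| > k$
  - for each split dim $j$ and split value $s$
    - Set $I^{+}_{j,s} = \{ i \in I \mid x^{(i)}_j \ge s \}$
    - Set $I^{-}_{j,s} = \{ i \in I \mid x^{(i)}_j < s \}$
    - Set $\hat{y}^{+}_{j,s} = \text{average}_{i \in I^{+}_{j,s}} \, y^{(i)}$
    - Set $\hat{y}^{-}_{j,s} = \text{average}_{i \in I^{-}_{j,s}} \, y^{(i)}$
    - Set $E_{j,s} = \sum_{i \in I^{+}_{j,s}} (y^{(i)} - \hat{y}^{+}_{j,s})^2 + \sum_{i \in I^{-}_{j,s}} (y^{(i)} - \hat{y}^{-}_{j,s})^2$
  - Set $(j^*, s^*) = \arg\min_{j,s} E_{j,s}$
- else
  - Set $\hat{y} = \text{average}_{i \in I} \, y^{(i)}$
  - return Leaf(leaf_value = $\hat{y}$)
- return Node($j^*$, $s^*$, BuildTree($I^{-}_{j^*,s^*}$; $k$), BuildTree($I^{+}_{j^*,s^*}$; $k$))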
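A minimal Python sketch of the regression version of BuildTree may help make the recursion concrete. It is not the course's reference implementation. Several choices are assumptions: the names `build_tree`, `sse`, and `candidate_splits`; 0-based indices; midpoints between consecutive feature values as the candidate split values; keeping the first minimizer rather than breaking ties at random; and falling back to a leaf when no split is possible. The toy feature values at the end are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Leaf:
    leaf_value: float


@dataclass
class Node:
    split_dim: int
    split_value: float
    left: object   # subtree for x[split_dim] < split_value
    right: object  # subtree for x[split_dim] >= split_value


def sse(ys):
    """Sum of squared errors of ys around their own average."""
    if not ys:
        return 0.0
    avg = mean(ys)
    return sum((v - avg) ** 2 for v in ys)


def candidate_splits(X, idx, j):
    """Assumed choice: midpoints between consecutive distinct values of feature j."""
    vals = sorted({X[i][j] for i in idx})
    return [(a + b) / 2 for a, b in zip(vals, vals[1:])]


def build_tree(X, y, idx, k):
    # Small enough (|I| <= k): stop and predict the average label.
    if len(idx) <= k:
        return Leaf(mean(y[i] for i in idx))
    best = None
    for j in range(len(X[0])):
        for s in candidate_splits(X, idx, j):
            plus = [i for i in idx if X[i][j] >= s]
            minus = [i for i in idx if X[i][j] < s]
            err = sse([y[i] for i in plus]) + sse([y[i] for i in minus])
            # Keep the first minimizer (the slides break ties at random).
            if best is None or err < best[0]:
                best = (err, j, s, minus, plus)
    if best is None:
        # Added safeguard: all points share the same features, so no split exists.
        return Leaf(mean(y[i] for i in idx))
    _, j, s, minus, plus = best
    return Node(j, s, build_tree(X, y, minus, k), build_tree(X, y, plus, k))


# Hypothetical one-feature data set with three points and k = 2.
X = [[1.0], [2.0], [3.0]]
y = [0.0, 5.0, 5.0]
tree = build_tree(X, y, [0, 1, 2], k=2)
print(tree)
```

With these toy values the split at 1.5 has zero error, so the root separates the first point from the other two and both children become leaves.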
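- Choose $k = 2$
- BuildTree($\{1, 2, 3\}$; 2)
- Line 1 (the check $|I| > k$) is true, since $|I| = 3 > 2$
- Consider a fixed $(j, s)$
  - $I^{+}_{j,s} = \{2, 3\}$
  - $I^{-}_{j,s} = \{1\}$
  - $\hat{y}^{+}_{j,s} = 5$
  - $\hat{y}^{-}_{j,s} = 0$
  - $E_{j,s} = 0$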
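The numbers in this worked step can be checked with a few lines of Python. The labels used below, $y^{(1)} = 0$ and $y^{(2)} = y^{(3)} = 5$, are inferred from the stated averages and the zero error (a zero squared error forces every label to equal its side's average). The helper name `split_error` is an assumption made for this sketch.

```python
def split_error(y, idx_plus, idx_minus):
    """Return (y_hat_plus, y_hat_minus, E) for one fixed split of the indices."""
    def side(idx):
        avg = sum(y[i] for i in idx) / len(idx)
        return avg, sum((y[i] - avg) ** 2 for i in idx)

    y_plus, e_plus = side(idx_plus)
    y_minus, e_minus = side(idx_minus)
    return y_plus, y_minus, e_plus + e_minus


# Labels keyed by the 1-based indices used on the slide (inferred values).
y = {1: 0.0, 2: 5.0, 3: 5.0}
print(split_error(y, idx_plus=[2, 3], idx_minus=[1]))  # (5.0, 0.0, 0.0)
```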
- So for line 2, a finite number of $(j, s)$ combinations suffices (those splits in between data points).
- Line 8 picks the "best" among these finitely many combinations (with random tie-breaking).
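To see why finitely many split values suffice, note that any two thresholds lying between the same pair of consecutive feature values induce the same partition of the data. The sketch below illustrates this for a single feature. Using midpoints as the representative split values is an assumption (the slides only say "in between"), and the feature values are hypothetical.

```python
def candidate_splits(values):
    """One representative threshold between each pair of consecutive distinct values."""
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


def partition(values, s):
    """Index sets (minus, plus) induced by threshold s on one feature."""
    minus = frozenset(i for i, x in enumerate(values) if x < s)
    plus = frozenset(i for i, x in enumerate(values) if x >= s)
    return minus, plus


x1 = [1.0, 2.0, 2.0, 4.0]  # hypothetical values of one feature
print(candidate_splits(x1))  # [1.5, 3.0]
# Two thresholds in the same gap give the same partition:
print(partition(x1, 1.2) == partition(x1, 1.8))  # True
```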
Suppose line 8 sets $(j^*, s^*) = (1, 1.7)$. Line 12 then recurses on the two index sets $I^{-}_{j^*,s^*}$ and $I^{+}_{j^*,s^*}$.
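BuildTree($I$; $k$)
- if $|I| > k$
  - for each split dim $j$ and split value $s$
    - Set $I^{+}_{j,s} = \{ i \in I \mid x^{(i)}_j \ge s \}$
    - Set $I^{-}_{j,s} = \{ i \in I \mid x^{(i)}_j < s \}$
    - Set $\hat{y}^{+}_{j,s} = \text{majority}_{i \in I^{+}_{j,s}} \, y^{(i)}$
    - Set $\hat{y}^{-}_{j,s} = \text{majority}_{i \in I^{-}_{j,s}} \, y^{(i)}$
    - Set $E_{j,s} = \frac{|I^{-}_{j,s}|}{|I|} \cdot H(I^{-}_{j,s}) + \frac{|I^{+}_{j,s}|}{|I|} \cdot H(I^{+}_{j,s})$
  - Set $(j^*, s^*) = \arg\min_{j,s} E_{j,s}$
- else
  - Set $\hat{y} = \text{majority}_{i \in I} \, y^{(i)}$
  - return Leaf(leaf_value = $\hat{y}$)
- return Node($j^*$, $s^*$, BuildTree($I^{-}_{j^*,s^*}$; $k$), BuildTree($I^{+}_{j^*,s^*}$; $k$))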
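The only changes from regression to classification:
- Lines 5, 6, and 10: the average becomes a majority vote.
- Line 7: the error is more involved. It is the weighted average entropy
  $E_{j,s} = \frac{|I^{-}_{j,s}|}{|I|} \cdot H(I^{-}_{j,s}) + \frac{|I^{+}_{j,s}|}{|I|} \cdot H(I^{+}_{j,s})$
  where $H = -\sum_{\text{class } c} \hat{P}_c \log_2 \hat{P}_c$ and $\hat{P}_c$ is the fraction of points in the set that belong to class $c$.
Example:
- $|I| = 9$, $|I^{-}_{j,s}| = 6$, $|I^{+}_{j,s}| = 3$
- So $E_{j,s} = \frac{6}{9} H(I^{-}_{j,s}) + \frac{3}{9} H(I^{+}_{j,s})$
- $H(I^{-}_{j,s}) = -\left[ \frac{3}{6} \log_2 \frac{3}{6} + \frac{2}{6} \log_2 \frac{2}{6} + \frac{1}{6} \log_2 \frac{1}{6} \right]$
- $H(I^{+}_{j,s}) = -\left[ \frac{1}{3} \log_2 \frac{1}{3} + \frac{0}{3} \log_2 \frac{0}{3} + \frac{2}{3} \log_2 \frac{2}{3} \right]$, using the convention $0 \log_2 0 = 0$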
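The entropy example can be evaluated numerically with a short Python sketch. The class counts (3, 2, 1) on the minus side and (1, 0, 2) on the plus side are read off the fractions in the example. The function names `entropy` and `weighted_entropy` are assumptions chosen for illustration.

```python
import math


def entropy(counts):
    """H = -sum_c P_c * log2(P_c), skipping empty classes (0 * log2(0) is taken as 0)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def weighted_entropy(counts_minus, counts_plus):
    """E = |I-|/|I| * H(I-) + |I+|/|I| * H(I+), from per-class counts on each side."""
    n_minus, n_plus = sum(counts_minus), sum(counts_plus)
    n = n_minus + n_plus
    return (n_minus / n) * entropy(counts_minus) + (n_plus / n) * entropy(counts_plus)


h_minus = entropy([3, 2, 1])
h_plus = entropy([1, 0, 2])
e = weighted_entropy([3, 2, 1], [1, 0, 2])
print(round(h_minus, 3), round(h_plus, 3), round(e, 3))  # 1.459 0.918 1.279
```

A split with a lower weighted entropy leaves purer children, so line 8 would prefer it.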
Bagging
- One of multiple ways to make and use an ensemble
- Bagging = Bootstrap aggregating
- Training data $\mathcal{D}_n$
- For $b = 1, \dots, B$
  - Draw a new "data set" $\tilde{\mathcal{D}}_n^{(b)}$ of size $n$ by sampling with replacement from $\mathcal{D}_n$
  - Train a predictor $\hat{f}^{(b)}$ on $\tilde{\mathcal{D}}_n^{(b)}$
- Return
  - For regression: $\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{(b)}(x)$
  - For classification: the prediction at a point is the class with the highest vote count at that point
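The bagging procedure for regression can be sketched in a few lines of Python using only the standard library. Everything named here is an assumption for illustration: `bootstrap_sample`, `bag_regressors`, `bagged_predict`, and the deliberately trivial base learner `train_mean`, which stands in for whatever predictor (for example a regression tree) would actually be trained on each resampled data set. The toy data are hypothetical.

```python
import random
from statistics import mean


def bootstrap_sample(data, rng):
    """Draw len(data) points from data, sampling with replacement."""
    n = len(data)
    return [data[rng.randrange(n)] for _ in range(n)]


def bag_regressors(data, train, B, seed=0):
    """Train B predictors, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train(bootstrap_sample(data, rng)) for _ in range(B)]


def bagged_predict(predictors, x):
    """Regression: average the B individual predictions at x."""
    return mean(f(x) for f in predictors)


def train_mean(sample):
    """Stand-in base learner: always predicts the sample's average label."""
    avg = mean(label for _, label in sample)
    return lambda x: avg


# Hypothetical (x, y) pairs.
data = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 6.0)]
predictors = bag_regressors(data, train_mean, B=25)
print(bagged_predict(predictors, 1.5))
```

For classification, the averaging step would be replaced by a vote count over the predictors' outputs.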
Nearest neighbor classifier
Training: None (or rather: memorize the entire training data)
Predicting/testing:
- For a new data point $x_{\text{new}}$:
  - Find the $k$ points in the training data nearest to $x_{\text{new}}$.
  - For classification: predict the label $\hat{y}_{\text{new}}$ for $x_{\text{new}}$ by taking a majority vote of the $k$ neighbors' labels $y$.
  - For regression: predict the label $\hat{y}_{\text{new}}$ for $x_{\text{new}}$ by taking an average over the $k$ neighbors' labels $y$.
- Hyperparameter: $k$
- Also need:
  - A distance metric (typically Euclidean or Manhattan distance)
  - A tie-breaking scheme (typically at random)
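The prediction step can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: Euclidean distance via `math.dist`, function names chosen here (`k_nearest`, `knn_classify`, `knn_regress`), and deterministic tie-breaking (earlier training points win) in place of the random tie-breaking mentioned above. The training points are hypothetical.

```python
import math
from collections import Counter


def k_nearest(train_x, x_new, k):
    """Indices of the k training points closest to x_new (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x_new))
    return order[:k]  # sorted() is stable, so distance ties go to earlier points


def knn_classify(train_x, train_y, x_new, k):
    """Majority vote among the k nearest neighbors' labels."""
    votes = Counter(train_y[i] for i in k_nearest(train_x, x_new, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_x, train_y, x_new, k):
    """Average of the k nearest neighbors' labels."""
    idx = k_nearest(train_x, x_new, k)
    return sum(train_y[i] for i in idx) / len(idx)


# Hypothetical training data.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0]
print(knn_classify(train_x, labels, (0.2, 0.1), k=3))  # a
print(knn_regress(train_x, values, (5.5, 5.0), k=2))   # 10.5
```

Note that all the work happens at prediction time: each query scans the whole training set, which is the cost of having no training phase.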
Thanks!
We'd love it for you to share some lecture feedback.