Shen Shen
November 8, 2024
Recap: Supervised learning
"To date, the cleverest thinker of all time was Issac. "
feature
label
To date, the
cleverest
To date, the cleverest
thinker
To date, the cleverest thinker
was
To date, the cleverest thinker of all time was
Issac
Recap: Unsupervised/Self-supervised learning
Training Data
loss/objective
hypothesis class
A model
\(f\)
\(m<d\)
Recap: Unsupervised/Self-supervised learning
Food-truck placement
Food-truck placement
Food-truck placement
Food-truck placement
\(\sum_{j=1}^k \mathbf{1}\left\{y^{(i)}=j\right\}\left\|x^{(i)}-\mu^{(j)}\right\|_2^2\)
Food-truck placement
indicator function, 1 if person \(i\) is assigned to truck \(j,\) otherwise 0.
\( \sum_{j=1}^k \mathbf{1}\left\{y^{(i)}=j\right\}\left\|x^{(i)}-\mu^{(j)}\right\|_2^2\)
\(k\)-means objective
clustering membership
clustering centroid location
enumerates over cluster
enumerates over data
can switch the order = \(\sum_{j=1}^k \sum_{i=1}^n \mathbf{1}\left\{y^{(i)}=j\right\}\left\|x^{(i)}-\mu^{(j)}\right\|_2^2\)
what we learn
\(\sum_{i=1}^n\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
3 \(y_{\text {old }} = y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
3 \(y_{\text {old }} = y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
3 \(y_{\text {old }} = y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
3 \(y_{\text {old }} = y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
3 \(y_{\text {old }} = y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
3 \(y_{\text {old }} = y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
each person \(i\) gets assigned to food truck \(j\), color-coded.
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
\(\dots\)
3 \(y_{\text {old }} = y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
3 \(y_{\text {old }} = y\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
\(N_j = \sum_{i=1}^n \mathbf{1}\left\{y^{(i)}=j\right\}\)
food truck \(j\) gets moved to the "central" location of all ppl assigned to it
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
\(\dots\)
3 \(y_{\text {old }} = y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
2 for \(t=1\) to \(\tau\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
2 for \(t=1\) to \(\tau\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
2 for \(t=1\) to \(\tau\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
2 for \(t=1\) to \(\tau\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
3 \(y_{\text {old }} = y\)
10 return \(\mu, y\)
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
3 \(y_{\text {old }} = y\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbf{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
Effect of initialization
Effect of initialization - one remedy:
Run random initializations multiple times,
Compare their \(k\)-means objective values, choose the lowest one
Effect of \(k\)
\(k\)-means
Compare to classification
Compare to classification
Compare to classification
...
[Slide Credit: Yann LeCun]
We'd love to hear your thoughts.
K-MEANS\((k, \tau, \left\{x^{(i)}\right\}_{i=1}^n)\)
1 \(\mu, y=\) random initialization
2 for \(t=1\) to \(\tau\)
3 \(y_{o l d}=y\)
4 for \(i=1\) to \(n\)
\(5 \quad \quad\quad\quad \quad y^{(i)}=\arg \min _j\left\|x^{(i)}-\mu^{(j)}\right\|^2\)
6 for \(j=1\) to \(k\)
\(7 \quad \quad\quad\quad \quad \mu^{(j)}=\frac{1}{N_j} \sum_{i=1}^n \mathbb{1}\left(y^{(i)}=\mathfrak{j}\right) x^{(i)}\)
8 if \(y==y_{\text {old }}\)
9\(\quad \quad \quad\quad \quad\)break
10 return \(\mu, y\)
Food distribution placement
(image credit: Tamara Broderick)