
Lecture 7: Neural Networks II, Auto-encoders
Shen Shen
October 11, 2024
Intro to Machine Learning

(slides adapted from Phillip Isola)
Outline
- Recap, neural networks mechanism
- Neural networks are representation learners
- Auto-encoder:
- Bottleneck
- Reconstruction
- Unsupervised learning
- (Some recent representation learning ideas)
Recap:
[Figure: each unit computes a linear combination of its inputs, followed by a nonlinear activation; the network output feeds into a loss function]
Forward pass: evaluate, given the current parameters,
- the model output \(g^{(i)}\)
- the loss incurred on the current data \(\mathcal{L}(g^{(i)}, y^{(i)})\)
- the training error \(J = \frac{1}{n} \sum_{i=1}^{n}\mathcal{L}(g^{(i)}, y^{(i)})\)
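As a concrete sketch of these three evaluations (a minimal numpy illustration; the layer sizes, ReLU hidden activation, and squared-error loss are assumptions for the example, not the lecture's exact setup):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, y, W1, b1, W2, b2):
    """Forward pass for one data point: model output g and its loss."""
    a1 = relu(W1 @ x + b1)             # linear combination + nonlinear activation
    g = W2 @ a1 + b2                   # model output g^(i)
    loss = 0.5 * np.sum((g - y) ** 2)  # loss L(g^(i), y^(i)) on this point
    return g, loss

# Training error J = (1/n) sum_i L(g^(i), y^(i)): average the per-point losses.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
X, Y = rng.normal(size=(10, 3)), rng.normal(size=(10, 1))
J = np.mean([forward(x, y, W1, b1, W2, b2)[1] for x, y in zip(X, Y)])
```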
Recap:
compositions of ReLUs can be quite expressive

in fact, asymptotically, they can approximate any continuous function!

(image credit: Phillip Isola)



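To make the expressiveness claim concrete, here is a small illustration (the target curve, number of ReLUs, and knot placement are all invented for the example): a weighted sum of a handful of shifted ReLUs already traces out a close piecewise-linear approximation of a smooth curve.

```python
import numpy as np

relu = lambda z: np.maximum(0, z)

x = np.linspace(-3, 3, 200)
target = np.sin(x)

# Each shifted ReLU contributes one "kink"; their weighted sum is piecewise linear.
knots = np.linspace(-3, 3, 12)
features = np.column_stack([relu(x - k) for k in knots] + [np.ones_like(x)])

# Fit the weights by least squares (one weight per shifted ReLU, plus a bias).
w, *_ = np.linalg.lstsq(features, target, rcond=None)
approx = features @ w

print("max error:", np.max(np.abs(approx - target)))  # small with enough knots
```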
Recap:

Backward pass: run SGD to update the parameters, e.g. to update \(W^2\):
- Randomly pick a data point \((x^{(i)}, y^{(i)})\)
- Evaluate the gradient \(\nabla_{W^2} \mathcal{L}(g^{(i)}, y^{(i)})\)
- Update the weights \(W^2 \leftarrow W^2 - \eta \nabla_{W^2} \mathcal{L}(g^{(i)}, y^{(i)})\)
\(\dots\)
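A minimal sketch of one such SGD step, continuing the two-layer numpy example above (squared-error loss; the sizes and variable names are assumptions for illustration):

```python
import numpy as np

relu = lambda z: np.maximum(0, z)

def sgd_step(x, y, W1, b1, W2, b2, eta=0.01):
    """One SGD step on W2 for a two-layer net with squared-error loss."""
    a1 = relu(W1 @ x + b1)         # forward pass (a1 is cached for the backward pass)
    g = W2 @ a1 + b2
    grad_W2 = np.outer(g - y, a1)  # dL/dW2 = (g - y) a1^T for L = 0.5 ||g - y||^2
    return W2 - eta * grad_W2      # gradient step: W2 <- W2 - eta * grad

# Randomly pick a data point, then update:
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(10, 3)), rng.normal(size=(10, 1))
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
i = rng.integers(len(X))
W2 = sgd_step(X[i], Y[i], W1, b1, W2, b2)
```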
Recap:
back propagation: reuse of computation
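To spell out the "reuse" point in symbols (a generic two-layer case, writing \(z^\ell\) for layer \(\ell\)'s pre-activation and \(a^\ell\) for its activation; this notation is assumed for the example): the factor computed for the last layer's gradient reappears inside the earlier layer's gradient, so backprop caches it instead of recomputing it.

\[
\frac{\partial \mathcal{L}}{\partial W^2} = \underbrace{\frac{\partial \mathcal{L}}{\partial g}\,\frac{\partial g}{\partial z^2}}_{\text{computed once}} \cdot \frac{\partial z^2}{\partial W^2},
\qquad
\frac{\partial \mathcal{L}}{\partial W^1} = \underbrace{\frac{\partial \mathcal{L}}{\partial g}\,\frac{\partial g}{\partial z^2}}_{\text{reused}} \cdot \frac{\partial z^2}{\partial a^1}\,\frac{\partial a^1}{\partial z^1}\,\frac{\partial z^1}{\partial W^1}
\]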
Outline
- Recap, neural networks mechanism
- Neural networks are representation learners
- Auto-encoder:
- Bottleneck
- Reconstruction
- Unsupervised learning
- (Some recent representation learning ideas)

Two different ways to visualize a function
[Figure: the same function shown as a wiring graph, as an equation, and as a mapping in 1D and 2D]

Representation transformations for a variety of neural net operations, and stacks of neural net operations









[Figure: training data mapped, layer by layer, from complex data space to a simple embedding space]


Neural networks are representation learners
Deep nets transform datapoints, layer by layer
Each layer gives a different representation (aka embedding) of the data


🧠
humans also learn representations

"I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of colour. Do I have "327"? No. I have sky, house, and trees.”
— Max Wertheimer, 1923

Good representations are:
- Compact (minimal)
- Explanatory (roughly sufficient)
[See “Representation Learning”, Bengio 2013, for more commentary]


[Bartlett, 1932]
[Intraub & Richardson, 1989]


[https://www.behance.net/gallery/35437979/Velocipedia]





Outline
- Recap, neural networks mechanism
- Neural networks are representation learners
- Auto-encoder:
- Bottleneck
- Reconstruction
- Unsupervised learning
- (Some recent representation learning ideas)

Good representations are:
- Compact (minimal)
- Explanatory (roughly sufficient)
- Disentangled (independent factors)
- Interpretable
- Make subsequent problem solving easy
[See “Representation Learning”, Bengio 2013, for more commentary]
Auto-encoders try to achieve these; some may also just emerge on their own

Auto-encoder
“What I cannot create, I do not understand.” — Feynman

[Figure: encoder \(\rightarrow\) bottleneck \(\rightarrow\) decoder]
- input \(x \in \mathbb{R}^d\)
- encoder: maps the input to a compact representation/embedding at the bottleneck
- bottleneck: typically has lower dimension than \(d\)
- decoder: maps the bottleneck representation to an output \(\tilde{x} \in \mathbb{R}^d\)
Auto-encoder
Same ingredients as supervised learning (training data, a hypothesis class, a loss/objective, a model \(f\)), but with no labels:
- Supervised learning: learn \(f: X \rightarrow Y\) from labeled training data
- Unsupervised learning: learn a “good” representation (of dimension \(m < d\)) from the training data alone
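A minimal sketch of this recipe in PyTorch (the layer sizes, optimizer, and variable names are all assumptions for illustration): the encoder compresses \(x \in \mathbb{R}^d\) down to an \(m\)-dimensional bottleneck with \(m < d\), the decoder reconstructs \(\tilde{x}\), and the loss is the reconstruction error, so no labels are needed.

```python
import torch
import torch.nn as nn

d, m = 784, 32  # input dimension d and bottleneck dimension m < d

encoder = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, m))
decoder = nn.Sequential(nn.Linear(m, 128), nn.ReLU(), nn.Linear(128, d))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.randn(64, d)              # a batch of unlabeled data (random stand-in)
x_tilde = decoder(encoder(x))       # reconstruct through the bottleneck
loss = ((x_tilde - x) ** 2).mean()  # reconstruction loss: compare output to input
opt.zero_grad(); loss.backward(); opt.step()
```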

Word2Vec
https://www.tensorflow.org/text/tutorials/word2vec


Word2Vec
[Figure: embedding directions capture relations such as verb tense and gender]
vector(“Paris”) − vector(“France”) + vector(“Italy”) \(\approx\) vector(“Rome”)
“Meaning is use” — Wittgenstein

Can help downstream tasks:
- sentiment analysis
- machine translation
- info retrieval
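For instance, with the gensim library the Paris − France + Italy arithmetic above becomes a nearest-neighbor query (a sketch; it uses pretrained GloVe vectors as a stand-in, and the same analogy works with Word2Vec embeddings):

```python
import gensim.downloader

# Load pretrained word vectors (downloaded on first use).
wv = gensim.downloader.load("glove-wiki-gigaword-100")

# vector("paris") - vector("france") + vector("italy") ~ vector("rome")
print(wv.most_similar(positive=["paris", "italy"], negative=["france"], topn=3))
```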

Often, what we will be “tested” on is not what we were trained on.

Final-layer adaptation: freeze \(f\), train a new final layer on new target data

Finetuning: initialize \(f'\) as \(f\), then continue training \(f'\) as well, on new target data
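A sketch of both strategies in PyTorch (the "pretrained" network and the new-task sizes below are stand-ins invented for illustration):

```python
import torch.nn as nn

# Stand-ins: a "pretrained" representation f and the new task's sizes.
feature_dim, num_new_classes = 32, 5
pretrained_f = nn.Sequential(nn.Linear(784, feature_dim), nn.ReLU())

# Final-layer adaptation: freeze f, train only a new final layer on target data.
for p in pretrained_f.parameters():
    p.requires_grad = False
model = nn.Sequential(pretrained_f, nn.Linear(feature_dim, num_new_classes))

# Finetuning: f' starts from f's weights (already true above); unfreeze and
# continue training all parameters on the new target data.
for p in model.parameters():
    p.requires_grad = True
```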

Outline
- Recap, neural networks mechanism
- Neural networks are representation learners
- Auto-encoder:
- Bottleneck
- Reconstruction
- Unsupervised learning
- (Some recent representation learning ideas)


Feature reconstruction (unsupervised learning): features \(\rightarrow\) reconstructed features

Label prediction (supervised learning): features \(\rightarrow\) label

Masked prediction: partial features \(\rightarrow\) other partial features

Masked Auto-encoder
[He, Chen, Xie, et al. 2021]
[Devlin, Chang, Lee, et al. 2019]




predict color from gray-scale
[Zhang, Isola, Efros, ECCV 2016]

Self-supervised learning
Common trick:
- Convert an “unsupervised” problem into a “supervised” setup
- Do so by cooking up “labels” (prediction targets) from the raw data itself; this is called a pretext task
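A toy version of this trick (the shapes, network, and 75% masking ratio are invented for the example): hide part of each input and train the network to predict the hidden part from the visible part, so the "label" is carved out of the raw data itself.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.rand(64, 784)                # raw, unlabeled data (random stand-in)
mask = torch.rand_like(x) < 0.75       # hide 75% of each input
pred = net(x * (~mask))                # predict from the visible part...
loss = ((pred - x)[mask] ** 2).mean()  # ...scored only on the hidden part
opt.zero_grad(); loss.backward(); opt.step()
```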






The allegory of the cave









[Slide credit: Andrew Owens]
[Owens et al., Ambient Sound Provides Supervision for Visual Learning, ECCV 2016]

What did the model learn?

[Slide Credit: Yann LeCun]












Contrastive learning
[Chen, Kornblith, Norouzi, Hinton, ICML 2020]
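The core idea, sketched in PyTorch (a simplified variant of the paper's NT-Xent objective; the temperature, batch size, and embedding size are illustrative): embeddings of two augmented views of the same example are pulled together, while all other pairings in the batch are pushed apart.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.5):
    """z1[i] and z2[i] are embeddings of two views of the same example."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature        # cosine similarities of all pairs
    labels = torch.arange(len(z1))          # positives sit on the diagonal
    return F.cross_entropy(logits, labels)  # pull positives up, push others down

z1, z2 = torch.randn(128, 64), torch.randn(128, 64)  # stand-in embeddings
print(contrastive_loss(z1, z2))
```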
DALL·E 2
[https://arxiv.org/pdf/2204.06125.pdf]

Summary
- We looked at the mechanics of neural nets last time. Today we saw that deep nets learn representations, much as our brains do.
- This is useful because representations transfer — they act as prior knowledge that enables quick learning on new tasks.
- Representations can also be learned without labels, e.g. via unsupervised or self-supervised learning. This is great since labels are expensive and limiting.
- Without labels there are many ways to learn representations. We saw today:
- representations as compressed codes, auto-encoder with bottleneck
- (representations that are shared across sensory modalities)
- (representations that are predictive of their context)
Thanks!
We'd love to hear your thoughts.
