Intro to Machine Learning
Lecture 7: Convolutional Neural Networks
Shen Shen
March 22, 2024
(videos edited from 3b1b; some slides adapted from Phillip Isola and Kaiming He)
Outline
- Recap (fully-connected net)
- Motivation and big picture ideas of CNN
- Convolution operation
  - 1d and 2d convolution mechanics
  - interpretation:
    - local connectivity
    - weight sharing
  - 3d tensors
- Max pooling
  - Larger window
- Typical architecture and summary
A (feed-forward) neural network is
[Figure: a feed-forward network, labeled: input, layer, linear combo, activations, learnable weights]
Recap: Backpropagation
Convolutional neural networks
- Why do we need a special network for images?
- Why is CNN (the) special network for images?
Q: Why do we need a specialized network?
👈 426-by-426 grayscale image
Use the same small 2-layer network?
need to learn ~3M parameters
Imagine even higher-resolution images, or more complex tasks...
A: fully-connected nets don't scale well to (interesting) images
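As a rough check on the "~3M parameters" figure, here is a minimal sketch that counts the weights and biases of a small fully-connected net on the 426-by-426 image. The slides do not state the layer sizes; the hidden layer of 16 units and the 10 outputs below are assumptions chosen for illustration, and `fc_param_count` is a hypothetical helper name.

```python
# Parameter count for a small fully-connected net on a 426-by-426 grayscale image.
# ASSUMPTION: one hidden layer of 16 units and 10 outputs (not stated in the slides).

def fc_param_count(layer_sizes):
    """Count weights and biases of a fully-connected net with the given layer sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per output unit
    return total

n_pixels = 426 * 426  # one input per pixel
n_params = fc_param_count([n_pixels, 16, 10])
print(n_pixels, n_params)  # 181476 2903802
```

Under these assumptions almost all of the parameters sit in the first layer, and the count grows linearly with the number of pixels.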
Why do we think [image] is 9?
Why do we think any of [images] is a 9?
- Visual hierarchy (layering would help take care of that)
- Spatial locality
- Translational invariance
CNN cleverly exploits
- Visual hierarchy
- Spatial locality
- Translational invariance
to handle images efficiently, via
- layering (with nonlinear activations)
- convolution
- pooling
Convolutional layer might sound foreign, but...
input image: 0 1 0 1 1
filter: -1 1
output image: 1 -1 1 0
input image: 0 1 -1 1 1
filter: -1 1
output image: 1 -2 2 0
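The sliding-filter computation in the example above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the course's reference code: `conv1d` is a hypothetical helper, it uses no padding, and it follows the deep-learning convention of not flipping the filter.

```python
def conv1d(signal, kernel, stride=1):
    """Slide `kernel` over `signal` (no padding, no flip); take a dot product at each stop."""
    k = len(kernel)
    out = []
    for start in range(0, len(signal) - k + 1, stride):
        window = signal[start:start + k]  # the few inputs currently under the filter
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out

print(conv1d([0, 1, 0, 1, 1], [-1, 1]))  # [1, -1, 1, 0]
```

Each output entry is computed from only two neighboring input entries, and the same two filter weights are reused at every position.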
- 'look' locally
- parameter sharing
- "template" matching
- 'look' locally: each output entry depends only on the few input entries under the filter, unlike a fully-connected layer, where every output depends on every input
- parameter sharing
convolve 0 1 0 1 1 with the filter -1 1 = 1 -1 1 0
or dot 0 1 0 1 1 with each row of
-1  1  0  0  0
 0 -1  1  0  0
 0  0 -1  1  0
 0  0  0 -1  1
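The "convolve with ... or dot with ..." equivalence on this slide can be checked numerically. The sketch below builds the banded matrix whose rows are shifted copies of the filter and compares the two computations; `conv1d`, `conv_matrix`, and `matvec` are hypothetical helper names introduced only for this illustration.

```python
def conv1d(signal, kernel):
    """Sliding dot product, no padding, no filter flip."""
    k = len(kernel)
    return [sum(w * x for w, x in zip(kernel, signal[i:i + k]))
            for i in range(len(signal) - k + 1)]

def conv_matrix(kernel, n_in):
    """Matrix whose i-th row holds the filter shifted i places, zeros elsewhere."""
    k = len(kernel)
    rows = []
    for i in range(n_in - k + 1):
        row = [0] * n_in
        row[i:i + k] = kernel  # the same weights are reused in every row
        rows.append(row)
    return rows

def matvec(matrix, vec):
    """Dot each row of `matrix` with `vec`."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

x = [0, 1, 0, 1, 1]
w = [-1, 1]
W = conv_matrix(w, len(x))
print(matvec(W, x) == conv1d(x, w))  # True
```

A general 4-by-5 fully-connected weight matrix would have 20 free entries; here only the 2 filter weights are learnable, and they are shared across all rows.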
- parameter sharing
input: 0 1 0 1 1
output: 0 1 0 1 1
convolve with ? = dot-product with ?
convolve with the one-entry filter 1 = dot-product with the identity matrix
A tender intro to tensor:
[image credit: tensorflow]
[Figure: a color image as a stack of red, green, and blue channels; axes: image height, image width, image channels]
[Figure: input tensor, filter, output]
- 3d tensor input, depth d
- 3d tensor filter, depth d
- 2d tensor (matrix) output
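To make the three bullets above concrete, here is a minimal sketch of one depth-d filter applied to a depth-d input. The nested-list layout indexed as [channel][row][col], the helper name `conv_single_filter`, and the toy numbers are all assumptions for illustration; there is no padding and the stride is 1.

```python
def conv_single_filter(x, f):
    """x: input tensor as nested lists indexed [channel][row][col], depth d.
    f: filter indexed the same way, with the same depth d.
    Returns a 2d list (matrix): one number per filter position."""
    d, height, width = len(x), len(x[0]), len(x[0][0])
    fh, fw = len(f[0]), len(f[0][0])
    assert len(f) == d, "filter depth must match input depth"
    out = []
    for r in range(height - fh + 1):
        row = []
        for c in range(width - fw + 1):
            total = 0
            for ch in range(d):          # sum over all channels ...
                for i in range(fh):      # ... and over the spatial window
                    for j in range(fw):
                        total += f[ch][i][j] * x[ch][r + i][c + j]
            row.append(total)
        out.append(row)
    return out

# Toy input of depth 2 and size 3x3, with one filter of depth 2 and size 2x2.
x = [[[1, 0, 2], [0, 1, 0], [2, 0, 1]],
     [[0, 1, 0], [1, 0, 1], [0, 1, 0]]]
f = [[[1, 0], [0, 1]],
     [[0, 1], [1, 0]]]
print(conv_single_filter(x, f))  # [[4, 0], [0, 4]]
```

The filter slides only along height and width; because its depth equals the input depth, the channel dimension is summed away, which is why the output is a matrix.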
[Figure: input tensor, filters, output tensor]
- 3d tensor input, depth d
- k 3d filters:
  - each filter of depth d
  - each filter makes a 2d tensor (matrix) output
- total output 3d tensor, depth k
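The shape bookkeeping in the list above can be sketched as follows: each of the k filters yields one matrix, and stacking them gives an output of depth k. The helper names `conv_one` and `conv_layer`, the [channel][row][col] layout, and the toy values are assumptions for illustration (no padding, stride 1, no bias).

```python
def conv_one(x, f):
    """One depth-d filter over a depth-d input -> 2d output (no padding, stride 1)."""
    d, fh, fw = len(f), len(f[0]), len(f[0][0])
    height, width = len(x[0]), len(x[0][0])
    return [[sum(f[ch][i][j] * x[ch][r + i][c + j]
                 for ch in range(d) for i in range(fh) for j in range(fw))
             for c in range(width - fw + 1)]
            for r in range(height - fh + 1)]

def conv_layer(x, filters):
    """Apply k filters and stack their 2d outputs into a 3d output of depth k."""
    return [conv_one(x, f) for f in filters]

# Input of depth d = 3 (e.g. red, green, blue) and size 4x4, filled with ones.
x = [[[1] * 4 for _ in range(4)] for _ in range(3)]
# k = 2 filters, each of depth 3 and size 2x2.
filters = [
    [[[1, 1], [1, 1]]] * 3,   # sums every entry under the window
    [[[1, 0], [0, -1]]] * 3,  # difference along the diagonal
]
out = conv_layer(x, filters)
shape = (len(out), len(out[0]), len(out[0][0]))
print(shape)  # (2, 3, 3)
```

The output depth depends only on the number of filters k, not on the input depth d.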
[image credit: medium]
convolution: sliding window (w. stride)
max pooling: sliding window (w. stride)
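Max pooling uses the same sliding-window pattern as convolution, but takes the maximum under the window instead of a weighted sum, so it has no learnable weights. A minimal sketch follows; `max_pool2d` is a hypothetical helper, and the 4x4 input is made up for illustration.

```python
def max_pool2d(x, size=2, stride=2):
    """Slide a size-by-size window over the 2d list x; keep the maximum at each stop."""
    height, width = len(x), len(x[0])
    return [[max(x[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, width - size + 1, stride)]
            for r in range(0, height - size + 1, stride)]

x = [[1, 3, 2, 0],
     [4, 2, 1, 1],
     [0, 1, 5, 6],
     [2, 2, 7, 8]]
print(max_pool2d(x))                    # [[4, 2], [2, 8]]
print(max_pool2d(x, size=3, stride=1))  # [[5, 6], [7, 8]]
```

The second call uses a larger window with stride 1, so each output entry summarizes a wider patch of the input.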
Thanks!
We'd love it for you to share some lecture feedback.