Shen Shen
March 22, 2024
(videos edited from 3b1b; some slides adapted from Phillip Isola and Kaiming He)
[Figure: recap of a fully-connected layer — input, learnable weights, linear combination, activations]
Q: Why do we need a specialized network?
👈 a 426-by-426 grayscale image
Use the same small 2-layer network? We'd need to learn ~3M parameters.
Imagine even higher-resolution images, or more complex tasks...
A: fully-connected nets don't scale well to (interesting) images; a specialized layer design takes care of that.
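The parameter blow-up is quick arithmetic. A minimal sketch (the hidden-layer width of 16 is an assumption, chosen only to illustrate how the quoted ~3M arises):

```python
# Back-of-envelope parameter count for a fully-connected first layer.
pixels = 426 * 426        # one input per pixel of the grayscale image
hidden_units = 16         # hypothetical hidden-layer width (assumption)
weights = pixels * hidden_units

print(pixels)             # 181476 inputs
print(weights)            # 2903616 — roughly the ~3M parameters quoted
```

Every hidden unit gets its own weight for every pixel, so the count is multiplicative in image size; doubling the resolution quadruples it.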
CNN cleverly exploits the spatial structure of images (nearby pixels are related, and the same patterns recur across locations) to handle images efficiently, via convolution: small, shared, learnable filters.
A convolutional layer might sound foreign, but...
[Figure: a step-by-step 1D convolution — a small filter (weights 1 and -1) slides across the input image; at each position the overlapping entries are multiplied and summed to produce one entry of the output image]
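The sliding walkthrough above can be sketched in a few lines. A minimal sketch — the filter weights [-1, 1] and the length-5 input are assumed for illustration, in the spirit of the slides' example:

```python
def conv1d(signal, kernel):
    """'Valid' cross-correlation: slide the kernel along the signal and
    take a dot product at each position. (Deep-learning "convolution"
    usually skips the textbook kernel flip.)"""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# Hypothetical input and filter values:
print(conv1d([0, 1, 0, 1, 1], [-1, 1]))  # [1, -1, 1, 0]
```

Note the output is shorter than the input: a length-k filter over a length-n signal yields n - k + 1 positions.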
fully-connected layer
[Figure: the same convolution rewritten as a fully-connected layer — convolving with the small filter equals a dot product with a sparse weight matrix whose rows are shifted copies of the filter's weights (most entries are 0, and the nonzero weights are shared across rows)]
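This equivalence — convolution as a matrix-vector product — can be sketched directly. A minimal sketch; the filter [-1, 1] and length-5 input are assumed for illustration:

```python
def conv_matrix(kernel, n):
    """Build the sparse (Toeplitz-style) matrix whose rows are shifted
    copies of the kernel, so convolution becomes a matrix-vector product."""
    k = len(kernel)
    return [
        [kernel[j - i] if 0 <= j - i < k else 0 for j in range(n)]
        for i in range(n - k + 1)
    ]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

W = conv_matrix([-1, 1], 5)
# W == [[-1, 1, 0, 0, 0],
#       [0, -1, 1, 0, 0],
#       [0, 0, -1, 1, 0],
#       [0, 0, 0, -1, 1]]
print(matvec(W, [0, 1, 0, 1, 1]))  # [1, -1, 1, 0] — same as sliding the filter
```

Seen this way, a convolutional layer is just a fully-connected layer with two constraints: most weights are forced to 0, and the remaining ones are shared.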
[Exercise: for the same input image, fill in the filter to convolve with — or, equivalently, the weight matrix to dot-product with — to reproduce the given output]
A tender intro to tensors:
[image credit: tensorflow]
[Figure: an RGB image as a 3D tensor — image height × image width × image channels (red, green, blue)]
[Figure: one filter applied to the input tensor produces one output map; a stack of filters produces a stack of outputs — the output tensor]
[image credit: medium]
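The shapes in that picture follow a simple rule. A minimal sketch, assuming "valid" padding (no padding) — the concrete sizes (a 32×32 RGB input, five 3×3 filters) are made up for illustration:

```python
def conv_output_shape(h, w, c_in, k, c_out, stride=1):
    """Output shape of a conv layer with 'valid' padding.
    Each k x k x c_in filter produces one 2D map; stacking
    c_out filters gives the output tensor's channel depth."""
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    return (h_out, w_out, c_out)

# Hypothetical sizes: a 32x32 RGB image, five 3x3 filters.
print(conv_output_shape(32, 32, 3, 3, 5))  # (30, 30, 5)
```

Note that each filter spans all input channels, so the input depth c_in disappears from the output shape; only the number of filters sets the output depth.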
[Figure: convolution and max pooling side by side — each applies a sliding window (with a stride) over its input]
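Max pooling uses the same sliding-window pattern as convolution, but keeps the maximum in each window instead of taking a dot product. A minimal 1D sketch with made-up input values:

```python
def max_pool1d(x, window=2, stride=2):
    """Slide a window with the given stride; keep the max of each window."""
    return [
        max(x[i:i + window])
        for i in range(0, len(x) - window + 1, stride)
    ]

# Hypothetical input:
print(max_pool1d([1, 3, 2, 0, 5, 4]))  # [3, 2, 5]
```

Unlike a convolutional layer, pooling has no learnable weights; it only downsamples, which is why it is usually drawn as a separate, parameter-free stage.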
We'd love it for you to share some lecture feedback.