dr.federica bianco | fbb.space |    fedhere |    fedhere

Neural Networks: CNNs

foundations of data science for everyone XII

this slide deck:

http://slides.com/federicabianco/fdsfe_12

proper care of your DNN

NNs are a vast topic and we only have 2 weeks!

Some FREE references!

michael nielsen

better pedagogical approach, more basic, more clear

http://neuralnetworksanddeeplearning.com/index.html

ian goodfellow

mathematical approach, more advanced, unfinished

https://www.deeplearningbook.org/

Lots of parameters and lots of hyperparameters! What to choose?

cheatsheet

architecture - wide networks tend to overfit, deep networks are hard to train
number of epochs - the sweet spot is when learning slows down, but before you start overfitting... it may take DAYS! jumps may indicate bad initial choices (like in all gradient descent)
loss function - needs to be appropriate to the task, e.g. classification vs regression
activation functions - need to be consistent with the loss function
optimization scheme - needs to be appropriate to the task and data
learning rate in optimization - balance speed and accuracy
batch size - smaller batch size is faster but leads to overtraining
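
A minimal sketch of the "sweet spot" idea for the number of epochs, assuming the validation loss is recorded after every epoch. The function name `best_epoch`, the `patience` parameter and the loss values are made up for illustration; Keras offers a comparable `EarlyStopping` callback, which is not what is implemented here.

```python
def best_epoch(val_losses, patience=3):
    """Return the index of the epoch with the lowest validation loss,
    giving up once the loss has not improved for `patience` epochs in a row."""
    best_loss = float("inf")
    best_idx = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # still learning: remember this epoch and reset the counter
            best_loss = loss
            best_idx = epoch
            waited = 0
        else:
            # no improvement: possibly the start of overfitting
            waited += 1
            if waited >= patience:
                break
    return best_idx


# hypothetical validation losses: learning slows down, then overfitting sets in
val_losses = [1.00, 0.60, 0.45, 0.40, 0.39, 0.41, 0.44, 0.50]
print(best_epoch(val_losses))  # 4
```

A larger `patience` tolerates the occasional jump in the loss at the cost of training longer.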

An article that compares various DNNs

https://arxiv.org/pdf/1605.07678.pdf

accuracy comparison

batch size

What should I choose for the loss function and how does that relate to the activation function and optimization?

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb

Lots of parameters and lots of hyperparameters! What to choose?

cheatsheet

always check your loss function! it should go down smoothly and flatten out at the end of the training.

not flat? you are still learning!

too flat? you are overfitting...

loss (gallery of horrors)

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb

jumps are not unlikely (and not necessarily a problem) if your activations have a discontinuous derivative (e.g. relu)

regularization (e.g. dropout) is applied during training but turned off when the validation set is evaluated, so the validation loss can be smaller than the training loss

loss and learning rate (note that the appropriate learning rate depends on the chosen optimization scheme!)
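
A small sketch of how the learning rate shapes the loss curve, assuming plain gradient descent on the toy loss w**2 (other optimization schemes behave differently). The function name and the three learning rates are chosen only for illustration.

```python
def gradient_descent(learning_rate, w0=1.0, steps=20):
    """Minimize loss(w) = w**2 with plain gradient descent.
    Returns the loss before the first step and after every step."""
    w = w0
    losses = [w ** 2]
    for _ in range(steps):
        grad = 2.0 * w                 # derivative of w**2
        w = w - learning_rate * grad   # gradient descent update
        losses.append(w ** 2)
    return losses


too_small = gradient_descent(0.01)  # loss goes down, but very slowly
reasonable = gradient_descent(0.4)  # loss drops quickly and flattens out
too_large = gradient_descent(1.1)   # every step overshoots: the loss grows

for name, losses in [("too small", too_small),
                     ("reasonable", reasonable),
                     ("too large", too_large)]:
    print(name, losses[0], losses[-1])
```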

Building a DNN

with keras and tensorflow

autoencoder for image reconstruction

What should I choose for the loss function and how does that relate to the activation function and optimization?

loss	good for	activation last layer	size last layer
mean_squared_error	regression	linear	one node
mean_absolute_error	regression	linear	one node
mean_squared_logarithmic_error	regression	linear	one node
binary_crossentropy	binary classification	sigmoid	one node
categorical_crossentropy	multiclass classification	softmax	N nodes
kl_divergence	multiclass classification, probabilistic interpretation	softmax	N nodes
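
To make the pairing of loss and last-layer activation concrete, here is a plain-Python sketch of a few of these losses (written out by hand for illustration, not the Keras implementations). It assumes the usual convention that a sigmoid feeds a single-node binary cross-entropy, while a softmax is used when the N outputs must sum to 1; the input numbers are made up.

```python
import math


def sigmoid(z):
    """Squash one raw output into (0, 1): a single probability."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Turn N raw outputs into N probabilities that sum to 1."""
    shift = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_crossentropy(y_true, p):
    """y_true is 0 or 1, p is the sigmoid output of the single last node."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_crossentropy(y_onehot, probs):
    """y_onehot is a one-hot label, probs is the softmax output of N nodes."""
    return -sum(t * math.log(p) for t, p in zip(y_onehot, probs))


p = sigmoid(2.0)                  # binary classification, one node
probs = softmax([2.0, 1.0, 0.1])  # 3-class classification, three nodes

print(binary_crossentropy(1, p), binary_crossentropy(0, p))
print(categorical_crossentropy([1, 0, 0], probs))
print(mean_squared_error([1.0, 2.0], [1.5, 2.5]))
```

The loss is small when the predicted probability of the true class is high, and large otherwise.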

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb

On the interpretability of DNNs

https://distill.pub/2020/circuits/zoom-in/

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

CNN

Olague et al 2017

@akumadog

Brain Programming and the Random Search in Object Categorization

The visual cortex learns hierarchically: first detects simple features, then more complex features and ensembles of features

Convolution

convolution is a mathematical operation on two functions, f and g, that produces a third function, f * g, expressing how the shape of one is modified by the other.
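
For reference, the standard textbook definition (not taken from the slides) in its continuous form, followed by the discrete two-dimensional form that applies to images. The symbols I (image) and K (the small feature, or kernel) are names chosen here for illustration.

```latex
(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau

(I * K)[i, j] = \sum_{m} \sum_{n} I[i - m,\, j - n]\, K[m, n]
```

Most deep learning libraries skip the kernel flip and compute the cross-correlation, \sum_{m} \sum_{n} I[i + m, j + n] K[m, n], while still calling it a convolution.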

o

two images.

-1	-1	-1	-1	-1
-1	1	-1	1	-1
-1	-1	1	-1	-1
-1	1	-1	1	-1
-1	-1	-1	-1	-1

1	-1	-1
-1	1	-1
-1	-1	1

-1	-1	1
-1	1	-1
1	-1	-1

feature maps

convolution

-1	-1	-1	-1	-1
-1	1	-1	1	-1
-1	-1	1	-1	-1
-1	1	-1	1	-1
-1	-1	-1	-1	-1

1	-1	-1
-1	1	-1
-1	-1	1

top-left 3x3 patch of the image times the feature, element by element, then summed:

$$
\begin{aligned}
(-1 \times 1) + (-1 \times -1) + (-1 \times -1) &+ \\
(-1 \times -1) + (1 \times 1) + (-1 \times -1) &+ \\
(-1 \times -1) + (-1 \times -1) + (1 \times 1) &= 7
\end{aligned}
$$

slide the feature one pixel to the right:

$$
\begin{aligned}
(-1 \times 1) + (-1 \times -1) + (-1 \times -1) &+ \\
(1 \times -1) + (-1 \times 1) + (1 \times -1) &+ \\
(-1 \times -1) + (1 \times -1) + (-1 \times 1) &= -3
\end{aligned}
$$

repeat for every position:

7	-3	3
-3	5	-3
3	-3	7

input layer

feature map

convolution layer

the feature map is "richer": we went from binary to R
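
The sliding-window sums can be checked with a few lines of plain Python. This sketch assumes the image is the 5x5 "X" (+1 on the two diagonals of the inner 3x3, -1 elsewhere), which is consistent with the worked sums, and that the feature moves one pixel at a time without padding. The function name is made up, and the kernel is not flipped, as is usual in CNNs.

```python
def correlate2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    sum the element-by-element products at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for m in range(kh):
                for n in range(kw):
                    total += image[i + m][j + n] * kernel[m][n]
            row.append(total)
        out.append(row)
    return out


# assumed input: an "X" drawn with +1 on a background of -1
image = [
    [-1, -1, -1, -1, -1],
    [-1,  1, -1,  1, -1],
    [-1, -1,  1, -1, -1],
    [-1,  1, -1,  1, -1],
    [-1, -1, -1, -1, -1],
]

# the diagonal feature
diagonal = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]

feature_map = correlate2d_valid(image, diagonal)
print(feature_map)  # [[7, -3, 3], [-3, 5, -3], [3, -3, 7]]
```

A 5x5 input and a 3x3 feature give a 3x3 feature map, since the feature fits in 5 - 3 + 1 = 3 positions along each axis.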

and it is reminiscent of the original layer: the largest values (7, 5, 7) run along the same diagonal as in the input

Convolve with different features: each neuron is 1 feature

7	-3	3
-3	5	-3
3	-3	7

ReLU: activation function that replaces negative values with 0's

7	0	3
0	5	0
3	0	7
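
As a quick check, ReLU applied element by element to the feature map from the convolution step; a plain-Python sketch with a made-up function name.

```python
def relu(feature_map):
    """Replace every negative value with 0, keep the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]


feature_map = [
    [ 7, -3,  3],
    [-3,  5, -3],
    [ 3, -3,  7],
]

print(relu(feature_map))  # [[7, 0, 3], [0, 5, 0], [3, 0, 7]]
```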

1c

Max-Pool

CNN

MaxPooling: reduces image size, generalizes result

7	0	3
0	5	0
3	0	7

2x2 Max Pool

7	5
5	7

By reducing the size and picking the maximum of a sub-region we make the network less sensitive to specific details
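
A plain-Python sketch of 2x2 max pooling, assuming the window moves one pixel at a time (stride 1), which is what takes a 3x3 map to a 2x2 one. The function name and its parameters are made up; note that Keras's `MaxPooling2D` by default uses a stride equal to the pool size instead.

```python
def max_pool(feature_map, size=2, stride=1):
    """Keep only the maximum of every size x size window."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            window = [feature_map[i + m][j + n]
                      for m in range(size) for n in range(size)]
            row.append(max(window))
        out.append(row)
    return out


rectified = [
    [7, 0, 3],
    [0, 5, 0],
    [3, 0, 7],
]

print(max_pool(rectified))  # [[7, 5], [5, 7]]
```

The zeros disappear and only the strongest responses survive, so small shifts of the input change the pooled output less.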

convolutional NN

Punch Line

Deep Neural Nets are not some fancy-pants method, they are just linear models with a bunch of parameters

Black Box?

Because they have many parameters they are difficult to "interpret" (no easy feature extraction)

that is ok because they are prediction machines

deep dreams

what is happening in DeepDream?

Deep Dream (DD) is a Google software, a pre-trained NN (originally created on the Caffe framework, now ported to many other platforms including TensorFlow).

The high level idea relies on training a convolutional NN to recognize common objects, e.g. dogs, cats, cars, in images. As the network learns to recognize those objects it develops its layers to pick out "features" of the images, like lines at certain orientations, circles, etc.

The DD software runs this NN on an image you give it, and it loops on some layers, thus "manifesting" the things it knows how to recognize in the image.

resources

Neural Networks and Deep Learning

an excellent and free book on NN and DL

http://neuralnetworksanddeeplearning.com/index.html

Deep Learning An MIT Press book in preparation

Ian Goodfellow, Yoshua Bengio and Aaron Courville

https://www.deeplearningbook.org/lecture_slides.html

History of NN

https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history2.html

Gradient Descent

https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html