principles of Urban Science 11

dr. federica bianco | fbb.space | fedhere | fedhere

Neural Networks: CNNs+autoencoders

this slide deck:

http://slides.com/federicabianco/pus2022_11

SCHEDULE

28 - in person - CNN
30 - Remote - CNN - Auto encoders (Data Ethics)
5 - Remote - Visualizations
7 - In person - Data Ethics
11 - report first version due
12 - ? Q/A?
14 - Presentations
16 - final version of report due

Recap

0

Machine Learning

Data-driven models for exploration of structure and for prediction, which learn their parameters from data.

unsupervised ------ supervised

set up: All features known for all observations

Goal: explore structure in the data

- data compression

- understanding structure

Algorithms: Clustering, (...)

x

y

Machine Learning

Data-driven models for exploration of structure and for prediction, which learn their parameters from data.

unsupervised ------ supervised

set up: All features known for a subset of the data; one feature cannot be observed for the rest of the data

Goal: predicting missing feature

- classification

- regression

Algorithms: regression, SVM, tree methods, k-nearest neighbors, neural networks, (...)

x

y

unsupervised ------ supervised

set up: All features known for all observations

Goal: explore structure in the data

- data compression

- understanding structure

Algorithms: k-means clustering, agglomerative clustering, density based clustering, (...)

Machine Learning

model parameters are learned by calculating a loss function for different parameter sets and trying to minimize the loss (or a target function and trying to maximize it)

e.g.

L1 = |target - prediction|

Learning relies on the definition of a loss function

Machine Learning

Learning relies on the definition of a loss function

learning type	loss / target
unsupervised	intra-cluster variance / inter cluster distance
supervised	distance between prediction and truth

Machine Learning

The definition of a loss function requires the definition of distance or similarity

Machine Learning

Minkowski distance

Jaccard similarity

Great circle distance

B

{A\cap B}

A

The definition of a loss function requires the definition of distance or similarity
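The three distance and similarity measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function names, the haversine formulation of the great circle distance, and the default Earth radius of 6371 km are all assumptions made for the example.

```python
import math

def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def jaccard_similarity(A, B):
    """Size of the intersection divided by the size of the union of two sets."""
    A, B = set(A), set(B)
    if not A and not B:
        return 1.0  # convention assumed here for two empty sets
    return len(A & B) / len(A | B)

def great_circle_distance(lat1, lon1, lat2, lon2, radius=6371.0):
    """Haversine great circle distance; angles in degrees, result in units of radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(min(1.0, h)))

euclidean = minkowski_distance([0, 0], [3, 4], p=2)
manhattan = minkowski_distance([0, 0], [3, 4], p=1)
overlap = jaccard_similarity({1, 2, 3}, {2, 3, 4})
half_equator = great_circle_distance(0, 0, 0, 180)
```

Which measure is appropriate depends on the data: Minkowski for continuous features, Jaccard for sets or binary features, great circle for coordinates on a sphere.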

Machine Learning

NN:

Neural Networks

1

+b

f

w_2

w_1

w_N

output

sigmoid

f

\sigma = \frac{1}{1 + e^{-z}}

.

x_1

x_2

x_N

y ~= f(~\sum_i w_ix_i ~+~ b)

Perceptrons are linear classifiers:

they make predictions based on a linear predictor function

combining a set of weights (=parameters) with the feature vector.

f_j

weights

w_{ij}

bias

b_j

activation function

f

Turn a linear prediction into a binary or probabilistic classification
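A single perceptron with a sigmoid activation can be sketched as follows. The input values, weights, and the 0.5 decision threshold are made up for illustration; only the formula y = f(sum_i w_i x_i + b) comes from the slides.

```python
import numpy as np

def sigmoid(z):
    """Squash a linear prediction into the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b, activation=sigmoid):
    """y = f(sum_i w_i x_i + b): a linear predictor followed by an activation."""
    return activation(np.dot(w, x) + b)

# hypothetical feature vector, weights and bias
x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
b = 0.0

p = perceptron(x, w, b)   # linear part is 0.5 - 0.5 = 0, so the sigmoid sits at its midpoint
label = int(p >= 0.5)     # threshold the probability to get a yes/no answer
```

Raising the bias b pushes the output toward 1 for every input, which is the "up-down weighting" role of the bias.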

activation function

\vec{y} = f_N(\ldots f_1(\vec{x} \cdot W_1 + \vec{b_1}) \ldots \cdot W_N + \vec{b_N})

x1

x2

b1

b2

b3

b

w11

w12

w13

w21

w22

w23

multilayer perceptron

w: weight

sets the sensitivity of a neuron

b: bias:

up-down weights a neuron

yes/no

connected: all nodes go to all nodes of the next layer.

w_{11}x_1 + w_{21}x_2 + b1

activation function

w_{12}x_1 + w_{22}x_2 + b2

w_{13}x_1 + w_{23}x_2 + b3

EXERCISE

http://playground.tensorflow.org/

DNN:

Deep Neural Networks

2

layer connectivity

x_2

x_3

output

input layer

hidden layer

output layer

x_1

Fully connected: all nodes go to all nodes of the next layer.

b_1

b_2

b_3

b_4

b_1

b

x_2

x_3

output

input layer

hidden layer

output layer

x_1

Sparsely connected: not all nodes go to all nodes of the next layer.

b_1

b_2

b_3

b_4

b_1

b

The last layer is always connected

layer connectivity

1x3 · 3x5 · 5x2 = 1x2

what we are doing is exactly a series of matrix multiplications.

\phi(\vec{x}) ~\sim~f^{(3)}(f^{(2)}(f^{(1)}(\vec{x} \cdot W_1 + \vec{b_1}) \cdot W_2 + \vec{b_2}) \cdot W_3 + \vec{b_3})~=~y

DeepNeuralNetwork

The purpose is to approximate a function φ

y = φ(x)

which (in general) is not linear, using linear operations combined with nonlinear activation functions

http://neuralnetworksanddeeplearning.com/chap4.html
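The "series of matrix multiplications" view of a feed-forward network can be sketched with numpy. The layer shapes (1x3 input, 3x5 and 5x2 weight matrices) follow the slide; the random weights, zero biases, ReLU choice, and the helper name `forward` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def identity(z):
    return z

def forward(x, params, activations):
    """Apply a = f(a . W + b) layer by layer."""
    a = x
    for (W, b), f in zip(params, activations):
        a = f(a @ W + b)
    return a

x = rng.normal(size=(1, 3))                           # one observation, 3 features
params = [(rng.normal(size=(3, 5)), np.zeros(5)),     # hidden layer: 3 -> 5
          (rng.normal(size=(5, 2)), np.zeros(2))]     # output layer: 5 -> 2

y = forward(x, params, [relu, identity])              # nonlinear network, shape (1, 2)
y_linear = forward(x, params, [identity, identity])   # no activations: a plain matrix product
```

Without the activation functions the whole network collapses to a single linear map, which is why the nonlinearity is needed to approximate a nonlinear φ.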

CNN

Olague et al 2017

The visual cortex learns hierarchically: first detects simple features, then more complex features and ensembles of features

Convolution

convolution is a mathematical operator on two functions

f and g

that produces a third function

f * g

expressing how the shape of one is modified by the other.

o

two images.

image:

-1	-1	-1	-1	-1
-1	1	-1	1	-1
-1	-1	1	-1	-1
-1	1	-1	1	-1
-1	-1	-1	-1	-1

feature 1:

1	-1	-1
-1	1	-1
-1	-1	1

feature 2:

-1	-1	1
-1	1	-1
1	-1	-1

feature maps

1

convolution

input image:

-1	-1	-1	-1	-1
-1	1	-1	1	-1
-1	-1	1	-1	-1
-1	1	-1	1	-1
-1	-1	-1	-1	-1

feature:

1	-1	-1
-1	1	-1
-1	-1	1

top-left 3x3 patch of the image, multiplied element by element with the feature and summed:

(-1*1) + (-1*-1) + (-1*-1) + \\ (-1*-1) + (1*1) + (-1*-1) + \\ (-1*-1) + (-1*-1) + (1*1) \\ = 7

feature shifted one pixel to the right:

(-1*1) + (-1*-1) + (-1*-1) + \\ (1*-1) + (-1*1) + (1*-1) + \\ (-1*-1) + (1*-1) + (-1*1) \\ = -3

repeating for every position of the feature on the image:

7	-3	3
-3	5	-3
3	-3	7

input layer

feature map

convolution layer

the feature map is "richer": we went from binary to R

and it is reminiscent of the original layer
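The sliding multiply-and-sum can be checked with a short numpy sketch. It assumes the blank cells of the 5x5 image are 1 (an X on a background of -1) and uses the diagonal 3x3 feature; the function name `convolve2d_valid` is made up for this example. As in most CNN libraries, the feature is not flipped (strictly a cross-correlation), which makes no difference here because this feature is unchanged by a 180 degree rotation.

```python
import numpy as np

# 5x5 input image: an X drawn with 1 on a background of -1 (assumed values)
image = np.array([[-1, -1, -1, -1, -1],
                  [-1,  1, -1,  1, -1],
                  [-1, -1,  1, -1, -1],
                  [-1,  1, -1,  1, -1],
                  [-1, -1, -1, -1, -1]])

# 3x3 diagonal feature
kernel = np.array([[ 1, -1, -1],
                   [-1,  1, -1],
                   [-1, -1,  1]])

def convolve2d_valid(img, ker):
    """Slide ker over img (no padding, stride 1); multiply element-wise and sum."""
    kh, kw = ker.shape
    out_h = img.shape[0] - kh + 1
    out_w = img.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=img.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * ker)
    return out

feature_map = convolve2d_valid(image, kernel)
# feature_map is [[7, -3, 3], [-3, 5, -3], [3, -3, 7]]
```

The largest values sit on the main diagonal, where the image locally looks like the feature.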

Convolve with different feature: each neuron is 1 feature

7	-3	3
-3	5	-3
3	-3	7

ReLU: activation that replaces negative values with 0's

7	0	3
0	5	0
3	0	7

1c

Max-Pool

CNN

MaxPooling: reduces image size, generalizes result

7	0	3
0	5	0
3	0	7

2x2 Max Pool

7	5
5	7

By reducing the size and picking the maximum of a sub-region we make the network less sensitive to specific details
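The ReLU and max pooling steps can be sketched in numpy as follows. The 2x2 window comes from the slides; the stride of 1 is an assumption inferred from the 2x2 output shown for a 3x3 input (many libraries default to a stride equal to the window size), and the function names are made up.

```python
import numpy as np

def relu(x):
    """Replace negative values with 0."""
    return np.maximum(x, 0)

def max_pool2d(x, size=2, stride=1):
    """Keep only the maximum of each size x size window."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + size, c:c + size].max()
    return out

feature_map = np.array([[ 7, -3,  3],
                        [-3,  5, -3],
                        [ 3, -3,  7]])

activated = relu(feature_map)                      # [[7, 0, 3], [0, 5, 0], [3, 0, 7]]
pooled = max_pool2d(activated, size=2, stride=1)   # [[7, 5], [5, 7]]
```

The pooled map keeps the strong diagonal response while discarding exactly where inside each window it occurred.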

convolutional NN

training DNN

3

https://colab.research.google.com/drive/13c9uJ_fPGjszgsyEuYWafR2F4_n-IXeZ

.

x_1

x_2

x_N

+b

\vec{y} = \vec{x}W + b

Any linear model:

w_2

w_1

w_N

y

y : prediction

ytrue : target

Error: e.g.

L(\theta)~=~|y_\mathrm{true} - y|^2

intercept

slope

L2

x

Find the best parameters by finding the minimum of the L2 loss surface

at every step look around and choose the best direction

Gradient Descent
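Gradient descent on the L2 loss of a straight line can be sketched as below. The synthetic data (true slope 2, intercept 1, small Gaussian noise), the learning rate, and the number of steps are all assumptions chosen so the example converges; they are not taken from the course notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic data: y = 2x + 1 plus a little noise (assumed for illustration)
x = np.linspace(0, 1, 50)
y_true = 2.0 * x + 1.0 + rng.normal(scale=0.01, size=x.size)

def l2_loss(slope, intercept):
    """Mean squared difference between target and prediction."""
    return np.mean((y_true - (slope * x + intercept)) ** 2)

slope, intercept = 0.0, 0.0   # starting guess
learning_rate = 0.5
losses = []

for step in range(2000):
    residual = (slope * x + intercept) - y_true
    # partial derivatives of the L2 loss with respect to each parameter
    grad_slope = 2 * np.mean(residual * x)
    grad_intercept = 2 * np.mean(residual)
    # step downhill, against the gradient
    slope -= learning_rate * grad_slope
    intercept -= learning_rate * grad_intercept
    losses.append(l2_loss(slope, intercept))
```

Plotting `losses` against the step number is the simplest check that the descent is actually going downhill and has flattened out.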

Training models with this many parameters requires a lot of care:

. defining the metric

. optimization schemes

. training/validation/testing sets

But just like in our simple linear regression case, we can rely on the fact that small changes in the parameters lead to small changes in the output, for the right activation functions.

define a cost function, e.g.

C=\frac{1}{2}|y-a^L|^2~=~\frac{1}{2}\sum_j(y_j-a^L_j)^2

Training a feed-forward DNN

feed data forward through network and calculate cost metric

for each layer, calculate effect of small changes on next layer

\vec{y} = f^{(N)}(\ldots f^{(1)}(\vec{x} \cdot W_1 + \vec{b_1}) \ldots \cdot W_N + \vec{b_N})

earlier layers learn more slowly

Training a feed-forward DNN

Loss functions: with NN you often encounter this loss function

Gradient Descent

L(\theta) = -E_{x,y \sim \hat{p}_\mathrm{data}} \log p_\mathrm{model}(y | x)

negative loglikelihood or cross entropy

if

p_\mathrm{model}(y | x) = N(y;f(x;\theta), I)

then

L(\theta) = \frac{1}{2} E_{x,y \sim \hat{p}_\mathrm{data}} ||y - f(x; \theta)||^2 + c

i.e. the mean squared error (MSE), up to a factor 1/2 and an additive constant
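A short derivation sketch of why the Gaussian assumption turns the negative log-likelihood into a squared error. The symbol d (the number of components of y) is introduced here for the derivation and does not appear in the slides.

```latex
\begin{aligned}
-\log p_\mathrm{model}(y | x) &= -\log N(y; f(x;\theta), I) \\
&= -\log\left[ (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2} ||y - f(x;\theta)||^2\right) \right] \\
&= \tfrac{1}{2} ||y - f(x;\theta)||^2 + \tfrac{d}{2}\log(2\pi)
\end{aligned}
```

Taking the expectation over the data gives the squared-error loss, with the constant c = (d/2) log(2π) that does not depend on θ and so does not affect the minimum.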

back-propagation

how does gradient descent look when you have a whole network structure with hundreds of weights and biases to optimize??

x_{j}~=~\sum_i y_{i}w_{ji} ~~~~~~ y_j~=\frac{1}{1+e^{-x_j}}

.

x_1

x_N

f

https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf

+b

f

w_2

output

Training a feed-forward DNN

- we want to get the gradient to use it in downhill optimization

back-propagation

backprop is a dynamic programming algorithm that calculates all the gradients once and then looks them up

https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf

- chain rule

\frac{\partial C }{\partial{x}} = \frac{\partial C }{\partial{y}}\frac{\partial y }{\partial{x}}

\vec{y} = f^{(N)}(\ldots f^{(1)}(\vec{x} \cdot W_1 + \vec{b_1}) \ldots \cdot W_N + \vec{b_N})

a_1=\sigma(z_1)=\sigma(w_1a_0+b_1)\\ \Delta a_1=\frac{\partial\sigma(w_1a_0+b_1)}{\partial b_1} \Delta b_1 =\sigma'(z_1) \Delta b_1

Training a feed-forward DNN

This is the simplest deep NN: one neuron per layer

\vec{y} = f^{(N)}(\ldots f^{(1)}(\vec{x} \cdot W_1 + \vec{b_1}) \ldots \cdot W_N + \vec{b_N})

\frac{\partial C}{\partial w} , \frac{\partial C}{\partial b}

these are the changes in the cost with respect to w and b on the last layer

a_1=\sigma(z_1)=\sigma(w_1a_0+b_1)\\

Training a feed-forward DNN

\vec{y} = f^{(N)}(\ldots f^{(1)}(\vec{x} \cdot W_1 + \vec{b_1}) \ldots \cdot W_N + \vec{b_N})

\frac{\partial C}{\partial x} = \frac{\partial C}{\partial z_4} \frac{\partial z_4}{\partial z_3} \frac{\partial z_3}{\partial z_2} \frac{\partial z_2}{\partial z_1} \frac{\partial z_1}{\partial x}

Training a feed-forward DNN

This is the simplest deep NN: one neuron per layer

x

z1

z2

z3

z4

= f'(f(f(f(z))))f'(f(f(z)))f'(f(z))f'(z) =

\vec{y} = f_N(\ldots f_1(\vec{x} \cdot W_1 + \vec{b_1}) \ldots \cdot W_N + \vec{b_N})

back-propagation

how does gradient descent look when you have a whole network structure with hundreds of weights and biases to optimize??

think of applying the gradient to a function of a function of a function... use:

1) partial derivatives, 2) chain rule

http://neuralnetworksanddeeplearning.com/chap2.html
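Back-propagation for the simplest deep network (one neuron per layer) can be sketched as follows. It assumes sigmoid activations and the quadratic cost C = ½(y − a^L)²; the weights, biases, input, and target are made-up numbers, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Return the activation of every layer, starting with the input itself."""
    activations = [x]
    for w, b in zip(weights, biases):
        activations.append(sigmoid(w * activations[-1] + b))
    return activations

def cost(x, y, weights, biases):
    return 0.5 * (forward(x, weights, biases)[-1] - y) ** 2

def backprop(x, y, weights, biases):
    """Gradients of the cost with respect to every weight and bias, via the chain rule."""
    a = forward(x, weights, biases)
    grads_w = [0.0] * len(weights)
    grads_b = [0.0] * len(biases)
    # dC/dz on the last layer: (a - y) * sigma'(z), with sigma'(z) = a (1 - a)
    delta = (a[-1] - y) * a[-1] * (1 - a[-1])
    for l in reversed(range(len(weights))):
        grads_w[l] = delta * a[l]   # dz/dw is the activation feeding the layer
        grads_b[l] = delta          # dz/db is 1
        if l > 0:
            # move one layer back: multiply by the weight and the local sigma'
            delta = delta * weights[l] * a[l] * (1 - a[l])
    return grads_w, grads_b

# hypothetical 4-layer chain
weights = [0.5, -1.2, 0.8, 1.5]
biases = [0.1, 0.0, -0.3, 0.2]
x0, y0 = 0.7, 1.0

grads_w, grads_b = backprop(x0, y0, weights, biases)
```

Each step back multiplies the gradient by a weight and by a sigmoid derivative (at most 0.25), so with weights of this size the gradients shrink toward the first layer, which is one way to see why earlier layers learn more slowly.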

define a cost function, e.g.

C=\frac{1}{2}|y-a^L|^2~=~\frac{1}{2}\sum_j(y_j-a^L_j)^2

Training a DNN

\vec{y} = f^{(N)}(\ldots f^{(1)}(\vec{x} \cdot W_1 + \vec{b_1}) \ldots \cdot W_N + \vec{b_N})

Training a DNN

http://neuralnetworksanddeeplearning.com/chap2.html#backpropsummary

https://colab.research.google.com/drive/13c9uJ_fPGjszgsyEuYWafR2F4_n-IXeZ

build a DNN from scratch using numpy

Autoencoders

4

Unsupervised learning with

Neural Networks

What do NN do? approximate complex functions with a series of linear functions

To do that they extract information from the data

Each layer of the DNN produces a representation of the data: a "latent representation".

The dimensionality of that latent representation is determined by the size of the layer (and its connectivity, but we will ignore this bit for now)

.... so if my layers are smaller what I have is a compact representation of the data

complex input data -> 5dim representation -> 4dim -> 3dim

Autoencoder Architecture

Feed Forward DNN:

the size of the input is 5,

the size of the last layer is 2

Autoencoder Architecture

replicate the same structure backwards

Autoencoder Architecture

input

\vec{x}

output

\vec{z} = \vec{x}

ask it to reproduce the input

if you have not lost information in the compression you can reproduce the input closely!

q_\phi(z|x)

p_\theta(x|z)

the target of the Autoencoder is the data itself

Autoencoder Architecture

Encoder: outputs a lower dimensional representation z of the data x (similar to PCA, tSNE...)
Decoder: Learns how to reconstruct x given z: learns p(x|z)
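The encoder/decoder idea can be illustrated without a neural network at all. The sketch below is a stand-in, not an autoencoder trained by gradient descent: it uses PCA (via numpy's SVD) as a purely linear encoder and decoder on made-up data with 5 features that really live on a 2-dimensional plane. All names and the toy data are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: 200 observations, 5 features, generated from only 2 hidden numbers
latent_true = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
x = latent_true @ mixing
x_mean = x.mean(axis=0)

# principal directions of the centered data
_, _, vt = np.linalg.svd(x - x_mean, full_matrices=False)

def encode(data, n_latent):
    """Project onto the first n_latent principal directions: x -> z."""
    return (data - x_mean) @ vt[:n_latent].T

def decode(z):
    """Map the latent representation back to feature space: z -> x."""
    return z @ vt[:z.shape[1]] + x_mean

z = encode(x, 2)                      # 2-dimensional latent representation
x_rec = decode(z)                     # reconstruction of the 5 features
error_2d = np.mean((x - x_rec) ** 2)
error_1d = np.mean((x - decode(encode(x, 1))) ** 2)
```

With a 2-dimensional bottleneck nothing is lost and the reconstruction is essentially exact; squeezing to 1 dimension discards information and the reconstruction error grows. A neural autoencoder plays the same game but can learn nonlinear encodings.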

Autoencoder Architecture

import numpy as np
from keras.layers import Dense, Flatten, Reshape, Input, InputLayer
from keras.models import Sequential, Model

def build_autoencoder(image_shape, bn_size):
    # Encoder: flatten the image and compress it to bn_size numbers
    encoder = Sequential()
    encoder.add(InputLayer(image_shape))
    encoder.add(Flatten())
    encoder.add(Dense(bn_size))

    # Decoder: expand the bottleneck back to the original image shape
    decoder = Sequential()
    decoder.add(InputLayer((bn_size,)))
    decoder.add(Dense(int(np.prod(image_shape))))
    decoder.add(Reshape(image_shape))

    return encoder, decoder

Autoencoder Architecture

https://link.springer.com/chapter/10.1007/978-981-13-6661-1_3

Building a DNN

with keras and tensorflow

Trivial to build, but the devil is in the details!

from keras.models import Sequential
# pretrained models can also be loaded from keras.models
from keras.layers import Dense, Conv2D, MaxPooling2D

# create model
model = Sequential()

# create the model architecture by adding model layers
# (n_cols is the number of input features)
model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
model.add(Dense(10, activation='relu'))
model.add(Dense(1))

# need to choose the loss function, metric, optimization scheme
model.compile(optimizer='adam', loss='mean_squared_error')

# need to learn what to look for - always plot the loss function!
history = model.fit(x_train, y_train, validation_data=(x_val, y_val),
                    epochs=20, batch_size=100, verbose=1)
# note that the model accepts a validation set:
# this gives a three-way split of the data: train-validate-test

# predict on the held-out test set
test_y_predictions = model.predict(x_test)

Building a DNN

with keras and tensorflow

autoencoder for image reconstruction

encoder

bottleneck

decoder

This autoencoder model has a 64-neuron bottleneck. This means it will generate a compressed representation of the data out of that layer which is 64-dimensional (the original size is 784 pixels)

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb
Building a DNN

with keras and tensorflow

autoencoder for image reconstruction

This simple model has 200000 parameters!

My original choice is to train it with "adadelta" with a mean squared error loss function; all activation functions are relu, appropriate for a linear regression

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb

Building a DNN

with keras and tensorflow

autoencoder for image reconstruction

What should I choose for the loss function and how does that relate to the activation function and optimization?

loss	good for	activation last layer	size last layer
mean_squared_error	regression	linear	one node
mean_absolute_error	regression	linear	one node
mean_squared_logarithmic_error	regression	linear	one node
binary_crossentropy	binary classification	sigmoid	one node
categorical_crossentropy	multiclass classification	softmax	N nodes
kullback_leibler_divergence	multiclass classification, probabilistic interpretation	softmax	N nodes

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb
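To make the difference between two of the losses in the table concrete, here is a minimal numpy sketch of mean squared error and binary cross entropy. These are hand-written versions for illustration, not the keras implementations; the clipping constant `eps` and the example predictions are assumptions.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-likelihood of binary targets under predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
confident_right = np.array([0.9, 0.1, 0.9, 0.1])
confident_wrong = np.array([0.1, 0.9, 0.1, 0.9])

mse_right = mean_squared_error(y_true, confident_right)
mse_wrong = mean_squared_error(y_true, confident_wrong)
bce_right = binary_crossentropy(y_true, confident_right)
bce_wrong = binary_crossentropy(y_true, confident_wrong)
```

Both losses prefer the correct predictions, but the cross entropy penalizes confident mistakes much more steeply, which is one reason it pairs well with a sigmoid output layer.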

autoencoder for image reconstruction

# linear output layer, mean squared error loss
model_digits64.add(Dense(ndim, 
                        activation='linear'))
model_digits64.compile(optimizer="adadelta", 
                   loss="mean_squared_error")

# sigmoid output layer, mean squared error loss
model_digits64_sig.add(Dense(ndim, 
                             activation='sigmoid'))
model_digits64_sig.compile(optimizer="adadelta", 
                           loss="mean_squared_error")

# sigmoid output layer, binary cross entropy loss
model_digits64_bce.add(Dense(ndim, 
                             activation='sigmoid'))
model_digits64_bce.compile(optimizer="adadelta", 
                           loss="binary_crossentropy")

loss function: did not finish learning, it is still decreasing rapidly

The predictions are far too detailed. While the input is not binary, it does not have a lot of details. Maybe approaching it as a binary problem (with a sigmoid and a binary cross entropy loss) will give better results

loss function: also did not finish learning, it is still decreasing rapidly

A sigmoid activation gives a much better result!

Binary cross entropy loss function: it is more appropriate when the output layer is sigmoid

Even better results!

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb

[figure: original and predicted digits for each of the three models]

autoencoder for image reconstruction

A more ambitious model has a 16-neuron bottleneck: we are trying to extract 16 numbers to reconstruct the entire image! It's pretty remarkable! Those 16 numbers are features extracted from the data

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb

predicted

original

latent

representation

autoencoder for image reconstruction

https://github.com/fedhere/MLTSA_FBianco/blob/master/autoencode_digits.ipynb

The bias is in the data

The bias is in the models and the decision we make

The bias is in how we choose to optimize our model

Should AI reflect

who we are

(and enforce and grow our bias)

or should it reflect who we aspire to be?

(and who decides what that is?)

models are neutral, the bias is in the data

The bias is in society, which provides the framework to validate our biased models

none of this is new

https://www.nytimes.com/2019/04/25/lens/sarah-lewis-racial-bias-photography.html

resources

Neural Network and Deep Learning

an excellent and free book on NN and DL

http://neuralnetworksanddeeplearning.com/index.html

Deep Learning An MIT Press book in preparation

Ian Goodfellow, Yoshua Bengio and Aaron Courville

https://www.deeplearningbook.org/lecture_slides.html

History of NN

https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history2.html

resources

Gradient Descent

https://ml-cheatsheet.readthedocs.io/en/latest/gradient_descent.html

Olague et al 2017, Brain Programming and the Random Search in Object Categorization