Introduction to Deep Learning

Contents

  • Neural network architecture overview

  • Activation functions

  • Backpropagation

Neural network architecture overview


The most basic component of an artificial neural network is the activation unit.

 

It consists of a set of n inputs (which may include a constant bias term), an activation function, and an output.

Activation node:

[Figure: activation node. Inputs X_1, X_2, \dots, X_n with weights \theta_1, \theta_2, \dots, \theta_n, plus a constant bias input X_0 with weight \theta_0, feed a summation \sum followed by an activation \sim that produces the output O_i]
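In formulas, the node computes a weighted sum of its inputs and passes it through the activation function (a standard formulation; here the bias is folded in as a constant input X_0 = 1 and f denotes the activation):

O_i = f\left( \sum_{j=0}^{n} \theta_j X_j \right)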

Multilayer network

When we stack these units together into layers, we get a multilayer artificial neural network.

[Figure: multilayer network. Inputs X_1, X_2, X_3 feed a hidden layer of activation nodes, each performing a weighted sum \sum followed by an activation \sim, whose outputs feed an output layer producing O_1, O_2, O_3]
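As a minimal sketch of the forward pass through such a network (the layer sizes, random weights, and sigmoid activation are illustrative assumptions, not values from the slides):

import numpy as np

def sigmoid(z):
    # A smooth activation; the XOR example below uses the Heaviside step instead.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # One weighted sum followed by an activation per layer, as in the diagram.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Illustrative 3-4-3 network: inputs X_1..X_3, one hidden layer, outputs O_1..O_3.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 4))]
biases = [np.zeros(4), np.zeros(3)]
print(forward(np.array([1.0, 0.0, -1.0]), weights, biases))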

Learning rules

Classification example:

 

XOR function

 

Let us suppose that we want to create a two-layer neural network able to classify these observations:

(0,1) \rightarrow 1
(1,1) \rightarrow 0
(1,0) \rightarrow 1
(0,0) \rightarrow 0


Or, equivalently, we want a neural network able to carve out a classification region (the yellow region in the figure) that contains the positive points (0,1) and (1,0) but excludes (1,1) and (0,0).


Proposed solution

[Figure: proposed two-layer network. Hidden unit h_1 receives X_1 and X_2 with weights +1, +1 and a bias weight of -0.5; hidden unit h_2 receives X_1 and X_2 with weights +1, +1 and a bias weight of -1.5; the output unit receives h_1 with weight +1 and h_2 with weight -1 and has a bias weight of -0.5. Every unit uses the Heaviside step activation]
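Written out as formulas (reconstructed from the diagram and the Heaviside computations below; the names h_1 and h_2 for the hidden units are mine, and heaviside(z) = 1 for z \geq 0, 0 otherwise):

h_1 = heaviside( 1 \cdot X_1 + 1 \cdot X_2 - 0.5 )
h_2 = heaviside( 1 \cdot X_1 + 1 \cdot X_2 - 1.5 )
O = heaviside( 1 \cdot h_1 - 1 \cdot h_2 - 0.5 )

The output O is 1 exactly when at least one input is on (h_1 = 1) but not both (h_2 = 0), which is the XOR function.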

Checking the solution on the input (X_1, X_2) = (0, 1):

h_2 = heaviside(1*0 + 1*1 + (-1.5*1)) = heaviside(-0.5) = 0
h_1 = heaviside(1*0 + 1*1 + (-0.5*1)) = heaviside(0.5) = 1
O = heaviside(-1*0 + 1*1 + (-0.5*1)) = heaviside(0.5) = 1

so the network outputs 1 for (0, 1), as required.

Checking the solution on the input (X_1, X_2) = (0, 0):

h_2 = heaviside(1*0 + 1*0 + (-1.5*1)) = heaviside(-1.5) = 0
h_1 = heaviside(1*0 + 1*0 + (-0.5*1)) = heaviside(-0.5) = 0
O = heaviside(-1*0 + 1*0 + (-0.5*1)) = heaviside(-0.5) = 0

so the network outputs 0 for (0, 0), as required.
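The remaining inputs can be checked the same way; a minimal sketch in plain Python that runs all four (the names heaviside, xor_net, h1, and h2 are mine for illustration; the weights are the ones in the diagram):

def heaviside(z):
    # Step activation: 1 for z >= 0, else 0 (no weighted sum here is exactly 0).
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = heaviside(1*x1 + 1*x2 - 0.5)    # fires when at least one input is on
    h2 = heaviside(1*x1 + 1*x2 - 1.5)    # fires only when both inputs are on
    return heaviside(1*h1 - 1*h2 - 0.5)  # on, but not both: XOR

for x1, x2 in [(0, 1), (1, 1), (1, 0), (0, 0)]:
    print((x1, x2), "->", xor_net(x1, x2))
# Prints 1, 0, 1, 0, matching the four target labels.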

Activation functions

[Figure: plot of an activation function]

More complex activation functions

[Figure: plots of more complex activation functions]
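As a reference, a minimal sketch of a few commonly used activation functions: the Heaviside step used in the XOR example above, plus three smooth alternatives (which smooth functions appeared on the original slides is an assumption):

import numpy as np

def heaviside(z):
    # Hard threshold: 1 for z >= 0, else 0. Not differentiable at 0.
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    # Smooth squashing function with outputs in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Smooth squashing function with outputs in (-1, 1).
    return np.tanh(z)

def relu(z):
    # Piecewise linear: max(0, z). Cheap to compute and differentiate.
    return np.maximum(0.0, z)

Smooth activations matter for the next section: gradient-based training needs derivatives, and the Heaviside step has derivative zero everywhere it is defined.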

Backpropagation

[Figure: multilayer network with outputs O_1, O_2, O_3]

Our objective now is to train the network with a gradient-based method, which requires somehow propagating the errors measured at the outputs back to the previous layers.
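A sketch of the standard chain-rule recursion behind this idea (the notation is mine, not from the slides: z^{(l)} is the vector of weighted sums in layer l, a^{(l)} = f(z^{(l)}) its activations, \Theta^{(l)} its weight matrix, \delta^{(l)} its error signal, and E the training error):

\delta^{(L)} = \frac{\partial E}{\partial z^{(L)}}, \qquad \delta^{(l)} = \left( \Theta^{(l+1)} \right)^{\top} \delta^{(l+1)} \odot f'\left( z^{(l)} \right), \qquad \frac{\partial E}{\partial \Theta^{(l)}} = \delta^{(l)} \left( a^{(l-1)} \right)^{\top}

Each layer's error signal \delta^{(l)} is computed from the one above it, which is exactly the propagation of errors to the previous layers.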


Of course, with more complex architectures, computing these gradients by hand becomes an issue; backpropagation resolves it by applying the chain rule systematically, layer by layer, reusing the error signals already computed for the layers above.
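As a minimal sketch of backpropagation in practice, the following trains a two-layer sigmoid network on the XOR data from the previous section (the hidden-layer size, squared-error loss, learning rate, and random initialization are illustrative assumptions, not values from the slides):

import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 1], [1, 1], [1, 0], [0, 0]], dtype=float)
y = np.array([[1], [0], [1], [0]], dtype=float)  # XOR targets

# 2 inputs -> 4 hidden units -> 1 output, sigmoid activations throughout.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0
for step in range(10000):
    # Forward pass: one weighted sum + activation per layer.
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule, starting from the squared-error loss.
    d_o = (o - y) * o * (1 - o)        # error signal at the output layer
    d_h = (d_o @ W2.T) * h * (1 - h)   # error propagated back to the hidden layer
    # Gradient descent step on every weight and bias.
    W2 -= lr * (h.T @ d_o);  b2 -= lr * d_o.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);  b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(o.ravel(), 2))  # should approach the XOR targets 1, 0, 1, 0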
