Introduction to Deep Learning

Contents

  • Neural network architecture overview

  • Activation functions

  • Backpropagation

Neural network architecture overview


The most basic component of an artificial neural network is the activation unit.

 

It consists of a set of n inputs (which may include a constant bias term), an activation function, and an output.

Activation node:

[Figure: activation node. Inputs X_1, X_2, \dots, X_n with weights \theta_1, \theta_2, \dots, \theta_n, plus a constant bias input X_0 with weight \theta_0, feed a summation \sum followed by an activation \sim that produces the output O_i]
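In formulas, the node computes a weighted sum of its inputs and passes it through the activation function (a standard formulation; here the bias is folded in as a constant input X_0 = 1 and f denotes the activation):

O_i = f\left( \sum_{j=0}^{n} \theta_j X_j \right)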

Multilayer network

When we stack these units together into layers, we get a multilayer artificial neural network.

[Figure: multilayer network. Inputs X_1, X_2, X_3 feed a hidden layer of activation nodes, each performing a weighted sum \sum followed by an activation \sim, whose outputs feed an output layer producing O_1, O_2, O_3]
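As a minimal sketch of the forward pass through such a network (the layer sizes, random weights, and sigmoid activation are illustrative assumptions, not values from the slides):

import numpy as np

def sigmoid(z):
    # A smooth activation; the XOR example below uses the Heaviside step instead.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    # One weighted sum followed by an activation per layer, as in the diagram.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Illustrative 3-4-3 network: inputs X_1..X_3, one hidden layer, outputs O_1..O_3.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(3, 4))]
biases = [np.zeros(4), np.zeros(3)]
print(forward(np.array([1.0, 0.0, -1.0]), weights, biases))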

Learning rules

Classification example:

 

XOR function

 

Let us suppose that we want to create a two-layer neural network able to classify these observations:

(0,1) \rightarrow 1
(1,1) \rightarrow 0
(1,0) \rightarrow 1
(0,0) \rightarrow 0


Or, equivalently, we want a neural network able to carve out a classification region (the yellow region in the figure) that contains the positive points (0,1) and (1,0) but excludes (1,1) and (0,0).


Proposed solution

[Figure: proposed two-layer network. Hidden unit h_1 receives X_1 and X_2 with weights +1, +1 and a bias weight of -0.5; hidden unit h_2 receives X_1 and X_2 with weights +1, +1 and a bias weight of -1.5; the output unit receives h_1 with weight +1 and h_2 with weight -1 and has a bias weight of -0.5. Every unit uses the Heaviside step activation]
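Written out as formulas (reconstructed from the diagram and the Heaviside computations below; the names h_1 and h_2 for the hidden units are mine, and heaviside(z) = 1 for z \geq 0, 0 otherwise):

h_1 = heaviside( 1 \cdot X_1 + 1 \cdot X_2 - 0.5 )
h_2 = heaviside( 1 \cdot X_1 + 1 \cdot X_2 - 1.5 )
O = heaviside( 1 \cdot h_1 - 1 \cdot h_2 - 0.5 )

The output O is 1 exactly when at least one input is on (h_1 = 1) but not both (h_2 = 0), which is the XOR function.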

Checking the solution on the input (X_1, X_2) = (0, 1):

h_2 = heaviside(1*0 + 1*1 + (-1.5*1)) = heaviside(-0.5) = 0
h_1 = heaviside(1*0 + 1*1 + (-0.5*1)) = heaviside(0.5) = 1
O = heaviside(-1*0 + 1*1 + (-0.5*1)) = heaviside(0.5) = 1

so the network outputs 1 for (0, 1), as required.

Checking the solution on the input (X_1, X_2) = (0, 0):

h_2 = heaviside(1*0 + 1*0 + (-1.5*1)) = heaviside(-1.5) = 0
h_1 = heaviside(1*0 + 1*0 + (-0.5*1)) = heaviside(-0.5) = 0
O = heaviside(-1*0 + 1*0 + (-0.5*1)) = heaviside(-0.5) = 0

so the network outputs 0 for (0, 0), as required.
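The remaining inputs can be checked the same way; a minimal sketch in plain Python that runs all four (the names heaviside, xor_net, h1, and h2 are mine for illustration; the weights are the ones in the diagram):

def heaviside(z):
    # Step activation: 1 for z >= 0, else 0 (no weighted sum here is exactly 0).
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = heaviside(1*x1 + 1*x2 - 0.5)    # fires when at least one input is on
    h2 = heaviside(1*x1 + 1*x2 - 1.5)    # fires only when both inputs are on
    return heaviside(1*h1 - 1*h2 - 0.5)  # on, but not both: XOR

for x1, x2 in [(0, 1), (1, 1), (1, 0), (0, 0)]:
    print((x1, x2), "->", xor_net(x1, x2))
# Prints 1, 0, 1, 0, matching the four target labels.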

Activation functions

[Figure: plot of an activation function]

More complex activation functions

[Figure: plots of more complex activation functions]
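As a reference, a minimal sketch of a few commonly used activation functions: the Heaviside step used in the XOR example above, plus three smooth alternatives (which smooth functions appeared on the original slides is an assumption):

import numpy as np

def heaviside(z):
    # Hard threshold: 1 for z >= 0, else 0. Not differentiable at 0.
    return np.where(z >= 0, 1.0, 0.0)

def sigmoid(z):
    # Smooth squashing function with outputs in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Smooth squashing function with outputs in (-1, 1).
    return np.tanh(z)

def relu(z):
    # Piecewise linear: max(0, z). Cheap to compute and differentiate.
    return np.maximum(0.0, z)

Smooth activations matter for the next section: gradient-based training needs derivatives, and the Heaviside step has derivative zero everywhere it is defined.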

Backpropagation

[Figure: multilayer network with outputs O_1, O_2, O_3]

Our objective now is to train the network with a gradient-based method, which requires somehow propagating the errors measured at the outputs back to the previous layers.
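A sketch of the standard chain-rule recursion behind this idea (the notation is mine, not from the slides: z^{(l)} is the vector of weighted sums in layer l, a^{(l)} = f(z^{(l)}) its activations, \Theta^{(l)} its weight matrix, \delta^{(l)} its error signal, and E the training error):

\delta^{(L)} = \frac{\partial E}{\partial z^{(L)}}, \qquad \delta^{(l)} = \left( \Theta^{(l+1)} \right)^{\top} \delta^{(l+1)} \odot f'\left( z^{(l)} \right), \qquad \frac{\partial E}{\partial \Theta^{(l)}} = \delta^{(l)} \left( a^{(l-1)} \right)^{\top}

Each layer's error signal \delta^{(l)} is computed from the one above it, which is exactly the propagation of errors to the previous layers.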


Of course, with more complex architectures, computing these gradients by hand becomes an issue; backpropagation resolves it by applying the chain rule systematically, layer by layer, reusing the error signals already computed for the layers above.
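As a minimal sketch of backpropagation in practice, the following trains a two-layer sigmoid network on the XOR data from the previous section (the hidden-layer size, squared-error loss, learning rate, and random initialization are illustrative assumptions, not values from the slides):

import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 1], [1, 1], [1, 0], [0, 0]], dtype=float)
y = np.array([[1], [0], [1], [0]], dtype=float)  # XOR targets

# 2 inputs -> 4 hidden units -> 1 output, sigmoid activations throughout.
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0
for step in range(10000):
    # Forward pass: one weighted sum + activation per layer.
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    # Backward pass: chain rule, starting from the squared-error loss.
    d_o = (o - y) * o * (1 - o)        # error signal at the output layer
    d_h = (d_o @ W2.T) * h * (1 - h)   # error propagated back to the hidden layer
    # Gradient descent step on every weight and bias.
    W2 -= lr * (h.T @ d_o);  b2 -= lr * d_o.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);  b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(o.ravel(), 2))  # should approach the XOR targets 1, 0, 1, 0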
