PERCEPTRONS BY HAND

Andrew Beam, PhD

Department of Epidemiology

HSPH

 

twitter: @AndrewLBeam

PERCEPTRONS

Let's say we'd like to have a single neuron learn a function

Observations

X1  X2  y
0   0   0
0   1   1
1   0   1
1   1   1

[Diagram: a single neuron with inputs X_1 and X_2, weights w_1 and w_2, bias b, and output y]

How do we make a prediction for each observation?

Assume we have the following values

w1 w2 b
1 -1 -0.5

Predictions

For the first observation:

X_1 = 0, X_2 = 0, y = 0

First compute the weighted sum:

h = w_1*X_1 + w_2*X_2 + b
h = 1*0 + (-1)*0 + (-0.5)
h = -0.5

Transform to probability:

p = \frac{1}{1+\exp(-h)}
p = \frac{1}{1+\exp(0.5)}
p = 0.38

Round to get prediction:

\hat{y} = round(p)
\hat{y} = 0
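
The three steps for the first observation can be sketched in Python. This is a minimal illustration, not the author's code: the function name `predict_one` is hypothetical, and it uses Python's built-in `round`, which sends a probability of exactly 0.5 to 0.

```python
import math

def predict_one(x1, x2, w1, w2, b):
    """Return (h, p, y_hat) for a single observation."""
    h = w1 * x1 + w2 * x2 + b        # weighted sum
    p = 1.0 / (1.0 + math.exp(-h))   # transform to probability (sigmoid)
    y_hat = round(p)                 # round to get prediction (0.5 rounds to 0 in Python)
    return h, p, y_hat

# First observation (X_1 = 0, X_2 = 0) with w1 = 1, w2 = -1, b = -0.5
h, p, y_hat = predict_one(0, 0, w1=1.0, w2=-1.0, b=-0.5)
print(h, round(p, 2), y_hat)  # -0.5 0.38 0
```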

Predictions

Putting it all together:

h = w_1*X_1 + w_2*X_2 + b
p = \frac{1}{1+\exp(-h)}
\hat{y} = round(p)

Assume we have the following values

w1 w2 b
1 -1 -0.5
Fill out this table

X1  X2  y  h     p     \hat{y}
0   0   0  -0.5  0.38  0
0   1   1
1   0   1
1   1   1

The completed table:

X1  X2  y  h     p     \hat{y}
0   0   0  -0.5  0.38  0
0   1   1  -1.5  0.18  0
1   0   1  0.5   0.62  1
1   1   1  -0.5  0.38  0
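
To check the table without doing the arithmetic by hand, the same three formulas can be applied to every observation in a loop. The sketch below assumes the weights from the slide; the names `forward` and `rows` are made up for illustration.

```python
import math

W1, W2, B = 1.0, -1.0, -0.5
observations = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (X1, X2, y)

def forward(x1, x2):
    """Weighted sum, sigmoid, and rounded prediction for one observation."""
    h = W1 * x1 + W2 * x2 + B
    p = 1.0 / (1.0 + math.exp(-h))
    return h, p, round(p)

rows = []
for x1, x2, y in observations:
    h, p, y_hat = forward(x1, x2)
    rows.append((x1, x2, y, h, round(p, 2), y_hat))
    print(x1, x2, y, h, f"{p:.2f}", y_hat)

# Count how many predictions match the observed y
n_correct = sum(1 for row in rows if row[2] == row[5])
print("correct:", n_correct, "of", len(rows))
```

With these weights only two of the four predictions match y, which is the motivation for improving the weights.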

Room for Improvement

Our neural net isn't so great... how do we make it better?

What do I even mean by better?

Room for Improvement

Let's define how we want to measure the network's performance.

There are many ways, but let's use squared error:

(y - p)^2

Now we need to find values for w_1, w_2, b that make this error, (y - p)^2, as small as possible.
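
As a rough illustration, the squared error can be computed for each observation under the starting weights. Adding the four errors into one total is an assumption made here (the slide gives only the per-observation error), and `squared_errors` is a hypothetical helper name.

```python
import math

DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (X1, X2, y)

def squared_errors(w1, w2, b):
    """Return the per-observation squared error (y - p)^2."""
    errors = []
    for x1, x2, y in DATA:
        h = w1 * x1 + w2 * x2 + b
        p = 1.0 / (1.0 + math.exp(-h))
        errors.append((y - p) ** 2)
    return errors

errors = squared_errors(1.0, -1.0, -0.5)
for (x1, x2, y), e in zip(DATA, errors):
    print(x1, x2, y, f"{e:.3f}")
print("total:", f"{sum(errors):.3f}")
```

Trying other values for the three arguments shows how the total responds to a change in each parameter.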

ALL OF ML IN ONE SLIDE

Our task is learning values for w_1, w_2, b such that the difference between the predicted and actual values is as small as possible.

Learning from Data

So, how do we find the "best" values for w_1, w_2, b?

Learning from Data

Recall (without PTSD) that the derivative of a function tells you how the function is changing at any given location.

If the derivative is positive, the function is going up at that location.

If the derivative is negative, the function is going down at that location.
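
A small numerical sketch of this idea: estimate the slope of a function at a few locations and look at its sign. The function `f` below is a made-up example with its lowest point at x = 2, and `slope` is a hypothetical helper that uses a central finite difference rather than an exact derivative.

```python
def f(x):
    # Hypothetical example function; its minimum is at x = 2
    return (x - 2.0) ** 2

def slope(func, x, eps=1e-6):
    """Central finite-difference estimate of the derivative of func at x."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

for x in (0.0, 5.0):
    s = slope(f, x)
    direction = "going up" if s > 0 else "going down"
    print(f"x = {x}: slope is about {s:.2f}, so f is {direction}")
```

Left of the minimum the slope is negative and right of it the slope is positive, so stepping against the sign of the slope moves toward the minimum from either side.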

 

 

Learning from Data

Simple strategy:

- Start with initial values for w_1, w_2, b

- Take partial derivatives of the loss function with respect to w_1, w_2, b

- Subtract the derivative (also called the gradient) from each parameter

To the whiteboard!
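
One way the whiteboard derivation might go, assuming the squared-error loss and the sigmoid from the earlier slides. The symbol E for the per-observation error is introduced here only for convenience.

```latex
\begin{aligned}
E &= (y - p)^2, \qquad p = \frac{1}{1+\exp(-h)}, \qquad h = w_1 X_1 + w_2 X_2 + b \\
\frac{\partial E}{\partial p} &= -2\,(y - p) = 2\,(p - y) \\
\frac{\partial p}{\partial h} &= p\,(1 - p) \\
\frac{\partial h}{\partial w_1} &= X_1, \qquad
\frac{\partial h}{\partial w_2} = X_2, \qquad
\frac{\partial h}{\partial b} = 1 \\
\frac{\partial E}{\partial w_1} &= \frac{\partial E}{\partial p}\,\frac{\partial p}{\partial h}\,\frac{\partial h}{\partial w_1}
  = 2\,(p - y)\,p\,(1 - p)\,X_1
\end{aligned}
```

The derivatives for w_2 and b follow the same chain with X_2 and 1 in place of X_1. The constant factor 2 only rescales the step, so it is commonly dropped, which is equivalent to using \frac{1}{2}(y - p)^2 as the loss.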

THE BACKPROPAGATION ALGORITHM

Learning Rules for each Parameter

Gradient for w_1:

gw_1 = (p - y)*(p*(1-p)*X_1)

Gradient for w_2:

gw_2 = (p - y)*(p*(1-p)*X_2)

Gradient for b:

g_b = (p - y)*(p*(1-p))

Update for w_1:

w^{new}_1 = w^{old}_1 - \sum gw_1

Update for w_2:

w^{new}_2 = w^{old}_2 - \sum gw_2

Update for b:

b^{new} = b^{old} - \sum g_b
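
The learning rules can be sketched as a short training loop. Several things here are assumptions rather than part of the slides: each sum is taken over the four observations, the gradient is subtracted as-is (no separate learning rate), the loop runs for an arbitrary 1000 updates, and the helper names `update` and `total_error` are hypothetical.

```python
import math

DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (X1, X2, y)

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def total_error(w1, w2, b):
    """Squared error (y - p)^2 summed over all observations."""
    return sum((y - sigmoid(w1 * x1 + w2 * x2 + b)) ** 2 for x1, x2, y in DATA)

def update(w1, w2, b):
    """Apply the learning rules once: sum each gradient, then subtract it."""
    gw1 = gw2 = gb = 0.0
    for x1, x2, y in DATA:
        p = sigmoid(w1 * x1 + w2 * x2 + b)
        shared = (p - y) * p * (1 - p)   # factor common to all three gradients
        gw1 += shared * x1
        gw2 += shared * x2
        gb += shared
    return w1 - gw1, w2 - gw2, b - gb

w1, w2, b = 1.0, -1.0, -0.5              # starting values from the slides
error_before = total_error(w1, w2, b)
for _ in range(1000):                    # number of updates is an arbitrary choice
    w1, w2, b = update(w1, w2, b)
error_after = total_error(w1, w2, b)

predictions = [round(sigmoid(w1 * x1 + w2 * x2 + b)) for x1, x2, _ in DATA]
print("error before:", round(error_before, 3), "after:", round(error_after, 3))
print("predictions:", predictions)
```

Comparing `predictions` with the y column, and `error_after` with `error_before`, shows whether the updates improved the neuron.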