PERCEPTRONS

Let's say we'd like to have a single neuron learn a function

[Figure: a single neuron with inputs X_1 and X_2, weights w_1 and w_2, bias b, and output y]

Observations

X1	X2	y
0	0	0
0	1	1
1	0	1
1	1	1

How do we make a prediction for each observation?

Assume we have the following values

w1	w2	b
1	-1	-0.5

Predictions

For the first observation:

X_1 = 0, X_2 = 0, y = 0

First compute the weighted sum:

h = w_1*X_1 + w_2*X_2 + b

h = 1*0 + (-1)*0 + (-0.5)

h = -0.5

Transform to probability:

p = \frac{1}{1+\exp(-h)}

p = \frac{1}{1+\exp(-(-0.5))}

p = 0.38

Round to get prediction:

\hat{y} = round(p)

\hat{y} = 0
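The three steps for the first observation can be sketched in a few lines of Python. This is only an illustration: the function name `predict_one` is made up for this example, and rounding is written as an explicit 0.5 threshold, which is an assumption about how ties at exactly p = 0.5 should be handled.

```python
import math

def predict_one(x1, x2, w1, w2, b):
    """Return (h, p, y_hat) for a single observation."""
    h = w1 * x1 + w2 * x2 + b        # weighted sum
    p = 1.0 / (1.0 + math.exp(-h))   # sigmoid maps h into (0, 1)
    y_hat = 1 if p >= 0.5 else 0     # assumed tie rule: p of exactly 0.5 predicts 1
    return h, p, y_hat

# First observation: X_1 = 0, X_2 = 0, with w_1 = 1, w_2 = -1, b = -0.5
h, p, y_hat = predict_one(0, 0, w1=1, w2=-1, b=-0.5)
print(h, round(p, 2), y_hat)  # prints: -0.5 0.38 0
```

Python's built-in `round` sends 0.5 to 0 (round-half-to-even), which is why the threshold is spelled out instead.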

Predictions

Putting it all together:

h = w_1*X_1 + w_2*X_2 + b

p = \frac{1}{1+\exp(-h)}

\hat{y} = round(p)

Assume we have the following values

w1	w2	b
1	-1	-0.5

Fill out this table

X1	X2	y	h	p	\hat{y}
0	0	0	-0.5	0.38	0
0	1	1
1	0	1
1	1	1

Predictions

Completed table for w1 = 1, w2 = -1, b = -0.5:

X1	X2	y	h	p	\hat{y}
0	0	0	-0.5	0.38	0
0	1	1	-1.5	0.18	0
1	0	1	0.5	0.62	1
1	1	1	-0.5	0.38	0
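One way to check the table is to loop over all four observations. The sketch below assumes the same weights as the slides; the variable names and the `accuracy` summary are additions for illustration, not part of the original material.

```python
import math

observations = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (X1, X2, y)
w1, w2, b = 1, -1, -0.5

rows = []
for x1, x2, y in observations:
    h = w1 * x1 + w2 * x2 + b            # weighted sum
    p = 1.0 / (1.0 + math.exp(-h))       # probability
    y_hat = 1 if p >= 0.5 else 0         # prediction (threshold at 0.5)
    rows.append((x1, x2, y, h, round(p, 2), y_hat))

print("X1\tX2\ty\th\tp\ty_hat")
for row in rows:
    print("\t".join(str(v) for v in row))

# Fraction of observations where the prediction matches y
accuracy = sum(r[2] == r[5] for r in rows) / len(rows)
print("accuracy:", accuracy)
```

With these weights only two of the four predictions match y.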

Room for Improvement

Our neural net isn't so great... how do we make it better?

What do I even mean by better?

Room for Improvement

Let's define how we want to measure the network's performance.

There are many ways, but let's use squared-error:

(y - p)^2

Now we need to find values for w_1, w_2, b that make this error as small as possible.
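As a rough illustration, the squared error can be computed for the example weights. Summing the per-observation errors into one number is an assumption made here (the slide only gives the error for a single observation), and `total_squared_error` is a hypothetical helper name.

```python
import math

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def total_squared_error(w1, w2, b, data):
    """Sum of (y - p)^2 over all observations (summing is an assumption)."""
    return sum((y - sigmoid(w1 * x1 + w2 * x2 + b)) ** 2 for x1, x2, y in data)

data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (X1, X2, y)
loss = total_squared_error(1, -1, -0.5, data)
print(round(loss, 3))
```

For w_1 = 1, w_2 = -1, b = -0.5 this comes out to roughly 1.34; better parameter values should push it toward 0.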

ALL OF ML IN ONE SLIDE

Our task is learning values for w_1, w_2, b such that the difference between the predicted and actual values is as small as possible.

Learning from Data

So, how do we find the "best" values for w_1, w_2, b?

Learning from Data

Recall (without PTSD) that the derivative of a function tells you how it is changing at any given location.

If the derivative is positive, it means it's going up.

If the derivative is negative, it means it's going down.
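To make the sign of the derivative concrete, here is a small numerical sketch. It estimates the slope of the summed squared error with respect to w_2 using a central difference; the helper names and the step size `eps` are assumptions chosen for illustration.

```python
import math

DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (X1, X2, y)

def loss(w1, w2, b):
    """Summed squared error (y - p)^2 over the four observations."""
    total = 0.0
    for x1, x2, y in DATA:
        p = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b)))
        total += (y - p) ** 2
    return total

def slope_wrt_w2(w1, w2, b, eps=1e-6):
    """Central-difference estimate of d(loss)/d(w_2)."""
    return (loss(w1, w2 + eps, b) - loss(w1, w2 - eps, b)) / (2 * eps)

slope = slope_wrt_w2(1, -1, -0.5)
print(slope)                                   # negative at these weights
print(loss(1, -1, -0.5), loss(1, -0.9, -0.5))  # loss drops when w_2 is nudged up
```

The slope is negative here, so the error is going down as w_2 increases, which suggests w_2 = -1 is too small.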

Learning from Data

Simple strategy:

- Start with initial values for w_1, w_2, b

- Take partial derivatives of the loss function with respect to w_1, w_2, b

- Subtract the derivative (also called the gradient) from each of w_1, w_2, b

To the whiteboard!
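For readers without the whiteboard, the chain-rule step for w_1 can be written out as below. The symbol E for the squared error is introduced here only for convenience; everything else follows the definitions of h and p used earlier.

```latex
\begin{aligned}
E &= (y - p)^2, \qquad p = \frac{1}{1+\exp(-h)}, \qquad h = w_1 X_1 + w_2 X_2 + b \\
\frac{\partial E}{\partial w_1}
  &= \frac{\partial E}{\partial p}\cdot\frac{\partial p}{\partial h}\cdot\frac{\partial h}{\partial w_1} \\
  &= -2\,(y - p)\cdot p\,(1-p)\cdot X_1 \\
  &= 2\,(p - y)\,p\,(1-p)\,X_1
\end{aligned}
```

The same steps give the derivative for w_2 (with X_2 in place of X_1) and for b (with 1 in place of X_1). The learning rules that follow leave out the constant factor 2, which only rescales the size of each step.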

THE BACKPROPAGATION ALGORITHM

Learning Rules for each Parameter

Gradient for w_1

gw_1 = (p - y)*(p*(1-p)*X_1)

Gradient for w_2

gw_2 = (p - y)*(p*(1-p)*X_2)

Gradient for b

g_b = (p - y)*(p*(1-p))

Update for w_1

w^{new}_1 = w^{old}_1 - \sum gw_1

Update for w_2

w^{new}_2 = w^{old}_2 - \sum gw_2

Update for b

b^{new} = b^{old} - \sum g_b
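The learning rules above can be turned into a short training loop. This is a minimal sketch, not a definitive implementation: the function name `train`, the number of passes `epochs=1000`, and starting from the example weights are all assumptions. Gradients are summed over the four observations and subtracted directly, exactly as the update rules are written, with no separate learning rate.

```python
import math

data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]  # (X1, X2, y)

def sigmoid(h):
    return 1.0 / (1.0 + math.exp(-h))

def train(w1, w2, b, data, epochs=1000):
    """Repeat the update rules `epochs` times (epochs is an assumed setting)."""
    for _ in range(epochs):
        gw1 = gw2 = gb = 0.0
        for x1, x2, y in data:
            p = sigmoid(w1 * x1 + w2 * x2 + b)
            delta = (p - y) * p * (1 - p)  # shared factor in all three gradients
            gw1 += delta * x1              # sum of gw_1 over observations
            gw2 += delta * x2              # sum of gw_2 over observations
            gb += delta                    # sum of g_b over observations
        # Subtract the summed gradients from each parameter
        w1, w2, b = w1 - gw1, w2 - gw2, b - gb
    return w1, w2, b

w1, w2, b = train(1, -1, -0.5, data)
preds = [1 if sigmoid(w1 * x1 + w2 * x2 + b) >= 0.5 else 0 for x1, x2, _ in data]
print(w1, w2, b)
print(preds)
```

Starting from the weights used in the earlier slides, the predictions after training can be compared against the y column of the observations table.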

PERCEPTRONS BY HAND