PERCEPTRONS BY HAND
Andrew Beam, PhD
Department of Epidemiology
HSPH
twitter: @AndrewLBeam
PERCEPTRONS
Let's say we'd like to have a single neuron learn a function
y
X1 | X2 | y |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
Observations
PERCEPTRONS
How do we make a prediction for each observations?
y
X1 | X2 | y |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
Observations
Predictions
For the first observation:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
Predictions
For the first observation:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
First compute the weighted sum:
Predictions
For the first observation:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
First compute the weighted sum:
Transform to probability:
Predictions
For the first observation:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
First compute the weighted sum:
Transform to probability:
Round to get prediction:
Predictions
Putting it all together:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
X1 | X2 | y | h | p | |
---|---|---|---|---|---|
0 | 0 | 0 | -0.5 | 0.38 | 0 |
0 | 1 | 1 | |||
1 | 0 | 1 | |||
1 | 1 | 1 |
Fill out this table
Predictions
Putting it all together:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
X1 | X2 | y | h | p | |
---|---|---|---|---|---|
0 | 0 | 0 | -0.5 | 0.38 | 0 |
0 | 1 | 1 | -1.5 | 0.18 | 0 |
1 | 0 | 1 | 0.5 | 0.62 | 1 |
1 | 1 | 1 | -0.5 | 0.38 | 0 |
Fill out this table
Room for Improvement
Our neural net isn't so great... how do we make it better?
What do I even mean by better?
Room for Improvement
Let's define how we want to measure the network's performance.
There are many ways, but let's use squared-error:
Room for Improvement
Let's define how we want to measure the network's performance.
There are many ways, but let's use squared-error:
Now we need to find values for that make this error as small as possible
ALL OF ML IN ONE SLIDE
Our task is learning values for such the the difference between the predicted and actual values is as small as possible.
Learning from Data
So, how we find the "best" values for
Learning from Data
Recall (without PTSD) that the derivative of a function tells you how it is changing at any given location.
If the derivative is positive, it means it's going up.
If the derivative is negative, it means it's going down.
Learning from Data
Simple strategy:
- Start with initial values for
- Take partial derivatives of loss function
with respect to
- Subtract the derivative (also called the gradient) from each
Learning from Data
Simple strategy:
- Start with initial values for
- Take partial derivatives of loss function
with respect to
- Subtract the derivative (also called the gradient) from each
To the whiteboard!
THE BACKPROPAGATION ALGORITHM
Learning Rules for each Parameter
Gradient for
Gradient for
Gradient for
Update for
Update for
Update for
BMI 707 / EPI 290: Perceptrons by hand (2022)
By beamandrew
BMI 707 / EPI 290: Perceptrons by hand (2022)
- 777