Andrew Beam, PhD
Department of Epidemiology
HSPH
twitter: @AndrewLBeam
Let's say we'd like to have a single neuron learn a function
y
X1 | X2 | y |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
Observations
How do we make a prediction for each observations?
y
X1 | X2 | y |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
Observations
For the first observation:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
For the first observation:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
First compute the weighted sum:
For the first observation:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
First compute the weighted sum:
Transform to probability:
For the first observation:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
First compute the weighted sum:
Transform to probability:
Round to get prediction:
Putting it all together:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
X1 | X2 | y | h | p | |
---|---|---|---|---|---|
0 | 0 | 0 | -0.5 | 0.38 | 0 |
0 | 1 | 1 | |||
1 | 0 | 1 | |||
1 | 1 | 1 |
Fill out this table
Putting it all together:
Assume we have the following values
w1 | w2 | b |
---|---|---|
1 | -1 | -0.5 |
X1 | X2 | y | h | p | |
---|---|---|---|---|---|
0 | 0 | 0 | -0.5 | 0.38 | 0 |
0 | 1 | 1 | -1.5 | 0.18 | 0 |
1 | 0 | 1 | 0.5 | 0.62 | 1 |
1 | 1 | 1 | -0.5 | 0.38 | 0 |
Fill out this table
Our neural net isn't so great... how do we make it better?
What do I even mean by better?
Let's define how we want to measure the network's performance.
There are many ways, but let's use squared-error:
Let's define how we want to measure the network's performance.
There are many ways, but let's use squared-error:
Now we need to find values for that make this error as small as possible
Our task is learning values for such the the difference between the predicted and actual values is as small as possible.
So, how we find the "best" values for
Recall (without PTSD) that the derivative of a function tells you how it is changing at any given location.
If the derivative is positive, it means it's going up.
If the derivative is negative, it means it's going down.
Simple strategy:
- Start with initial values for
- Take partial derivatives of loss function
with respect to
- Subtract the derivative (also called the gradient) from each
Simple strategy:
- Start with initial values for
- Take partial derivatives of loss function
with respect to
- Subtract the derivative (also called the gradient) from each
To the whiteboard!
Gradient for
Gradient for
Gradient for
Update for
Update for
Update for