Machine Learning in Intelligent Transportation
Session 4: Deep FeedForward Model & Backpropagation
Ahmad Haj Mosa
PwC Austria & Alpen Adria Universität Klagenfurt
Klagenfurt 2020
Deep Feedforward Networks
Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron. Deep Learning (Adaptive Computation and Machine Learning series). The MIT Press, p. 163.
- Also called feedforward neural networks, or multilayer perceptrons (MLPs)
- The goal of feedforward networks is to approximate some function \( y = f^*(x)\)
- Information flows through the function being evaluated from \(x\), through the intermediate computations used to define \(f\), and finally to the output \(y\) (see the sketch below).
- There are no feedback connections in which outputs of the model are fed back into itself.
(Figure: a feedforward network with an input layer, hidden layers, and an output layer)
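To make the "no feedback" point concrete, here is a minimal sketch (the layer sizes, random weights, and sigmoid nonlinearity are assumptions for illustration only, not the lecture's own code) in which the output is obtained purely by composing layer functions applied to \(x\):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative sketch: a feedforward pass is plain function composition,
# with no feedback connections (weights are random placeholders)
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 4))    # input layer  -> hidden layer
W2 = rng.standard_normal((4, 1))    # hidden layer -> output layer

def feedforward(x):
    a1 = x                          # input layer activations
    a2 = sigmoid(a1 @ W1)           # hidden layer activations
    a3 = sigmoid(a2 @ W2)           # output layer activation
    return a3

x = np.array([0.0, 1.0, 1.0])
print(feedforward(x))               # information flows x -> hidden -> output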
Deep Feedforward Model
(Figure: a three-layer feedforward network: Layer 1, Layer 2, Layer 3)
The general neuron model is given by:
\(a_i^{(L)}=f(\sum_{j=1}^{r^{(L-1)}}(W_{ij}^{(L-1)}a_j^{(L-1)})) \)
Let \(z_i^{(L)}=\sum_{j=1}^{r^{(L-1)}}(W_{ij}^{(L-1)}a_j^{(L-1)}) \) be the activation model
\( f(z_i^{(L)})= \frac {1}{ 1+ e^{-z_i^{(L)}}} \) is the sigmoid activation function
\( a_i^{(L)}=f(z_i^{(L)})=f(\sum_{j=1}^{r^{(L-1)}}(W_{ij}^{(L-1)}a_j^{(L-1)})) \) is the output model
- \( y\) is the ground truth output (i.e. the desired value of \( a_{1}^{(2)}\) in the example network)
- \(h_{W}(x)\) is the final output of the network
- \(J(W)=\parallel h_{W}(x) - y \parallel^2 \) is the cost function
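To make this notation concrete, the following minimal single-neuron sketch (the specific numbers are assumptions chosen only for illustration) computes the activation \( z_i^{(L)} \), the sigmoid output \( a_i^{(L)} \), and the cost \( J(W) \):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative single-neuron example (values are arbitrary assumptions)
a_prev = np.array([0.5, -1.0, 2.0])   # previous-layer activations a_j^(L-1)
W = np.array([0.1, 0.4, -0.3])        # weights W_ij^(L-1) feeding neuron i

z = np.dot(W, a_prev)                 # activation model z_i^(L)
a = sigmoid(z)                        # output model a_i^(L) = f(z_i^(L))

y = 1.0                               # ground truth output
J = (a - y) ** 2                      # cost J(W) = ||h_W(x) - y||^2
print(z, a, J)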
Chain Model
The general neuron model is given by:
\(a_i^{(L)}=f(\sum_{j=1}^{r^{(L-1)}}(W_{ij}^{(L-1)}a_j^{(L-1)})) \)
- The gradient descent update is given by
\( W_{ij}^{(L-1)} := W_{ij}^{(L-1)} - \alpha \frac{\partial J}{\partial W_{ij}^{(L-1)}} \), where \( \alpha \) is the learning rate.
- The cost \(J\) is a function of the output \(a_i^{(L)}\), which is a function of the activation \(z_i^{(L)}\), which in turn is a function of the weight \(W_{ij}^{(L-1)}\).
- Applying the chain rule therefore gives
\( \frac{\partial J}{\partial W_{ij}^{(L-1)}} = \frac{\partial J}{\partial a_i^{(L)}} \cdot \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}} \cdot \frac{\partial z_i^{(L)}}{\partial W_{ij}^{(L-1)}} \)
where
\( \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}} = f'(z_i^{(L)}) = f(z_i^{(L)})\,(1-f(z_i^{(L)})) \) and \( \frac{\partial z_i^{(L)}}{\partial W_{ij}^{(L-1)}} = a_j^{(L-1)} \)
- If \(L\) is the output layer, the cost depends on \( a_i^{(L)} \) directly:
\( \frac{\partial J}{\partial a_i^{(L)}} = 2\,(a_i^{(L)} - y_i) \)
- If \(L\) is a hidden layer, \( a_j^{(L)} \) influences the cost through every neuron of the next layer:
\( \frac{\partial J}{\partial a_j^{(L)}} = \sum_{i=1}^{r^{(L+1)}} \frac{\partial J}{\partial a_i^{(L+1)}}\, f'(z_i^{(L+1)})\, W_{ij}^{(L)} \)
- If we denote \( \frac{\partial J}{\partial a_{i}^{(L)}} \) as \( \delta_i^{(L)} \), the chain rule can be written compactly as
\( \frac{\partial J}{\partial W_{ij}^{(L-1)}} = \delta_i^{(L)}\, f'(z_i^{(L)})\, a_j^{(L-1)} \)
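As a sanity check on this compact formula (a minimal sketch with arbitrarily chosen values; not part of the original slides), the gradient \( \delta_i^{(L)} f'(z_i^{(L)}) a_j^{(L-1)} \) of a single sigmoid output neuron can be compared against a finite-difference estimate of \( \partial J / \partial W_{ij}^{(L-1)} \):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# arbitrary example values (assumptions for illustration only)
a_prev = np.array([0.5, -1.0, 2.0])   # a_j^(L-1)
W = np.array([0.1, 0.4, -0.3])        # W_ij^(L-1)
y = 1.0                               # ground truth

def cost(W):
    a = sigmoid(np.dot(W, a_prev))
    return (a - y) ** 2               # J(W) = ||h_W(x) - y||^2

# backpropagation gradient: delta * f'(z) * a_prev
z = np.dot(W, a_prev)
a = sigmoid(z)
delta = 2.0 * (a - y)                 # dJ/da for the output neuron
grad_bp = delta * a * (1.0 - a) * a_prev

# finite-difference estimate of the same gradient
eps = 1e-6
grad_fd = np.zeros_like(W)
for j in range(len(W)):
    W_plus, W_minus = W.copy(), W.copy()
    W_plus[j] += eps
    W_minus[j] -= eps
    grad_fd[j] = (cost(W_plus) - cost(W_minus)) / (2 * eps)

print(grad_bp)
print(grad_fd)   # the two gradients should agree to several decimal places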
Backpropagation formulas
- If \(L\) is an output layer:
\( \delta_i^{(L)} = 2\,(a_i^{(L)} - y_i) \)
- If \(L\) is a hidden layer:
\( \delta_j^{(L)} = \sum_{i=1}^{r^{(L+1)}} \delta_i^{(L+1)}\, f'(z_i^{(L+1)})\, W_{ij}^{(L)} \)
- In both cases the weight gradient is \( \frac{\partial J}{\partial W_{ij}^{(L-1)}} = \delta_i^{(L)}\, f'(z_i^{(L)})\, a_j^{(L-1)} \)
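These two formulas map directly onto vectorized NumPy expressions. The sketch below (layer sizes, variable names, and random values are assumptions made only for illustration) computes the deltas and the corresponding weight gradients for one hidden layer and one sigmoid output layer:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative shapes (assumptions): 3 inputs, 4 hidden neurons, 1 output
rng = np.random.default_rng(0)
x = rng.standard_normal(3)            # input activations
y = np.array([1.0])                   # ground truth
W1 = rng.standard_normal((4, 3))      # weights into the hidden layer
W2 = rng.standard_normal((1, 4))      # weights into the output layer

# forward pass
z1 = W1 @ x
a1 = sigmoid(z1)                      # hidden layer activations
z2 = W2 @ a1
a2 = sigmoid(z2)                      # output layer activations

# deltas (delta = dJ/da of each layer, with J = ||a2 - y||^2)
delta_out = 2.0 * (a2 - y)                        # output layer formula
delta_hidden = (delta_out * a2 * (1 - a2)) @ W2   # hidden layer formula

# weight gradients: dJ/dW = delta * f'(z) * previous-layer activation
grad_W2 = np.outer(delta_out * a2 * (1 - a2), a1)
grad_W1 = np.outer(delta_hidden * a1 * (1 - a1), x)
print(grad_W1.shape, grad_W2.shape)   # (4, 3) and (1, 4), matching W1 and W2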
The Backpropagation Algorithm
- Repeat:
- Perform a feedforward pass, computing the activations \( a_i^{(L)} \) for all layers
- For the output layer, set \( \delta_i^{(L)} = 2\,(a_i^{(L)} - y_i) \)
- For each hidden layer, set \( \delta_j^{(L)} = \sum_{i} \delta_i^{(L+1)}\, f'(z_i^{(L+1)})\, W_{ij}^{(L)} \)
- Update the weights: \( W_{ij}^{(L-1)} := W_{ij}^{(L-1)} - \alpha\, \delta_i^{(L)}\, f'(z_i^{(L)})\, a_j^{(L-1)} \)
- If the target is achieved (minimum cost or maximum number of iterations), stop training
The Backpropagation Code
import numpy as np

# define the sigmoid function (and its derivative, given the sigmoid output)
def sigmoid(x, derivative=False):
    if derivative:
        return x * (1 - x)
    else:
        return 1 / (1 + np.exp(-x))

# choose a random seed for reproducible results
np.random.seed(1)

# learning rate
alpha = .1

# number of nodes in the hidden layer
num_hidden = 3

# inputs
X = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
])

# outputs
# x.T is the transpose of x, making this a column vector
y = np.array([[0, 1, 0, 1, 1, 0]]).T

# initialize weights randomly with mean 0 and range [-1, 1]
# the +1 in the 1st dimension of the weight matrices is for the bias weight
hidden_weights = 2 * np.random.random((X.shape[1] + 1, num_hidden)) - 1
output_weights = 2 * np.random.random((num_hidden + 1, y.shape[1])) - 1

# number of iterations of gradient descent
num_iterations = 10000

# for each iteration of gradient descent
for i in range(num_iterations):

    # forward phase
    # np.hstack((np.ones(...), X)) adds a fixed input of 1 for the bias weight
    input_layer_outputs = np.hstack((np.ones((X.shape[0], 1)), X))
    hidden_layer_outputs = np.hstack((np.ones((X.shape[0], 1)), sigmoid(np.dot(input_layer_outputs, hidden_weights))))
    output_layer_outputs = np.dot(hidden_layer_outputs, output_weights)

    # backward phase
    # output layer error term
    output_error = output_layer_outputs - y
    # hidden layer error term
    # [:, 1:] removes the bias term from the backpropagation
    hidden_error = hidden_layer_outputs[:, 1:] * (1 - hidden_layer_outputs[:, 1:]) * np.dot(output_error, output_weights.T[:, 1:])

    # partial derivatives
    hidden_pd = input_layer_outputs[:, :, np.newaxis] * hidden_error[:, np.newaxis, :]
    output_pd = hidden_layer_outputs[:, :, np.newaxis] * output_error[:, np.newaxis, :]

    # average for total gradients
    total_hidden_gradient = np.average(hidden_pd, axis=0)
    total_output_gradient = np.average(output_pd, axis=0)

    # update weights
    hidden_weights -= alpha * total_hidden_gradient
    output_weights -= alpha * total_output_gradient

# print the final outputs of the neural network on the inputs X
print("Output After Training: \n{}".format(output_layer_outputs))
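As an optional follow-up (not part of the original listing), the real-valued outputs of the trained network can be thresholded at 0.5 and compared against the targets; these lines are meant to be appended to the code above:

# optional check: threshold the network outputs and compare them with the targets y
predictions = (output_layer_outputs > 0.5).astype(int)
print("Predictions:\n{}".format(predictions))
print("Targets:\n{}".format(y))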