PwC Austria & Alpen Adria Universität Klagenfurt
Klagenfurt 2020
Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron: Deep Learning (Adaptive Computation and Machine Learning series). The MIT Press, p. 163.
Also called feedforward neural networks, or multilayer perceptrons (MLPs).
The goal of a feedforward network is to approximate some function \( y = f^*(x) \).
Information flows through the function being evaluated from \(x\), through the intermediate computations used to define \(f\), and finally to the output \(y\).
There are no feedback connections in which outputs of the model are fed back into itself.
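For example, three layers connected in a chain compute \( f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x))) \), where \( f^{(1)} \) is the first layer, \( f^{(2)} \) the second, and \( f^{(3)} \) the output layer.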
[Figure: layered feedforward network with an input layer, hidden layers, and an output layer (Layer 1, Layer 2, Layer 3)]
The general neuron model is given by:
\(a_i^{(L)}=f(\sum_{j=1}^{r^{(L-1)}}(W_{ij}^{(L-1)}a_j^{(L-1)})) \)
Let \(z_i^{(L)}=\sum_{j=1}^{r^{(L-1)}} W_{ij}^{(L-1)} a_j^{(L-1)} \) be the activation, i.e. the weighted input to neuron \(i\) in layer \(L\).
\( f(z_i^{(L)})= \frac {1}{ 1+ e^{-z_i^{(L)}}} \) is the sigmoid activation function.
\( a_i^{(L)}=f(z_i^{(L)})=f(\sum_{j=1}^{r^{(L-1)}} W_{ij}^{(L-1)} a_j^{(L-1)}) \) is the output of neuron \(i\) in layer \(L\).
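A minimal sketch of this neuron model in NumPy, assuming hypothetical weights and previous-layer activations:
import numpy as np
# hypothetical previous-layer activations a_j^(L-1) and weights W_ij^(L-1) for one neuron i
a_prev = np.array([0.5, 0.1, 0.9])
W_i = np.array([0.2, -0.4, 0.7])
z_i = np.dot(W_i, a_prev)          # z_i^(L) = sum_j W_ij^(L-1) * a_j^(L-1)
a_i = 1 / (1 + np.exp(-z_i))       # a_i^(L) = sigmoid(z_i^(L))
print(z_i, a_i)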
To train the network, a cost function \(J\) is minimized by gradient descent on the weights.
Since \(J\) is a function of \(a_i^{(L)}\), \(a_i^{(L)}\) is a function of \(z_i^{(L)}\), and \(z_i^{(L)}\) is a function of \(W_{ij}^{(L-1)}\), the chain rule gives
\( \frac{\partial J}{\partial W_{ij}^{(L-1)}} = \frac{\partial J}{\partial a_i^{(L)}} \cdot \frac{\partial a_i^{(L)}}{\partial z_i^{(L)}} \cdot \frac{\partial z_i^{(L)}}{\partial W_{ij}^{(L-1)}} = \frac{\partial J}{\partial a_i^{(L)}} \cdot f'(z_i^{(L)}) \cdot a_j^{(L-1)} \)
If we denote \( \frac{\partial J}{\partial a_{i}^{(L)}}\) as \( \delta_i^{(L)}\), two cases remain:
If \(L\) is the output layer, \( \delta_i^{(L)} \) follows directly from the cost function, e.g. \( \delta_i^{(L)} = a_i^{(L)} - y_i \) for the squared error \( J = \frac{1}{2}\sum_i (a_i^{(L)} - y_i)^2 \).
If \(L\) is a hidden layer, the error is propagated back from layer \(L+1\): \( \delta_i^{(L)} = \sum_k W_{ki}^{(L)} \, f'(z_k^{(L+1)}) \, \delta_k^{(L+1)} \).
For the sigmoid, \( f'(z_i^{(L)}) = a_i^{(L)} (1 - a_i^{(L)}) \).
Each weight is then updated with learning rate \( \alpha \): \( W_{ij}^{(L-1)} \leftarrow W_{ij}^{(L-1)} - \alpha \, \frac{\partial J}{\partial W_{ij}^{(L-1)}} \).
The NumPy implementation below trains a small network with one hidden layer using these update rules.
import numpy as np
# define the sigmoid function
def sigmoid(x, derivative=False):
    # note: the derivative form assumes x is already a sigmoid output, i.e. x = sigmoid(z)
    if derivative:
        return x * (1 - x)
    else:
        return 1 / (1 + np.exp(-x))
# choose a random seed for reproducible results
np.random.seed(1)
# learning rate
alpha = .1
# number of nodes in the hidden layer
num_hidden = 3
# inputs
X = np.array([
[0, 0, 1],
[0, 1, 1],
[1, 0, 0],
[1, 1, 0],
[1, 0, 1],
[1, 1, 1],
])
# outputs
# the .T transposes the row vector, making y a column vector
y = np.array([[0, 1, 0, 1, 1, 0]]).T
# initialize weights randomly with mean 0 and range [-1, 1]
# the +1 in the 1st dimension of the weight matrices is for the bias weight
hidden_weights = 2*np.random.random((X.shape[1] + 1, num_hidden)) - 1
output_weights = 2*np.random.random((num_hidden + 1, y.shape[1])) - 1
# number of iterations of gradient descent
num_iterations = 10000
# for each iteration of gradient descent
for i in range(num_iterations):
    # forward phase
    # np.hstack((np.ones(...), X)) prepends a fixed input of 1 for the bias weight
    input_layer_outputs = np.hstack((np.ones((X.shape[0], 1)), X))
    hidden_layer_outputs = np.hstack((np.ones((X.shape[0], 1)), sigmoid(np.dot(input_layer_outputs, hidden_weights))))
    output_layer_outputs = np.dot(hidden_layer_outputs, output_weights)

    # backward phase
    # output layer error term (linear output with squared-error cost)
    output_error = output_layer_outputs - y
    # hidden layer error term: a * (1 - a) is the sigmoid derivative
    # [:, 1:] removes the bias term from the backpropagation
    hidden_error = hidden_layer_outputs[:, 1:] * (1 - hidden_layer_outputs[:, 1:]) * np.dot(output_error, output_weights.T[:, 1:])

    # partial derivatives for each training example
    hidden_pd = input_layer_outputs[:, :, np.newaxis] * hidden_error[:, np.newaxis, :]
    output_pd = hidden_layer_outputs[:, :, np.newaxis] * output_error[:, np.newaxis, :]

    # average over the examples for the total gradients
    total_hidden_gradient = np.average(hidden_pd, axis=0)
    total_output_gradient = np.average(output_pd, axis=0)

    # gradient descent update of the weights
    hidden_weights += -alpha * total_hidden_gradient
    output_weights += -alpha * total_output_gradient
# print the final outputs of the neural network on the inputs X
print("Output After Training: \n{}".format(output_layer_outputs))