CS 106A
Stanford University
Chris Gregg
Today's topics:
Introduction to Artificial Intelligence
Introduction to Artificial Neural Networks
Examples of some basic neural networks
Using Python for Artificial Intelligence
Example: PyTorch
1950: Alan Turing: Turing Test
1951: First AI program
1965: ELIZA (first chatbot)
1974: First autonomous vehicle
1997: Deep Blue beats Garry Kasparov at chess
2004: First Autonomous Vehicle challenge
2011: IBM Watson beats Jeopardy winners
2016: DeepMind's AlphaGo beats Go champion Lee Sedol
2017: AlphaGo Zero beats AlphaGo
NNs learn the relationship between cause and effect, or organize large volumes of data into orderly, informative patterns.
Slides modified from PPT by Mohammed Shbier
The processing element receives many signals.
Signals may be modified by a weight at the receiving synapse.
The processing element sums the weighted inputs.
Under appropriate circumstances (sufficient input), the neuron transmits a single output.
The output from a particular neuron may go to many other neurons.
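A minimal sketch in Python of the processing element just described (the function name and threshold value are my own, not from the slides):

def neuron_output(inputs, weights, threshold=1.0):
    # Each incoming signal is modified by the weight at its "synapse".
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # With sufficient input, the neuron transmits a single output.
    return 1 if weighted_sum >= threshold else 0

print(neuron_output([1, 0, 1], [0.6, 0.3, 0.5]))  # 0.6 + 0.5 = 1.1 >= 1.0, so it fires: 1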
A neural net consists of a large number of simple processing elements called neurons, units, cells or nodes.
Each neuron is connected to other neurons by means of directed communication links, each with associated weight.
The weights represent information used by the net to solve a problem.
Each neuron has an internal state, called its activation or activity level, which is a function of the inputs it has received. Typically, a neuron sends its activation as a signal to several other neurons.
It is important to note that a neuron can send only one signal at a time, although that signal is broadcast to several other neurons.
Neural networks are configured for a specific application, such as pattern recognition or data classification, through a learning process.
In a biological system, learning involves adjustments to the synaptic connections between neurons.
The same is true for artificial neural networks (ANNs)!
Let's model a slightly more complicated neural network:
If we touch something cold we perceive heat
If we keep touching something cold we will perceive cold
If we touch something hot we will perceive heat
If cold is applied for one time step then heat will be perceived
If a cold stimulus is applied for two time steps then cold will be perceived
If heat is applied at a time step, then we should perceive heat
At t(0), we apply a stimulus to X1 and X2
At t(1) we can update Z1, Z2 and Y1
At t(2) we can perceive a stimulus at Y2
At t(2+n) the network is fully functional
Y2(t) = X2(t – 2) AND X2(t – 1)
X2(t – 2) | X2(t – 1) | Y2(t) |
---|---|---|
1 | 1 | 1 |
1 | 0 | 0 |
0 | 1 | 0 |
0 | 0 | 0 |
Y1(t) = [ X1(t – 1) ] OR [ X2(t – 3) AND NOT X2(t – 2) ]
X2(t – 3) | X2(t – 2) | X2(t – 3) AND NOT X2(t – 2) | X1(t – 1) | Y1(t) |
---|---|---|---|---|
1 | 1 | 0 | 1 | 1 |
1 | 0 | 1 | 1 | 1 |
0 | 1 | 0 | 1 | 1 |
0 | 0 | 0 | 1 | 1 |
1 | 1 | 0 | 0 | 0 |
1 | 0 | 1 | 0 | 1 |
0 | 1 | 0 | 0 | 0 |
0 | 0 | 0 | 0 | 0 |
The network shows
Y1(t) = X1(t – 1) OR Z1(t – 1)
Z1(t – 1) = Z2( t – 2) AND NOT X2(t – 2)
Z2(t – 2) = X2(t – 3)
Substituting, we get
Y1(t) = [ X1(t – 1) ] OR [ X2(t – 3) AND NOT X2(t – 2) ]
which is the same as our original requirements
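As a quick check, here is a small Python sketch (my own, not from the slides) that evaluates the derived formulas directly:

def y2(x2_tm2, x2_tm1):
    # Y2(t) = X2(t - 2) AND X2(t - 1): cold for two time steps is perceived as cold.
    return int(x2_tm2 and x2_tm1)

def y1(x1_tm1, x2_tm3, x2_tm2):
    # Y1(t) = X1(t - 1) OR [ X2(t - 3) AND NOT X2(t - 2) ]
    return int(x1_tm1 or (x2_tm3 and not x2_tm2))

print(y2(1, 1))     # cold applied for two steps -> 1 (perceive cold)
print(y1(0, 1, 0))  # brief cold, then removed -> 1 (perceive heat)
print(y1(1, 0, 0))  # heat applied -> 1 (perceive heat)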
This is great...but how do you build a network that learns?
We have to use input to predict output
We can do this using a mathematical algorithm called backpropagation, which adjusts the weights based on the error between the network's predicted outputs and the desired outputs.
Backpropagation uses a training set
We are going to use the following training set:
Example borrowed from: How to build a simple neural network in 9 lines of Python code
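Here is the training set (reconstructed from the training data in the code later in these slides):

Input 1 | Input 2 | Input 3 | Output |
---|---|---|---|
0 | 0 | 1 | 0 |
1 | 1 | 1 | 1 |
1 | 0 | 1 | 1 |
0 | 1 | 1 | 0 |
1 | 0 | 0 | ? |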
Can you figure out what the question mark should be?
The output is always equal to the value of the leftmost input column, so the '?' should be 1.
We start by giving each input a weight, which will be a positive or negative number.
Large numbers (positive or negative) will have a large effect on the neuron's output.
We start by setting each weight to a random number, and then we train:
Take the inputs from a training set example, adjust them by the weights, and pass them through a special formula to calculate the neuron’s output.
Calculate the error, which is the difference between the neuron’s output and the desired output in the training set example.
Depending on the direction of the error, adjust the weights slightly.
Repeat this process 10,000 times.
Eventually the weights of the neuron will reach an optimum for the training set. If we allow the neuron to think about a new situation, that follows the same pattern, it should make a good prediction.
The gradient of the Sigmoid curve can be found by taking the derivative (remember calculus?)
So by substituting the second equation into the first (from two slides ago), the final formula for adjusting each weight is: adjustment = error × input × output × (1 − output).
There are other, more advanced formulas, but this one is pretty simple.
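Here is a small sketch of those formulas in Python (the function names are mine, not from the original example):

from numpy import exp

def sigmoid(x):
    # The "special formula": squashes the weighted sum into the range (0, 1).
    return 1 / (1 + exp(-x))

def sigmoid_gradient(output):
    # The derivative of the sigmoid, written in terms of its output.
    return output * (1 - output)

def weight_adjustment(input_value, error, output):
    # Adjust by: error x input x gradient of the Sigmoid curve.
    return error * input_value * sigmoid_gradient(output)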
Finally, Python!
We will use the numpy module, which is a mathematics library for Python.
We want to use four functions: array() (array creation), exp() (the natural exponential), dot() (matrix multiplication), and random() (random number generation).
array() creates list-like arrays that are faster than regular lists. E.g., for the training set we saw earlier:
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T
The ‘.T’ attribute transposes the matrix from horizontal to vertical, so the computer is storing the numbers like this:
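For example, the four outputs become a 4 x 1 column:

>>> array([[0, 1, 1, 0]]).T
array([[0],
       [1],
       [1],
       [0]])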
In 10 lines of Python code:
from numpy import exp, array, random, dot
training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_set_outputs = array([[0, 1, 1, 0]]).T
random.seed(1)
synaptic_weights = 2 * random.random((3, 1)) - 1
for iteration in range(10000):
    output = 1 / (1 + exp(-(dot(training_set_inputs, synaptic_weights))))
    synaptic_weights += dot(training_set_inputs.T, (training_set_outputs - output)
                            * output * (1 - output))
print(1 / (1 + exp(-(dot(array([1, 0, 0]), synaptic_weights)))))
With comments, and in a Class:
from numpy import exp, array, random, dot

class NeuralNetwork():
    def __init__(self):
        # Seed the random number generator, so it generates the same numbers
        # every time the program runs.
        random.seed(1)

        # We model a single neuron, with 3 input connections and 1 output connection.
        # We assign random weights to a 3 x 1 matrix, with values in the range -1 to 1
        # and mean 0.
        self.synaptic_weights = 2 * random.random((3, 1)) - 1

    # The Sigmoid function, which describes an S shaped curve.
    # We pass the weighted sum of the inputs through this function to
    # normalise them between 0 and 1.
    def __sigmoid(self, x):
        return 1 / (1 + exp(-x))

    # The derivative of the Sigmoid function.
    # This is the gradient of the Sigmoid curve.
    # It indicates how confident we are about the existing weight.
    def __sigmoid_derivative(self, x):
        return x * (1 - x)

    # We train the neural network through a process of trial and error,
    # adjusting the synaptic weights each time.
    def train(self, training_set_inputs, training_set_outputs, number_of_training_iterations):
        for iteration in range(number_of_training_iterations):
            # Pass the training set through our neural network (a single neuron).
            output = self.think(training_set_inputs)

            # Calculate the error (the difference between the desired output
            # and the predicted output).
            error = training_set_outputs - output

            # Multiply the error by the input and again by the gradient of the Sigmoid curve.
            # This means less confident weights are adjusted more.
            # This means inputs, which are zero, do not cause changes to the weights.
            adjustment = dot(training_set_inputs.T, error * self.__sigmoid_derivative(output))

            # Adjust the weights.
            self.synaptic_weights += adjustment

    # The neural network thinks.
    def think(self, inputs):
        # Pass inputs through our neural network (our single neuron).
        return self.__sigmoid(dot(inputs, self.synaptic_weights))

if __name__ == "__main__":
    # Initialise a single-neuron neural network.
    neural_network = NeuralNetwork()

    print("Random starting synaptic weights: ")
    print(neural_network.synaptic_weights)

    # The training set. We have 4 examples, each consisting of 3 input values
    # and 1 output value.
    training_set_inputs = array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
    training_set_outputs = array([[0, 1, 1, 0]]).T

    # Train the neural network using a training set.
    # Do it 10,000 times and make small adjustments each time.
    neural_network.train(training_set_inputs, training_set_outputs, 10000)

    print("New synaptic weights after training: ")
    print(neural_network.synaptic_weights)

    # Test the neural network with a new situation.
    print("Considering new situation [1, 0, 0] -> ?: ")
    print(neural_network.think(array([1, 0, 0])))
Too small! Let's do this in PyCharm
When we run the code, we get something like this:
Random starting synaptic weights:
[[-0.16595599]
[ 0.44064899]
[-0.99977125]]
New synaptic weights after training:
[[ 9.67299303]
[-0.2078435 ]
[-4.62963669]]
Considering new situation [1, 0, 0] -> ?:
[ 0.99993704]
https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
We will train an image classifier on the CIFAR-10 dataset. The images in CIFAR-10 are of size 3x32x32, i.e., 3-channel color images of 32x32 pixels (pretty small and blurry!).
To train the classifier, we will do the following steps in order:
1. Load and normalize the CIFAR-10 training and test datasets using torchvision
2. Define a convolutional neural network
3. Define a loss function and an optimizer
4. Train the network on the training data
5. Test the network on the test data
First, we'll load the data:
import torch
import torchvision
import torchvision.transforms as transforms

# Convert images to tensors and normalize each color channel to the range [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
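As a quick sanity check (my addition, not part of the tutorial), one batch from the loader should contain 4 images of size 3x32x32:

images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([4, 3, 32, 32]) -- batch of 4, 3 channels, 32x32 pixels
print(labels.shape)  # torch.Size([4]) -- one class label per image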
We can show some of the images:
import matplotlib.pyplot as plt
import numpy as np

# function to show an image
def imshow(img):
    img = img / 2 + 0.5     # unnormalize: map [-1, 1] back to [0, 1]
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
You can see that they are pretty blurry. They are:
car dog truck cat
PyTorch lets you define a neural network with some defaults:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Two convolutional layers: 3 input channels -> 6 -> 16 feature maps,
        # each with a 5x5 kernel, with 2x2 max pooling after each.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # Three fully connected layers, ending in 10 outputs (one per class).
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        # Flatten the 16 feature maps of size 5x5 into one vector per image.
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
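As a quick sketch (my addition, not from the tutorial), a random batch-shaped tensor flows through the network and produces 10 class scores per image:

x = torch.randn(4, 3, 32, 32)   # a fake batch of 4 CIFAR-sized images
print(net(x).shape)             # torch.Size([4, 10]): one score per class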
We also define a loss function and an optimizer (here, cross-entropy loss and stochastic gradient descent), which play the role of the error calculation and weight adjustment we saw before:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Now we just loop over the data and inputs, and we're training!
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
This was a simple, 2-epoch training loop (the range(2) in the outer for loop).
We can save the data:
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
dataiter = iter(testloader)
images, labels = next(dataiter)
# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
GroundTruth: cat ship ship plane
We can load back the data and then see how our model does:
net = Net()
net.load_state_dict(torch.load(PATH))
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
We can look at how the network behaves on the whole dataset:
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
Accuracy of the network on the 10000 test images: 53 %
That is much better than the 10% we would expect from random guessing, so the network has learned something. We can see which categories did well:
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1

for i in range(10):
    print('Accuracy of %5s : %2d %%' % (
        classes[i], 100 * class_correct[i] / class_total[i]))
GroundTruth: cat ship ship plane
Predicted: dog ship ship ship
Accuracy of the network on the 10000 test images: 55 %
Accuracy of plane : 67 %
Accuracy of car : 72 %
Accuracy of bird : 36 %
Accuracy of cat : 18 %
Accuracy of deer : 45 %
Accuracy of dog : 60 %
Accuracy of frog : 64 %
Accuracy of horse : 62 %
Accuracy of ship : 65 %
Accuracy of truck : 65 %
With 2 training loops:
GroundTruth: cat ship ship plane
Predicted: cat ship plane plane
Accuracy of the network on the 10000 test images: 61 %
Accuracy of plane : 66 %
Accuracy of car : 77 %
Accuracy of bird : 47 %
Accuracy of cat : 33 %
Accuracy of deer : 63 %
Accuracy of dog : 53 %
Accuracy of frog : 71 %
Accuracy of horse : 54 %
Accuracy of ship : 74 %
Accuracy of truck : 72 %