How Do Machines See?
Teachable Machine References
What is Vision?
The Sciences of Making Machines That See
The Phenomenology of Networks
Machine Vision by Stepwise Refinement
Neurons, Real and Artificial
Activation Functions
Automation
Activation Functions Redux
Logic Gates
Going Deep
Teachable Machine
How Do Machines See?

Vision, step by step:
1) light
2) detection: receptors receive light stimuli that are processed by the brain
3) multiple stimuli are processed
4) and assembled
5) into a representation of the properties
6) of external objects: the position, shape, brightness, and color of objects in space
"New method visualizes groups of neurons as they compute" Anne Trafton | MIT News Office October 9, 2019
[Diagram: a biological neuron, with its dendrites, axon, and nucleus labeled]
[Diagram: YOU as a collection of functions F01 through F06, each mapping a bit of the world to a response]

OK, then:
Every neuron is a function mapping an arbitrary set of binary inputs to a single binary output
7 of ten friends recommend going to the party instead of studying: seven votes of "Go!" against three of "Hmm..."

We "weight" our inputs based on how important they are ... or how dependable they are.
weightedSum = input1 × weight1 + input2 × weight2 + ... + inputn × weightn
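As a sketch in JavaScript, the weighted sum is just multiply-and-add over the inputs; the numbers below replay the party example, with every friend weighted equally:

```javascript
// Weighted sum: multiply each input by its weight and add the products.
function weightedSum(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum;
}

// The party example: ten friends, seven say "Go!" (1), three say "Hmm..." (0).
// With every friend weighted equally at 1, the weighted sum is just the count.
const advice = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0];
console.log(weightedSum(advice, Array(10).fill(1))); // 7
```

Unequal weights change the outcome: a very dependable friend's "Go!" can count for more than two doubters combined.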
So Far...
An artificial neuron
if (weightedSum < threshold) do nothing else FIRE!

[Plot: neuron output (0 or 1) as a step function of the weighted sum of inputs, jumping from 0 to 1 at the threshold]
An activation function is any function that maps the weighted sum of inputs to the neuron's output.
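The threshold rule above is the simplest activation function, a step function. A minimal JavaScript sketch:

```javascript
// Step (threshold) activation: below the threshold the neuron does
// nothing (outputs 0); at or above it, the neuron fires (outputs 1).
function stepActivation(weightedSum, threshold) {
  return weightedSum < threshold ? 0 : 1;
}

console.log(stepActivation(3, 5)); // 0: do nothing
console.log(stepActivation(7, 5)); // 1: FIRE!
```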
Here we are on Zoom. Inputs are others turning on their video. Weights are 1 and your threshold is 4. Do you turn your video on?
Think about it: if you get bad advice from one of your friends, how does that affect how you weight their advice next time?
What if you have a friend who ALWAYS seems to give good advice?
A three by three grid of "pixels"
Think of the grid in one dimension
What does this pattern look like in one dimension?
What does this pattern look like in one dimension?
Have three neurons that take the pixels as "inputs"
Add a neuron that takes the outputs of the first three neurons as its inputs
How many weights?
[Diagram: nine pixel inputs P1 through P9 feed three neurons Na, Nb, Nc; their outputs feed a fourth neuron Nd]

Na's weights: w1a w2a w3a w4a w5a w6a w7a w8a w9a
Nb's weights: w1b w2b w3b w4b w5b w6b w7b w8b w9b
Nc's weights: w1c w2c w3c w4c w5c w6c w7c w8c w9c
Nd's weights: wad wbd wcd

Three neurons with nine weights each, plus one neuron with three: 9 × 3 + 3 = 30 weights.
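A forward pass through this little network can be sketched in JavaScript. The weight values and thresholds below are made up for illustration; only the shape of the network (nine pixels, three hidden neurons, one output neuron) comes from the diagram:

```javascript
// Dot product of an input vector and a weight vector.
function dot(xs, ws) {
  return xs.reduce((sum, x, i) => sum + x * ws[i], 0);
}

// Step activation: fire (1) when the weighted sum reaches the threshold.
function fire(sum, threshold) {
  return sum < threshold ? 0 : 1;
}

const pixels = [1, 0, 1, 1, 1, 1, 1, 0, 0]; // P1..P9

// Illustrative weights: one 9-element vector per hidden neuron (Na, Nb, Nc)
// and a 3-element vector for the output neuron (Nd): 30 weights in all.
const Wa = [0.1, 0.2, 0.1, 0.0, 0.3, 0.1, 0.2, 0.1, 0.0];
const Wb = [0.3, 0.1, 0.0, 0.2, 0.1, 0.2, 0.0, 0.1, 0.3];
const Wc = [0.0, 0.2, 0.3, 0.1, 0.0, 0.1, 0.3, 0.2, 0.1];
const Wd = [0.5, 0.5, 0.5]; // wad, wbd, wcd

const hidden = [
  fire(dot(pixels, Wa), 0.5), // Na
  fire(dot(pixels, Wb), 0.5), // Nb
  fire(dot(pixels, Wc), 0.5), // Nc
];
const output = fire(dot(hidden, Wd), 1.0); // Nd
// With these weights, all three hidden neurons fire, and so does Nd.
console.log("hidden:", hidden, "output:", output);
```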
Na's weighted sum is the dot product of the pixel values and its weight vector:

p1 × w1a + p2 × w2a + p3 × w3a + p4 × w4a + p5 × w5a + p6 × w6a + p7 × w7a + p8 × w8a + p9 × w9a = P × WA
[Worked example: the pixel pattern 1 0 1 1 1 1 1 0 0 is multiplied through different weight vectors, yielding different weighted sums, such as 0.2 and 1.1, for the same input]
Could the network teach itself?
inputs
outputs
truth
feedback
We can react more or less strongly in response to error. This is called the "learning rate."
SIGMOID ACTIVATION FUNCTION
RECTIFIED LINEAR UNIT (ReLU) ACTIVATION FUNCTION
Sum the inputs
If negative return 0
Otherwise return sum
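Both activation functions are one-liners in JavaScript:

```javascript
// Sigmoid: squashes any weighted sum into the range (0, 1),
// giving a smooth, graded output instead of an all-or-nothing spike.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// ReLU: if the weighted sum is negative return 0, otherwise return the sum.
function relu(x) {
  return x < 0 ? 0 : x;
}

console.log(sigmoid(0)); // 0.5
console.log(relu(-2.5)); // 0
console.log(relu(3.2));  // 3.2
```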
An artificial neuron converts a weighted sum of multiple inputs into a single output
A perceptron is a network of artificial neurons that can detect visual patterns
A perceptron is tuned by adjusting weights so it yields expected output for each input pattern.
"Learning" happens by increasing weights for neurons that "get it right" and decreasing weights for neurons that "get it wrong."
A system can be set to react more or less strongly to each learning experience.
"Learning" happens via feedback. The error or loss is the difference between the output and the "ground truth."
[Diagram: a network with an input layer, a hidden layer, and an output layer]
The training loop:

start with random Ws
for i = 1 to 100:
    read input
    predict
    compute average loss
    adjust Ws
    if error small or time run out: stop
test
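This loop can be sketched as a minimal perceptron trainer in JavaScript. The both-inputs-on pattern, learning rate, and epoch count below are illustrative choices, not the slides' exact setup:

```javascript
// Minimal perceptron training loop: start with random weights, then
// repeatedly predict, compare with the ground truth, and nudge each
// weight in proportion to the error and the learning rate.
function trainPerceptron(samples, learningRate = 0.1, epochs = 100) {
  const n = samples[0].inputs.length;
  let weights = Array.from({ length: n }, () => Math.random() - 0.5);
  let bias = 0;

  for (let epoch = 0; epoch < epochs; epoch++) {
    let totalError = 0;
    for (const { inputs, truth } of samples) {
      const sum = inputs.reduce((s, x, i) => s + x * weights[i], bias);
      const prediction = sum < 0 ? 0 : 1;
      const error = truth - prediction; // feedback: truth minus output
      totalError += Math.abs(error);
      // adjust Ws: increase weights that should have fired, decrease the rest
      weights = weights.map((w, i) => w + learningRate * error * inputs[i]);
      bias += learningRate * error;
    }
    if (totalError === 0) break; // error small? stop early
  }
  return { weights, bias };
}

// Teach the perceptron to fire only when both inputs are on.
const samples = [
  { inputs: [0, 0], truth: 0 },
  { inputs: [0, 1], truth: 0 },
  { inputs: [1, 0], truth: 0 },
  { inputs: [1, 1], truth: 1 },
];
const model = trainPerceptron(samples);
```

Because this pattern is linearly separable, the classic perceptron learning rule is guaranteed to find weights that classify every sample correctly.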
classes: in a classification model, the "bins" into which we put items we have classified
labels: the "correct" answers that are used to train a model
training data: samples that we "show" the model so it can adjust its weights based on whether it gets it right or wrong
error/loss: how much we are getting wrong
weights: how much attention does a neuron pay to each of its inputs?
predictions: the outputs when a model is run on new data
<div>Teachable Machine Image Model</div>
<button type="button" onclick="init()">Start</button>
<div id="webcam-container"></div>
<div id="label-container"></div>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.3.1/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@teachablemachine/image@0.8/dist/teachablemachine-image.min.js"></script>
<script type="text/javascript">
    // More API functions here:
    // https://github.com/googlecreativelab/teachablemachine-community/tree/master/libraries/image

    // the link to your model provided by Teachable Machine export panel
    const URL = "https://teachablemachine.withgoogle.com/models/nUKbQ-rI4/";

    let model, webcam, labelContainer, maxPredictions;

    // Load the image model and set up the webcam
    async function init() {
        const modelURL = URL + "model.json";
        const metadataURL = URL + "metadata.json";

        // load the model and metadata
        // Refer to tmImage.loadFromFiles() in the API to support files from a file picker
        // or files from your local hard drive
        // Note: the pose library adds a "tmImage" object to your window (window.tmImage)
        model = await tmImage.load(modelURL, metadataURL);
        maxPredictions = model.getTotalClasses();

        // Convenience function to set up a webcam
        const flip = true; // whether to flip the webcam
        webcam = new tmImage.Webcam(200, 200, flip); // width, height, flip
        await webcam.setup(); // request access to the webcam
        await webcam.play();
        window.requestAnimationFrame(loop);

        // append elements to the DOM
        document.getElementById("webcam-container").appendChild(webcam.canvas);
        labelContainer = document.getElementById("label-container");
        for (let i = 0; i < maxPredictions; i++) { // and class labels
            labelContainer.appendChild(document.createElement("div"));
        }
    }

    async function loop() {
        webcam.update(); // update the webcam frame
        await predict();
        window.requestAnimationFrame(loop);
    }

    // run the webcam image through the image model
    async function predict() {
        // predict can take in an image, video or canvas html element
        const prediction = await model.predict(webcam.canvas);
        for (let i = 0; i < maxPredictions; i++) {
            const classPrediction =
                prediction[i].className + ": " + prediction[i].probability.toFixed(2);
            labelContainer.childNodes[i].innerHTML = classPrediction;
        }
    }
</script>