How Do Machines See?
Outline
What is Vision?
The Sciences of Making Machines That See
The Phenomenology of Networks
Machine Vision by Stepwise Refinement
Neurons, Real and Artificial
Activation Functions
Automation
Activation Functions Redux
Logic Gates
Going Deep
Teachable Machine
How Do Machines See?
Teasers
Outline
- Vision
- Black box
- Neurons
- Black box full of neurons (neural network)
- Next Time: Teachable Machine (next Th at 4?)
How Do Machines See?
[Black box diagram: an INPUT goes into a box marked "?", and an OUTPUT comes out.]
What is vision?
STOP+THINK:
Define the verb "to see"

To see: (1) light is (2) detected, that is, receptors receive light stimuli that are processed by the brain; (3) multiple stimuli are processed and (4) assembled into a representation of (5) properties such as the position, shape, brightness, and color of (6) external objects in space.
Studying Vision in Fruit Flies
An image is just a bunch of data
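A minimal sketch of what "a bunch of data" means in code: a tiny three-by-three black-and-white image stored as nothing but numbers. The pixel values here are made up, though the same grid pattern reappears in the worked example later in the deck.

// A hypothetical 3x3 black-and-white "image": 1 = lit pixel, 0 = dark pixel.
const image = [
  1, 0, 1,
  1, 1, 1,
  1, 0, 0
];

// To a machine, "seeing" starts with nothing more than reading these numbers.
console.log("Pixel values:", image.join(" "));
console.log("Lit pixels:", image.filter(p => p === 1).length);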
Questions?
How machines see is a big field of study
Visualizing the Fields of Seeing Machines
Machine Vision by Stepwise Refinement
- Existence Detection
  - e.g., is there a wall ahead?
- Classification
  - e.g., is this a dog or a cat?
- Recognition
  - e.g., is this the sheep named Alois?
- Sequence Detection
  - e.g., was that the ASL sign for "peace"?
Any questions?
But what's in the black box?
Neurons
Real and Artificial
What's Going on Here?
Neuron Schematic
Neurons at Work
"New method visualizes groups of neurons as they compute" Anne Trafton | MIT News Office October 9, 2019
Neuron as Shocked and Shocking!
Neuron Anatomy
[Diagram: the dendrites, nucleus, and axon of a neuron]
A Network of Trusted Friends
[Diagram: YOU at the center, linked to friends F01 through F06, each of whom sees a different bit of the world.]
Add up what your friends report and decide. OK, then:
Every neuron is a function mapping an arbitrary set of binary inputs to a single binary output
7 of your 10 friends recommend going to the party instead of studying!
All Contacts are Equal!
Some Contacts are More Equal than Others
[Diagram: some friends say "Go!", others say "Hmm..."]
We "weight" our inputs based on how important they are ...
or how dependable they are.
Some Contacts are More Equal than Others
Compute the Weighted Sum of the Inputs
weighted sum = input1 x weight1 + input2 x weight2 + ... + inputn x weightn
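A minimal sketch of this weighted sum in JavaScript, using the "advice from friends" framing. The particular inputs and weights are made up for illustration.

// Each input is a friend's binary recommendation (1 = "Go!", 0 = "Hmm...").
const inputs  = [1, 1, 0, 1, 1];
// Each weight says how much we trust that friend.
const weights = [0.75, 0.5, 0.25, 0.125, 0.5];

// weighted sum = input1 x weight1 + input2 x weight2 + ... + inputn x weightn
function weightedSum(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum;
}

console.log(weightedSum(inputs, weights)); // 0.75 + 0.5 + 0 + 0.125 + 0.5 = 1.875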
Any questions?
So Far...
An artificial neuron
- takes 1 or more binary inputs
- weights each one
- sums them up
But how does it decide whether or not to "fire"?
Activation Functions
Everyone's Got a Threshold!
if (weightedSum < threshold) do nothing else FIRE!
[Graph: STEP FUNCTION. The x-axis is the weighted sum of inputs (0 through 9); the output jumps from 0 to 1 at the threshold.]
Every neuron has an ACTIVATION FUNCTION
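A minimal sketch of a complete artificial neuron: weight the inputs, sum them, then apply a step-function activation. The threshold of 6 is an arbitrary choice for the "7 of 10 friends" example.

// STEP FUNCTION: output 0 below the threshold, 1 at or above it ("FIRE!").
function stepActivation(weightedSum, threshold) {
  return weightedSum < threshold ? 0 : 1;
}

// A whole artificial neuron: weighted sum followed by the activation function.
function neuron(inputs, weights, threshold) {
  const sum = inputs.reduce((acc, x, i) => acc + x * weights[i], 0);
  return stepActivation(sum, threshold);
}

// Example: 7 of 10 friends say "Go!", every weight is 1, and the threshold is 6.
const advice = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0];
const trust  = Array(10).fill(1);
console.log(neuron(advice, trust, 6)); // 1 -- the neuron fires: go to the party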
Activation Functions MORE GENERALLY
any function that maps the weighted sum of inputs to the neuron's output
STOP+THINK
Here we are on Zoom. The inputs are other people turning their video on, each weight is 1, and your threshold is 4. Do you turn your video on?
Activation Functions in Everyday Life
Any questions so far?
STOP+THINK
Why do we sometimes take advice from someone "with a grain of salt"?
The Secret: You can change the weights
Think about it: if you get bad advice from one of your friends, how does that affect how you weight their advice next time?
What if you have a friend who ALWAYS seems to give good advice?
Feedback Again!
Putting it all together
Putting it all together
A three by three grid of "pixels"
Putting it all together
Think of the grid in one dimension
STOP+THINK
What does this pattern look like in one dimension?
Putting it all together
Have three neurons that take the pixels as "inputs"
Putting it all together
Redraw to take advantage of the slide orientation
Putting it all together
Add a neuron that takes the outputs of the first three neurons as its inputs
How many weights?
STOP+THINK
[Network diagram: pixels P1 through P9 each feed neurons Na, Nb, and Nc (weights w1a ... w9a, w1b ... w9b, w1c ... w9c); the outputs of Na, Nb, and Nc feed Nd (weights wad, wbd, wcd). That is 9 x 3 + 3 = 30 weights.]
"THE MODEL"
Na
Nb
Nc
Nd
P1
P3
P4
P6
P7
P9
P2
P5
P8
w1a w2a w3a w4a w5a w6a w7a w8a w9a
w1b w2b w3b w4b w5b w6b w7b w8b w9b
w1c w2c w3c w4c w5c w6c w7c w8c w9c
wad wbd wcd
"THE MODEL"
Na
Nb
Nc
Nd
P1
P3
P4
P6
P7
P9
P2
P5
P8
weighted sum of inputs at Node A
p1 x w1a + p2 x w2a + p3 x w3a + p4 x w4a + p5 x w5a + p6 x w6a + p7 x w7a + p8 x w8a + p9 x w9a
Shorthand for this
P1 x WA
w1a w2a w3a w4a w5a w6a w7a w8a w9a
w1b w2b w3b w4b w5b w6b w7b w8b w9b
w1c w2c w3c w4c w5c w6c w7c w8c w9c
wad wbd wcd
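A minimal sketch of this shorthand in code: a dot product function, then a full forward pass through the little 9-pixel, three-neuron, one-output network. The pixel pattern matches the worked example on the next slide, but every weight value here is invented purely for illustration, and for simplicity no activation function is applied.

// Dot product: P · WA = p1*w1a + p2*w2a + ... + p9*w9a
function dot(p, w) {
  return p.reduce((sum, pi, i) => sum + pi * w[i], 0);
}

// The 3x3 pixel grid, flattened to nine values P1..P9.
const P = [1, 0, 1, 1, 1, 1, 1, 0, 0];

// Made-up weights into Na, Nb, Nc (nine each) and into Nd (three).
const WA = [0.1, 0.0, 0.1, 0.2, 0.1, 0.2, 0.1, 0.0, 0.0];
const WB = [0.0, 0.3, 0.0, 0.1, 0.2, 0.1, 0.0, 0.3, 0.0];
const WC = [0.2, 0.0, 0.2, 0.0, 0.1, 0.0, 0.2, 0.0, 0.2];
const WD = [0.5, 0.3, 0.4];

// Forward pass: Na, Nb, and Nc each compute their weighted sums,
// then Nd takes those three outputs as its own inputs.
const hidden = [dot(P, WA), dot(P, WB), dot(P, WC)];
const output = dot(hidden, WD);
console.log("Na, Nb, Nc:", hidden, "Nd:", output);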
"THE MODEL"
Na
Nb
Nc
Nd
P1
P3
P4
P6
P7
P9
P2
P5
P8
P1
P3
P4
P6
P7
P9
P2
P5
P8
P1
P3
P4
P6
P7
P9
P2
P5
P8
1
0
1
1
1
1
1
0
0
0.2
1
0
1
1
1
1
1
0
0
1.1
0
0.2
1.5
0.2
0.5
0.6
0
0
Any questions?
Can We Automate This?
Could the network teach itself?
- give it random weights
- show it an example
- ask it to guess
- reward or punish it depending on result
WDTM? (What does that mean?)
Increase the weights on neurons that gave good advice; decrease the weights on neurons that gave bad advice.
"reward"
or
"punish"
inputs
outputs
truth
feedback
Basic Idea: FEEDBACK
Feedback CHOICE
We can react more or less strongly in response to error.
This is called the "learning rate."
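A minimal sketch of one "reward or punish" step with a learning rate. This is a simple perceptron-style update chosen for illustration; the error, weights, and learning rate are made-up numbers.

// Nudge each weight in proportion to the error and the learning rate.
// error = truth - output: positive rewards the inputs that were on,
// negative punishes them.
function adjustWeights(weights, inputs, error, learningRate) {
  return weights.map((w, i) => w + learningRate * error * inputs[i]);
}

const weights = [0.25, 0.75, 0.5];
const inputs  = [1, 0, 1];
const error   = 1 - 0;  // the truth was 1 but the neuron output 0
// With a learning rate of 0.125, the weights on the active inputs go up a little.
console.log(adjustWeights(weights, inputs, error, 0.125)); // [0.375, 0.75, 0.625]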
Activation Functions Redux
SIGMOID ACTIVATION FUNCTION
RECTIFIED LINEAR UNIT (ReLU) ACTIVATION FUNCTION
ReLU is a very common activation function:
- Sum the inputs
- If the sum is negative, return 0
- Otherwise, return the sum
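Minimal sketches of both activation functions named above, assuming the weighted sum has already been computed.

// SIGMOID: squashes any weighted sum into a value between 0 and 1.
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// ReLU (rectified linear unit): return 0 for negative sums, the sum itself otherwise.
function relu(x) {
  return x < 0 ? 0 : x;
}

console.log(sigmoid(0));  // 0.5
console.log(relu(-2.5));  // 0
console.log(relu(1.7));   // 1.7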
So Far
An artificial neuron converts a weighted sum of multiple inputs into a single output
A perceptron is a network of artificial neurons that can detect visual patterns
A perceptron is tuned by adjusting its weights so that it yields the expected output for each input pattern.
"Learning" happens by increasing weights for neurons that "get it right" and decreasing weights for neurons that "get it wrong."
A system can be set to react more or less strongly to each learning experience.
"Learning" happens via feedback. The error or loss is the difference between the output and the "ground truth."
Any questions?
Going Deep
[Diagram: adding a "hidden layer" of neurons between the inputs and the output.]
Supervised Learning
- Show the machine "labeled" training data: an image and its "class"
- Let the machine make a guess, or "prediction"
- The algorithm adjusts the weights based on the error
- Repeat until the error is small
- The trained model can then be used "in the wild"
Supervised Learning
[Flowchart: start with random Ws; for i = 1 to 100: read input, predict, compute the average loss, adjust Ws; stop when the error is small or time runs out; then test.]
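A minimal sketch of that loop in JavaScript, training a single step-function neuron on a tiny made-up dataset (the logical OR of two "pixels"). The threshold, learning rate, and data are all assumptions for illustration, not the algorithm any particular library uses.

function trainOneNeuron(examples, learningRate) {
  // start with random Ws
  let weights = examples[0].inputs.map(() => Math.random());
  const threshold = 0.5;

  for (let i = 1; i <= 100; i++) {                    // for i = 1 to 100
    let totalError = 0;
    for (const { inputs, truth } of examples) {       // read input
      const sum = inputs.reduce((acc, x, j) => acc + x * weights[j], 0);
      const prediction = sum < threshold ? 0 : 1;     // predict
      const error = truth - prediction;               // loss for this example
      totalError += Math.abs(error);
      weights = weights.map((w, j) => w + learningRate * error * inputs[j]); // adjust Ws
    }
    if (totalError === 0) break;                      // error small? then stop
  }
  return weights;
}

// Tiny labeled training set: "fire if either input pixel is lit."
const examples = [
  { inputs: [0, 0], truth: 0 },
  { inputs: [0, 1], truth: 1 },
  { inputs: [1, 0], truth: 1 },
  { inputs: [1, 1], truth: 1 },
];
console.log(trainOneNeuron(examples, 0.1));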
Suppose you find yourself in this landscape and it is hard to breathe. You must get to the lowest possible elevation. What do you do?
The weights in a model define a high-dimensional "space"
The model's loss at each point in weight space defines a surface
On a hill, an arrow that points "uphill" and whose length represents steepness is called the GRADIENT
A learning algorithm that moves downhill on the loss function surface is called GRADIENT DESCENT
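A minimal sketch of gradient descent on a one-dimensional "landscape." The loss function here is a made-up parabola rather than a real model's loss surface; the point is only the rule: estimate the gradient, then step against it (downhill).

// A made-up loss "landscape" whose lowest point is at w = 3.
function loss(w) {
  return (w - 3) * (w - 3);
}

// Estimate the slope (gradient) numerically: how fast does the loss change near w?
function gradient(w) {
  const h = 1e-6;
  return (loss(w + h) - loss(w - h)) / (2 * h);
}

// GRADIENT DESCENT: repeatedly step "downhill," i.e., against the gradient.
let w = 10;                 // start somewhere up on the hillside
const learningRate = 0.1;
for (let i = 0; i < 50; i++) {
  w = w - learningRate * gradient(w);
}
console.log(w);             // very close to 3, the bottom of the valley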
ONE MORE THING
We call the "answers" we want from the machine "classes"
and the outputs are typically probabilities or "confidence"
"I'm 80% confident that this is a bear..."
...but there's a 20% chance it's a lion."
Outputs as probabilities
interpreted as "confidence"
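One common way to turn a model's raw output scores into the probabilities that get read as "confidence" is the softmax function. This is an assumption for illustration (the slides don't say how Teachable Machine computes its probabilities), and the class names and scores below are made up; they just happen to give roughly the 80% / 20% split described above.

// Softmax: exponentiate each score, then divide by the total,
// so the outputs are all positive and add up to 1.
function softmax(scores) {
  const exps = scores.map(Math.exp);
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total);
}

// Hypothetical raw scores for two classes.
const classes = ["bear", "lion"];
const probs = softmax([2.0, 0.6]);

classes.forEach((c, i) =>
  console.log(`I'm ${(probs[i] * 100).toFixed(0)}% confident that this is a ${c}`)
);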
So far...
- classes: in a classification model, the "bins" into which we put the items we have classified
- labels: the "correct" answers that are used to train a model
- training data: samples that we "show" the model so it can adjust its weights based on whether it gets them right or wrong
- error/loss: how much we are getting wrong
- weights: how much attention a neuron pays to each of its inputs
- predictions: the outputs when a model is run on new data
Google's Teachable Machine
Let's Try It
Teachable Machine Workflow
Teachable Machine FILES
Teachable Machine in CodePen
<div>Teachable Machine Image Model</div>
<button type="button" onclick="init()">Start</button>
<div id="webcam-container"></div>
<div id="label-container"></div>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.3.1/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@teachablemachine/image@0.8/dist/teachablemachine-image.min.js"></script>
<script type="text/javascript">
  // More API functions here:
  // https://github.com/googlecreativelab/teachablemachine-community/tree/master/libraries/image

  // the link to your model provided by Teachable Machine export panel
  const URL = "https://teachablemachine.withgoogle.com/models/nUKbQ-rI4/";

  let model, webcam, labelContainer, maxPredictions;

  // Load the image model and set up the webcam
  async function init() {
    const modelURL = URL + "model.json";
    const metadataURL = URL + "metadata.json";

    // load the model and metadata
    // Refer to tmImage.loadFromFiles() in the API to support files from a file picker
    // or files from your local hard drive
    // Note: the pose library adds a "tmImage" object to your window (window.tmImage)
    model = await tmImage.load(modelURL, metadataURL);
    maxPredictions = model.getTotalClasses();

    // Convenience function to set up a webcam
    const flip = true; // whether to flip the webcam
    webcam = new tmImage.Webcam(200, 200, flip); // width, height, flip
    await webcam.setup(); // request access to the webcam
    await webcam.play();
    window.requestAnimationFrame(loop);

    // append elements to the DOM
    document.getElementById("webcam-container").appendChild(webcam.canvas);
    labelContainer = document.getElementById("label-container");
    for (let i = 0; i < maxPredictions; i++) { // and class labels
      labelContainer.appendChild(document.createElement("div"));
    }
  }

  async function loop() {
    webcam.update(); // update the webcam frame
    await predict();
    window.requestAnimationFrame(loop);
  }

  // run the webcam image through the image model
  async function predict() {
    // predict can take in an image, video, or canvas html element
    const prediction = await model.predict(webcam.canvas);
    for (let i = 0; i < maxPredictions; i++) {
      const classPrediction =
        prediction[i].className + ": " + prediction[i].probability.toFixed(2);
      labelContainer.childNodes[i].innerHTML = classPrediction;
    }
  }
</script>
Teachable Machine in CodePen
<div>Teachable Machine Image Model</div>
<button type="button" onclick="init()">Start</button>
<div id="webcam-container"></div>
<div id="label-container"></div>
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.3.1/dist/tf.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/@teachablemachine/image@0.8/dist/teachablemachine-image.min.js"></script>
<script type="text/javascript">
  // JAVASCRIPT (shown above)
</script>
END
How Do Machines See?
By Dan Ryan