How Do Machines See?




Teasers

https://innoeduvation.org/313/vision/tm/

https://innoeduvation.org/313/vision/tmAnimals/index.html


Outline

What is Vision?

The Sciences of Making Machines That See

The Phenomenology of Networks

Machine Vision by Stepwise Refinement

Neurons, Real and Artificial

Activation Functions

Automation

Activation Functions Redux

Logic Gates

Going Deep

Teachable Machine

Networks of Neurons

What's Going on Here?

What is vision?

STOP+THINK:

Define the verb "to see"

One assembled definition: to see is (1) light (2) detection — to receive light stimuli that are processed by the brain — in which (3) multiple stimuli are processed and (4) assembled into a representation of the (5) properties of (6) external objects: the position, shape, brightness, and color of objects in space.


Studying Vision in Fruit Flies

Questions?

How machines see is a big field of study

Visualizing the Fields of Seeing Machines

Machine Vision by Stepwise Refinement

  1. Existence Detection
    • e.g., is there a wall ahead?
  2. Classification
    • e.g., is this a dog or a cat?
  3. Recognition
    • e.g., is this the sheep named Alois?
  4. Sequence Detection
    • e.g., was that the ASL sign for "peace"?


Any questions?

But what's in the black box?

Neurons

Real and Artificial


Neuron Schematic

Neuron Preserved

Neuron as Shocked and Shocking!

Neuron Anatomy

[Figure: neuron schematic labeling the dendrites, nucleus, and axon]

A Network of Trusted Friends

[Figure: YOU at the center, linked to friends F01–F06, each reporting on a different bit of the world; their reports add up to a decision: "OK, then:"]

Every neuron is a function mapping an arbitrary set of binary inputs to a single binary output

7 of ten friends recommend going to the party instead of studying!
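The friends example in code — a hedged sketch (the function name is mine, not from the slides): a "neuron" whose binary inputs are friends' recommendations, all weighted equally, and which fires when a majority say "go."

```javascript
// Illustrative sketch: a neuron that fires (outputs 1) when a
// majority of its equally-weighted binary inputs are 1.
function majorityNeuron(inputs) {
  const votes = inputs.reduce((sum, x) => sum + x, 0);
  return votes > inputs.length / 2 ? 1 : 0;
}

// 7 of ten friends recommend going to the party: the neuron fires.
majorityNeuron([1, 1, 1, 1, 1, 1, 1, 0, 0, 0]); // 1
```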

All Contacts are Equal!

Some Contacts are More Equal than Others

[Figure: the same contacts, but now some say "Go!" and others "Hmm..." — their advice counts differently]

We "weight" our inputs based on how important they are ...

or how dependable they are.

Some Contacts are More Equal than Others

Compute the Weighted Sum of the Inputs

weightedSum = input1 × weight1 + input2 × weight2 + ... + inputn × weightn
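The same formula as a small function (a sketch; inputs and weights are parallel arrays of equal length):

```javascript
// Weighted sum: multiply each input by its weight, then add them all up.
function weightedSum(inputs, weights) {
  let sum = 0;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  return sum;
}
```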

des questions?

So Far...

An artificial neuron

  1. takes 1 or more binary inputs
  2. weights each one
  3. sums them up

But how does it decide whether or not to "fire"?

Activation Functions

Everyone's Got a Threshold!

if (weightedSum < threshold)
    do nothing
else
    FIRE!
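The pseudocode above as a runnable step function: 0 ("do nothing") below the threshold, 1 ("FIRE!") at or above it.

```javascript
// Step activation: fire only when the weighted sum reaches the threshold.
function stepActivation(weightedSum, threshold) {
  return weightedSum < threshold ? 0 : 1;
}
```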


[Figure: STEP FUNCTION — output is 0 while the weighted sum of inputs is below the threshold, then jumps to 1 at the threshold]

Every neuron has an ACTIVATION FUNCTION

Activation Functions

STOP+THINK

Here we are on Zoom. Inputs are others turning on their video. Weights are 1 and your threshold is 4. Do you turn your video on?

Activation Functions in Everyday Life

Any questions so far?

STOP+THINK

Why do we sometimes take advice from someone "with a grain of salt"?

The Secret: You can change the weights

Think about it: if you get bad advice from one of your friends, how does that affect how you weight their advice next time?

 

What if you have a friend who ALWAYS seems to give good advice?

Feedback Again!

Putting it all together

A three by three grid of "pixels"

Putting it all together

Think of the grid in one dimension

STOP+THINK

What does this pattern look like in one dimension?

Putting it all together

Have three neurons that take the pixels as "inputs"

Putting it all together

Redraw to take advantage of the slide orientation

Putting it all together

Add a neuron that takes the outputs of the first three neurons as its inputs
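A hedged sketch of the network described above: nine pixels feed three threshold neurons (Na, Nb, Nc), whose outputs feed a fourth (Nd). The function names, weights, and thresholds here are illustrative, not from the slides.

```javascript
// One threshold neuron: weighted sum of inputs, then a step function.
function neuron(inputs, weights, threshold) {
  const sum = inputs.reduce((s, x, i) => s + x * weights[i], 0);
  return sum < threshold ? 0 : 1;
}

// The tiny two-layer network: pixels -> (Na, Nb, Nc) -> Nd.
function tinyNetwork(pixels, weightsA, weightsB, weightsC, weightsD, threshold) {
  const a = neuron(pixels, weightsA, threshold);
  const b = neuron(pixels, weightsB, threshold);
  const c = neuron(pixels, weightsC, threshold);
  return neuron([a, b, c], weightsD, threshold);
}
```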

How many weights?

STOP+THINK

[Diagram: pixels P1–P9 each feed neurons Na, Nb, and Nc with weights w1a…w9a, w1b…w9b, and w1c…w9c; the outputs of Na, Nb, and Nc feed Nd with weights wad, wbd, and wcd]

[Diagram: the same network again, with its complete set of weights highlighted as a single package]

"THE MODEL"

Any questions?

Can We Automate This?

Could the network teach itself?

 

  1. give it random weights
  2. show it an example
  3. ask it to guess
  4. reward or punish it depending on result
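The four steps above, sketched as a training loop for a single neuron. This is an illustrative assumption-laden sketch, not the deck's code: the names (`trainPerceptron`, `learnRate`, `epochs`) and the fixed threshold are mine.

```javascript
// Sketch of the loop above: learn the OR pattern from examples.
function trainPerceptron(examples, nInputs, learnRate = 0.1, epochs = 100) {
  // 1. give it random weights
  let weights = Array.from({ length: nInputs }, () => Math.random() - 0.5);
  const threshold = 0.5;
  for (let e = 0; e < epochs; e++) {
    for (const { inputs, truth } of examples) {
      // 2-3. show it an example and ask it to guess
      const sum = inputs.reduce((s, x, i) => s + x * weights[i], 0);
      const guess = sum < threshold ? 0 : 1;
      // 4. reward or punish: nudge each active weight by the error
      const error = truth - guess;
      weights = weights.map((w, i) => w + learnRate * error * inputs[i]);
    }
  }
  return { weights, threshold };
}
```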

What does it mean to "reward or punish it depending on result"?

[Diagram: inputs flow through the network to outputs; outputs are compared with the truth, and the difference flows back as feedback]

Basic Idea: FEEDBACK

Feedback

Adjust each weight by some amount (a "delta," or difference), depending on whether the sending node gave good advice (that is, got it right).

So Far

An artificial neuron converts a weighted sum of multiple inputs into a single output

A perceptron is a network of artificial neurons that can detect visual patterns

A perceptron is tuned by adjusting weights so it yields expected output for each input pattern.

"Learning" happens by increasing weights for neurons that "get it right" and decreasing weights for neurons that "get it wrong."

A system can be set to react more or less strongly to each learning experience.

"Learning" happens via feedback. The error or loss is the difference between the output and the "ground truth."

Do you have questions?

Activation Functions Redux

SIGMOID ACTIVATION FUNCTION

RECTIFIED LINEAR UNIT (ReLU) ACTIVATION FUNCTION

ReLU is a very common activation function

  1. Sum the inputs
  2. If negative return 0
  3. Otherwise return sum
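The two activation functions named above, as one-liners (a sketch; only the behavior described on the slides is assumed):

```javascript
// ReLU: negative sums become 0; everything else passes through unchanged.
function relu(sum) {
  return Math.max(0, sum);
}

// Sigmoid: squashes any sum into the range (0, 1), with 0 mapping to 0.5.
function sigmoid(sum) {
  return 1 / (1 + Math.exp(-sum));
}
```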

Questions?

Going Deep

[Diagram: a network with an input layer, a hidden layer, and an output layer]

Outputs as Percentages

Explain "confidence"

So far...

  • weights — how much attention does a neuron pay to each of its inputs?
  • classes — in a classification model, the "bins" into which we put items we have classified
  • predictions — the outputs when a model is run on new data
  • error/loss — how much we are getting wrong
  • labels — the "correct" answers that are used to train a model
  • training data — samples that we "show" the model so it can adjust its weights based on whether it gets it right or wrong

Google's Teachable Machine

Let's Try It

Teachable Machine Workflow

Teachable Machine FILES

END


<iframe height="300" style="width: 100%;" scrolling="no" title="" src="https://codepen.io/team/DanR/embed/wvdrORW?default-tab=js%2Cresult&editable=true&theme-id=light" frameborder="no" loading="lazy" allowtransparency="true" allowfullscreen="true" allow="geolocation *; microphone *; camera *; midi *; encrypted-media *">
  See the Pen <a href="https://codepen.io/team/DanR/pen/wvdrORW">
  </a> by innoeduvation(OLD) (<a href="https://codepen.io/team/DanR">@DanR</a>)
  on <a href="https://codepen.io">CodePen</a>.
</iframe>

Extra

I also have to doctor the CodePen embed a little.  Here's what CodePen provides:

<iframe style="width: 100%;" title="Teachable Machine Example 01" src="https://codepen.io/team/DanR/embed/oNxLLyE?height=415&amp;theme-id=light&amp;default-tab=js,result&amp;editable=true" height="415" allowfullscreen="allowfullscreen">
See the Pen <a href="https://codepen.io/team/DanR/pen/oNxLLyE">Teachable Machine Example 01</a> by innoeduvation(OLD)
(<a href="https://codepen.io/DanR">@DanR</a>) on <a href="https://codepen.io">CodePen</a>.</iframe>

And here's what I have to add to the <iframe> element after "allowfullscreen":

 allow="geolocation *; microphone *; camera *; midi *; encrypted-media *"


Could a perceptron replace logic gates?
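One way to explore the question — a hedged sketch, with hand-picked (not learned) weights and thresholds: single neurons can behave like AND and OR gates.

```javascript
// Build a threshold neuron with fixed weights; returns a gate function.
function gate(weights, threshold) {
  return (inputs) =>
    inputs.reduce((s, x, i) => s + x * weights[i], 0) < threshold ? 0 : 1;
}

const AND = gate([1, 1], 2); // fires only when both inputs are 1
const OR = gate([1, 1], 1);  // fires when at least one input is 1
```

(XOR, famously, cannot be computed by a single neuron this way; it needs a hidden layer.)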

Subideas

  • The "delta" should depend on the error
  • The "delta" should depend on how responsive we want to be to new information

w_new = w_old + learnRate × error
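The update rule above as a function (a sketch; the name `updateWeight` is mine): a bigger error or a bigger learning rate means a bigger adjustment.

```javascript
// One weight update: nudge the old weight in proportion to the error.
function updateWeight(wOld, learnRate, error) {
  return wOld + learnRate * error;
}
```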

Linear Algebra

Number (scalar): w

Array (vector): \begin{bmatrix} w_{1} & w_{2} & w_{3} \end{bmatrix}

Matrix: \begin{bmatrix} w_{1,1} & w_{1,2} & w_{1,3}\\ w_{2,1} & w_{2,2} & w_{2,3}\\ w_{3,1} & w_{3,2} & w_{3,3} \end{bmatrix}

Tensor (a stack of matrices, with entries w_{k,i,j} for layer k = 1, 2, 3):

\begin{bmatrix} w_{1,1,1} & w_{1,1,2} & w_{1,1,3}\\ w_{1,2,1} & w_{1,2,2} & w_{1,2,3}\\ w_{1,3,1} & w_{1,3,2} & w_{1,3,3} \end{bmatrix}
\begin{bmatrix} w_{2,1,1} & w_{2,1,2} & w_{2,1,3}\\ w_{2,2,1} & w_{2,2,2} & w_{2,2,3}\\ w_{2,3,1} & w_{2,3,2} & w_{2,3,3} \end{bmatrix}
\begin{bmatrix} w_{3,1,1} & w_{3,1,2} & w_{3,1,3}\\ w_{3,2,1} & w_{3,2,2} & w_{3,2,3}\\ w_{3,3,1} & w_{3,3,2} & w_{3,3,3} \end{bmatrix}
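The same progression in code, using plain nested arrays (illustrative shapes only; the values are placeholders):

```javascript
// Scalar, vector, matrix, and tensor as increasingly nested arrays.
const scalar = 0.5;                      // a single number
const vector = [0.1, 0.2, 0.3];          // 1-D: three entries
const matrix = [vector, vector, vector]; // 2-D: 3 x 3
const tensor = [matrix, matrix, matrix]; // 3-D: 3 x 3 x 3
```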

More on Tensors (optional)

Tutorial

YOUR MODEL

  1. Teachable Machine

  2. Codepen "Projects"

  3. Classification Model

Codepen "Projects"

A workspace where all the files of a web development project can be kept in an arrangement that mimics what we would do for creating a website.

Codepen "Projects"

FORKING a project means copying all the files to your own account so you can continue developing (on a different fork in the road)

original

your fork

Languages are Forks


Archive copy of How do machines see?

By Dan Ryan
