Intro to Machine Learning

Andrew Beam, PhD

Head of Machine Learning/Senior Fellow

Flagship Pioneering

June 5th, 2019

twitter: @AndrewLBeam

Review Articles

What is Artificial Intelligence?

What is Machine Learning?

Machine learning is a class of algorithms that learn how to do a task directly from data

Data = features (aka variables or inputs) and labels (aka the 'right' answer or outputs)

The algorithm is 'trained' to produce the correct output for a given input
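To make "features", "labels" and "training" concrete, here is a minimal sketch in Python. The toy data and the one-threshold model are hypothetical and chosen only for illustration. Real ML algorithms search far richer families of functions, but the loop is the same idea: try candidate rules and keep the one that best reproduces the labels.

```python
# A minimal sketch of "learning from data": the algorithm sees (feature, label)
# pairs and picks the rule that best reproduces the labels.
# The data and the threshold-rule model are hypothetical illustrations.

def train_threshold(features, labels):
    """Return (t, accuracy) for the best rule of the form: predict 1 if x >= t."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(features)):  # candidate thresholds taken from the data itself
        preds = [1 if x >= t else 0 for x in features]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

def predict(t, x):
    # Apply the learned rule to a new input
    return 1 if x >= t else 0

# Hypothetical toy data: one feature per example, label 1 = "positive"
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

t, acc = train_threshold(xs, ys)
print(t, acc)           # 4.0 1.0
print(predict(t, 4.5))  # 1
```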

What is Machine Learning?

Features -> ML Algorithm -> Label

Example labels: Cat, Dog

What is Machine Learning?

Features -> ML Algorithm -> Label

Features: “(Revenge of the Sith) marks a distinct improvement on the last two episodes, but only in the same way that dying from natural causes is preferable to crucifixion.”

Label (sentiment): Negative

What is Machine Learning?

Features -> ML Algorithm -> Label

"I'm sorry Dave, I can't do that"

What is Machine Learning?

Features -> ML Algorithm -> Label

Fluorescence: 0.96

WHAT IS DEEP LEARNING?

Deep learning is a specific kind of machine learning

- Machine learning automatically learns relationships using data

- Deep learning refers to large neural networks

- These neural networks can have millions of parameters and hundreds of layers (i.e. they are structurally deep)

- Most important: Deep learning is not magic!

WHY DEEP LEARNING?

DEEP LEARNING HAS MASTERED GO

Source: https://deepmind.com/blog/alphago-zero-learning-scratch/

Human data no longer needed


DEEP LEARNING CAN PLAY VIDEO GAMES

DEEP LEARNING CAN DIAGNOSE PATIENTS

Has the potential to change many medical specialities


DEEP LEARNING CAN MODEL PROTEINS

EVERYONE IS USING DEEP LEARNING

WHAT IS DEEP LEARNING?

WHAT IS A NEURAL NET?

NEURAL NETWORK STRUCTURE

Say we want to build a model to predict the likelihood of having a heart attack (myocardial infarction, MI) based on blood pressure (BP) and BMI

NEURAL NET STRUCTURE

A neural net is a modular way to build a classifier

Inputs: BP, BMI -> Output: Probability of MI

WHAT IS AN ARTIFICIAL NEURON?

The neuron is the basic functional unit of a neural network

Inputs: BP, BMI -> Output: Probability of MI



WHAT IS AN ARTIFICIAL NEURON?

The neuron is the basic functional unit of a neural network

A neuron does two things, and only two things:

1) Weighted sum of inputs: w_1*BP + w_2*BMI (w_1 is the weight for BP, w_2 the weight for BMI)

2) Nonlinear transformation: \phi(w_1*BP + w_2*BMI)
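The two steps translate almost line for line into code. This is a minimal sketch that assumes the sigmoid as the default activation and assumes BP and BMI have already been standardized (raw values such as a BP of 120 would push the sigmoid into saturation). Like the formula on the slide, it has no bias term, although most implementations add one. The weights are hypothetical values picked for illustration, not learned ones.

```python
import math

def sigmoid(z):
    # Squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def neuron(bp, bmi, w1, w2, phi=sigmoid):
    # Step 1: weighted sum of the inputs
    z = w1 * bp + w2 * bmi
    # Step 2: nonlinear transformation (the activation function phi)
    return phi(z)

# Hypothetical weights and standardized inputs, chosen only for illustration
print(neuron(bp=0.5, bmi=-0.2, w1=1.2, w2=0.8))                 # sigmoid output, in (0, 1)
print(neuron(bp=0.5, bmi=-0.2, w1=1.2, w2=0.8, phi=math.tanh))  # tanh output, in (-1, 1)
```

Whatever the inputs and weights, the result is a single number, which is exactly the "summary" of what a neuron does.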

WHAT IS AN ARTIFICIAL NEURON?

\phi() is known as the activation function

Examples: Sigmoid, Hyperbolic Tangent (tanh)
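For reference, these are the standard definitions of the two activation functions named above. The sigmoid's output range of (0, 1) is what makes it a natural choice for a neuron whose output is read as a probability, such as the probability of MI. The identity on the right shows that tanh is a rescaled and shifted sigmoid.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1)
\qquad
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} = 2\,\sigma(2z) - 1 \in (-1, 1)
```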


WHAT IS AN ARTIFICIAL NEURON?

Summary: A neuron produces a single number that is a nonlinear transformation of its input connections

A neuron does two things, and only two things: \phi(w_1*BP + w_2*BMI) = a number

This simple formula allows for an amazing amount of expressiveness


NEURAL NETWORK STRUCTURE

Neural nets are organized into layers:

Input Layer (inputs: BP, BMI) -> 1st Hidden Layer -> 2nd Hidden Layer -> Output Layer (output: Probability of MI)

Each neuron in a hidden layer is a single hidden unit


Each layer is a list of numbers called an "embedding"

Finding the best values for the weights = learning
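Stacking neurons into layers amounts to a loop: each layer takes the previous layer's list of numbers (its embedding) and produces a new one. The sketch below assumes 4 and 3 units in the two hidden layers, tanh in the hidden layers, a sigmoid at the output, small random weights, and no bias terms. All of these are illustrative choices rather than anything prescribed by the slides, and since the weights are untrained the output probability is meaningless until learning has found good weights.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, phi):
    # One neuron per row of `weights`: weighted sum of all inputs, then phi
    return [phi(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def random_weights(n_out, n_in, rng):
    # Hypothetical initialization: small uniform random weights
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(0)
# input (BP, BMI) -> 1st hidden -> 2nd hidden -> output; hidden sizes are arbitrary
sizes = [2, 4, 3, 1]
layers = [random_weights(n_out, n_in, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x, layers):
    # Returns every layer's embedding, from the raw inputs to the output
    embeddings = [x]
    for i, w in enumerate(layers):
        phi = sigmoid if i == len(layers) - 1 else math.tanh
        embeddings.append(dense_layer(embeddings[-1], w, phi))
    return embeddings

embs = forward([0.5, -0.2], layers)  # hypothetical standardized BP and BMI
for e in embs:
    print(len(e), [round(v, 3) for v in e])
```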

HOW NEURAL NETS LEARN

(Figure: weight for BMI)
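"Finding the best values for the weights" can be sketched for a single sigmoid neuron: measure how wrong the predictions are with a loss, then repeatedly step each weight against its gradient. The slides do not fix a loss or optimizer. The average log loss and plain full-batch gradient descent used here are assumptions (training in practice usually uses stochastic mini-batch variants), and the six (BP, BMI) examples are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, made-up training set:
# (standardized BP, standardized BMI) -> had an MI (1) or not (0)
X = [(-1.0, -0.5), (-0.5, -1.0), (-0.2, 0.1), (0.3, 0.8), (0.9, 0.4), (1.2, 1.0)]
y = [0, 0, 0, 1, 1, 1]

def loss(w):
    # Average cross-entropy (log loss) between predicted probability and label
    total = 0.0
    for (bp, bmi), t in zip(X, y):
        p = sigmoid(w[0] * bp + w[1] * bmi)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(X)

def gradient(w):
    # For a sigmoid neuron with log loss: d(loss)/d(w_j) = mean((p - t) * x_j)
    g = [0.0, 0.0]
    for (bp, bmi), t in zip(X, y):
        p = sigmoid(w[0] * bp + w[1] * bmi)
        g[0] += (p - t) * bp
        g[1] += (p - t) * bmi
    return [gj / len(X) for gj in g]

w = [0.0, 0.0]        # start with both weights at zero
learning_rate = 0.5   # step size: an arbitrary choice for this sketch
history = [loss(w)]
for _ in range(200):
    g = gradient(w)
    w = [wj - learning_rate * gj for wj, gj in zip(w, g)]  # step downhill
    history.append(loss(w))

print("learned weights:", w)
print("loss before/after:", history[0], history[-1])
```

With a small enough step size the loss goes down at every step; "babysitting" training largely means watching curves like `history` and adjusting choices such as `learning_rate`.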

HOW DID WE GET HERE?

NEURAL NETWORKS ARE AN OLD IDEA

One of the very first ideas in machine learning and artificial intelligence

Date back to the 1940s
Many cycles of boom and bust
Repeated promises of "true AI" that were unfulfilled followed by "AI winters"

Are today's neural nets any different from their predecessors?

"[The perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." - The New York Times, 1958, based on Frank Rosenblatt's statements

IN THE BEGINNING... (1940s-1960s)

Rosenblatt's Perceptron, 1957

Initially very promising
Came with a learning algorithm that provably converges (whenever the data are linearly separable)
Could recognize letters and numbers

THE FIRST AI WINTER (1969)

Minsky and Papert show that a single-layer perceptron can't even solve the XOR problem

Kills research on neural nets for the next 15-20 years
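The XOR limitation is easy to reproduce. Below is a sketch of the classic perceptron learning rule (step activation plus a bias term; the epoch count and learning rate are arbitrary choices). It learns AND, which is linearly separable, but can never classify all four XOR cases because no single line separates them. Adding one hidden layer fixes this: the hand-picked (not learned) weights in `two_layer_xor` are a hypothetical illustration that combines an OR-like unit and an AND-like unit.

```python
def step(z):
    # Threshold activation used by the classic perceptron
    return 1 if z > 0 else 0

def perceptron_train(data, epochs=20, lr=1.0):
    # Perceptron learning rule with a bias term: nudge weights only on mistakes
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x1, x2), t in data:
            err = t - step(w[0] * x1 + w[1] * x2 + b)  # 0 if correct, +1 or -1 if wrong
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(w, b, data):
    hits = sum(step(w[0] * x1 + w[1] * x2 + b) == t for (x1, x2), t in data)
    return hits / len(data)

def two_layer_xor(x1, x2):
    # Hand-picked (not learned) weights: hidden units act like OR and AND
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)  # "OR but not AND" is exactly XOR

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = list(zip(inputs, [0, 0, 0, 1]))
XOR = list(zip(inputs, [0, 1, 1, 0]))

for name, data in [("AND", AND), ("XOR", XOR)]:
    w, b = perceptron_train(data)
    print(name, "single-layer accuracy:", accuracy(w, b, data))
print("two-layer XOR:", [two_layer_xor(x1, x2) for x1, x2 in inputs])  # [0, 1, 1, 0]
```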

THE BACKPROPAGANDISTS EMERGE (1986)

Rumelhart, Hinton, and Williams show us how to train multilayered neural networks
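Backpropagation is the chain rule applied layer by layer, from the output back to the inputs. The sketch below hand-codes it for a tiny network with 2 inputs, 2 sigmoid hidden units and 1 sigmoid output (no biases, log loss; all assumptions made for brevity). It then checks the result against finite differences, a standard way to confirm a gradient implementation. The weights are arbitrary example values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    W1, W2 = params  # W1[j][i]: input i -> hidden j; W2[j]: hidden j -> output
    h = [sigmoid(sum(W1[j][i] * x[i] for i in range(2))) for j in range(2)]
    p = sigmoid(sum(W2[j] * h[j] for j in range(2)))
    return h, p

def loss(params, x, t):
    _, p = forward(params, x)
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))

def backprop(params, x, t):
    W1, W2 = params
    h, p = forward(params, x)
    d_out = p - t  # dL/dz at the output (sigmoid + log loss)
    gW2 = [d_out * h[j] for j in range(2)]
    # Chain rule back through each hidden sigmoid: dh/da = h * (1 - h)
    d_hid = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(2)]
    gW1 = [[d_hid[j] * x[i] for i in range(2)] for j in range(2)]
    return gW1, gW2

def numeric_grads(params, x, t, eps=1e-6):
    # Central finite differences, one weight at a time
    W1, W2 = params
    gW1 = [[0.0, 0.0], [0.0, 0.0]]
    gW2 = [0.0, 0.0]
    for j in range(2):
        for i in range(2):
            up = [r[:] for r in W1]
            dn = [r[:] for r in W1]
            up[j][i] += eps
            dn[j][i] -= eps
            gW1[j][i] = (loss((up, W2), x, t) - loss((dn, W2), x, t)) / (2 * eps)
        up2, dn2 = W2[:], W2[:]
        up2[j] += eps
        dn2[j] -= eps
        gW2[j] = (loss((W1, up2), x, t) - loss((W1, dn2), x, t)) / (2 * eps)
    return gW1, gW2

params = ([[0.3, -0.8], [0.5, 0.1]], [1.2, -0.7])  # arbitrary example weights
x, t = [1.0, 1.0], 0  # the XOR case (1, 1) -> 0

analytic = backprop(params, x, t)
numeric = numeric_grads(params, x, t)
print(analytic)
print(numeric)
```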

THE BREAKTHROUGH (2012)

Imagenet Database

Millions of labeled images
Objects in images fall into 1 of a possible 1,000 categories
Relatively high-resolution
Bounding boxes giving exact location of object - useful for both classification and localization

Large Scale Visual Recognition Challenge (ILSVRC)

Annual Imagenet Challenge starting in 2010
Successor to smaller PASCAL VOC challenge
Many tracks including classification and localization
Standardized training and test set. Competitors upload predictions for test set and are automatically scored

THE BREAKTHROUGH (2012)

A pivotal event occurred at the 2012 ILSVRC, which brought together 3 critical ingredients:

Massive amounts of labeled images
Training with GPUs
Methodological innovations that enabled training deeper networks while minimizing overfitting

THE BREAKTHROUGH (2012)

In 2011, a misclassification rate of 25% was near state of the art on ILSVRC

In 2012, Geoff Hinton and two graduate students, Alex Krizhevsky and Ilya Sutskever, entered ILSVRC with one of the first deep neural networks trained on GPUs, now known as "Alexnet"

Result: a top-5 error rate of about 16% (15.3%), far below the roughly 26% achieved by the second-place entry.

The computer vision world immediately took notice

THE ILSVRC AFTERMATH (2012-2014)

Alexnet paper has ~ 40,000 citations since being published in 2012!

Deep Learning Comes of Age

WHY NOW?

MODERN DEEP LEARNING

Several key advancements have enabled the modern deep learning revolution

Advent of massively parallel computing by GPUs.
- 21 TFLOPs of computing power. It would have been the fastest supercomputer on Earth around the year 2000!
- You can rent one on the Amazon cloud for $3/hour!

Methodological advancements have made deeper networks easier to train
- Architecture
- Optimizers
- Activation Functions

Robust frameworks and abstractions make iteration faster and less error prone
- Automatic differentiation allows easy prototyping
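Automatic differentiation means the framework computes exact derivatives of whatever function you write, so you never derive gradients by hand. Frameworks train networks with reverse-mode AD (backpropagation). The sketch below swaps in the simpler forward mode using "dual numbers", which carry a value together with its derivative. The `Dual` class and `derivative` helper are hypothetical names, not any framework's API.

```python
import math

class Dual:
    """Number value + deriv*eps with eps**2 = 0; deriv carries the derivative."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.value, -self.deriv)

def exp(x):
    # Chain rule: d/dx exp(u) = exp(u) * u'
    v = math.exp(x.value)
    return Dual(v, v * x.deriv)

def derivative(f, x):
    # Seed the input's derivative with 1 and read it off the output
    return f(Dual(x, 1.0)).deriv

def f(x):
    return 3 * x * x + exp(-x)  # by hand: f'(x) = 6x - exp(-x)

print(derivative(f, 2.0), 6 * 2.0 - math.exp(-2.0))  # the two numbers agree
```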

CONVOLUTIONAL NEURAL NETWORKS

CONVOLUTIONAL NEURAL NETS (CNNs)

Dates back to the late 1980s

Invented in 1989 by Yann LeCun at Bell Labs ("LeNet")
Integrated into handwriting recognition systems in the 90s
Huge flurry of activity after the Alexnet paper

CONVOLUTIONAL NEURAL NETS

Images are just 2D arrays of numbers

Goal is to build f(image) = 1

CONVOLUTIONAL NEURAL NETS

CNNs look at small connected groups of pixels using "filters"

Image credit: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution

Images have a local correlation structure

Nearby pixels are likely to be more similar than pixels that are far away

CNNs exploit this through convolutions of small image patches
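A convolution slides one small filter over every patch of the image and takes a weighted sum at each position. This sketch uses "valid" positions only (no padding, stride 1) and, like most deep learning libraries, does not flip the kernel, so it is technically cross-correlation. The image and the vertical-edge filter are hypothetical. In a CNN the filter weights are learned rather than hand-made.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (cross-correlation: kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum over one small patch of nearby pixels
            s = sum(kernel[i][j] * image[r + i][c + j] for i in range(kh) for j in range(kw))
            row.append(s)
        out.append(row)
    return out

# Hypothetical 4x4 grayscale image: dark left half, bright right half
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hand-made vertical-edge filter (in a CNN these weights would be learned)
kernel = [
    [-1, 1],
    [-1, 1],
]
for row in conv2d(image, kernel):
    print(row)  # [0, 18, 0] on every row: the filter fires only at the edge
```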

CONVOLUTIONAL NEURAL NETS

Example convolution

CONVOLUTIONAL NEURAL NETS

Pooling provides some invariance to small spatial shifts

Image credit: http://cs231n.github.io/convolutional-networks/
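Max pooling keeps only the largest value in each window, so a feature that moves by a pixel but stays in the same window produces the same pooled output. This sketch assumes 2x2 windows with stride 2, a common default. The two feature maps are hypothetical, the second being the first shifted by one pixel. A shift that crosses a window boundary would change the output, which is why the invariance is only approximate.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    out = []
    for r in range(0, len(fmap) - size + 1, stride):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, stride):
            row.append(max(fmap[r + i][c + j] for i in range(size) for j in range(size)))
        out.append(row)
    return out

# A feature map with a single strong activation...
a = [
    [0, 0, 0, 0],
    [0, 7, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# ...and the same activation shifted up and left by one pixel
b = [
    [7, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(max_pool2d(a))  # [[7, 0], [0, 0]]
print(max_pool2d(b))  # [[7, 0], [0, 0]]
```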

CONVOLUTIONAL NEURAL NETS

Convolution + pooling + activation = CNN

Image credit: http://cs231n.github.io/convolutional-networks/

CONVOLUTIONAL NEURAL NETS (CNNs)

CNN formula is relatively simple

Image credit: http://cs231n.github.io/convolutional-networks/

CONVOLUTIONAL NEURAL NETS

Data augmentation mimics the image generative process

Image credit: http://slideplayer.com/slide/8370683/

Drastically "expands" training set size
Improves generalization
Works if it doesn't "break" image -> label relationship
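A minimal sketch of label-preserving augmentation: each time an image is used, a randomly flipped and/or shifted copy is produced and the label is kept as-is. The choice of transforms (horizontal flip, horizontal shift of up to one pixel with zero padding) and the function names are illustrative assumptions. Whether a transform is safe depends on the task: mirroring a cat is still a cat, but mirroring a handwritten "b" gives a "d".

```python
import random

def hflip(image):
    # Mirror each row left-to-right
    return [row[::-1] for row in image]

def random_shift(image, max_shift, rng, fill=0):
    # Shift horizontally by up to max_shift pixels, padding with `fill`
    s = rng.randint(-max_shift, max_shift)
    w = len(image[0])
    out = []
    for row in image:
        if s >= 0:
            out.append([fill] * s + row[:w - s])
        else:
            out.append(row[-s:] + [fill] * (-s))
    return out

def augment(image, label, rng):
    # The label is copied unchanged: only valid if the transforms
    # cannot change the right answer for this task
    if rng.random() < 0.5:
        image = hflip(image)
    return random_shift(image, 1, rng), label

rng = random.Random(0)  # fixed seed so the sketch is reproducible
img = [[1, 2, 3],
       [4, 5, 6]]
for _ in range(3):
    print(augment(img, "cat", rng))
```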

CNNS AREN'T MAGIC

Based on solid image priors
Learns a hierarchical set of filters
Exploit properties of images, e.g. local correlations and invariances
Mimic generative distribution with augmentation to reduce overfitting
Results in end-to-end visual recognition system trained with SGD on GPUs: pixels in -> classifications out

CNNs exploit strong prior information about images

SUMMARY & CONCLUSIONS

DEEP LEARNING AND YOU

Barrier to entry for deep learning is actually low

... but a few things might stand in your way:

Need to make sure your problem is a good fit
- Lots of labeled data and appropriate signal/noise ratio
Access to GPUs
Must "speak the language"
- Many design choices and hyperparameter selections
Know how to "babysit" the model during learning phase


HOW CAN YOU STAY CURRENT?

The field moves fast; staying up to date can be challenging

http://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html

CONCLUSIONS

Could potentially impact many fields -> understand concepts so you have deep learning "insurance"
Long history and connections to other models and fields
Prereqs: Data (lots) + GPUs (more = better)
Deep learning models are like legos, but you need to know what blocks you have and how they fit together
Need to have a sense of sensible default parameter values to get started
"Babysitting" the learning process is a skill


Intro to ML - Flagship Fellows
