Andrew Beam, PhD
Head of Machine Learning/Senior Fellow
Flagship Pioneering
October 24th, 2018
twitter: @AndrewLBeam
Source: https://deepmind.com/blog/alphago-zero-learning-scratch/
Human data no longer needed
Has the potential to change many medical specialities
What does this patient have?
A six-year-old boy has a high fever that has lasted for three days. He has extremely red eyes and a rash on the main part of his body in addition to a swollen and red strawberry tongue. Remaining symptoms include swollen lymph nodes in the neck and irritability
Image credit: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Not just medicine, but genomics too
More here: https://github.com/gokceneraslan/awesome-deepbio
Deep learning is a specific kind of machine learning
- Machine learning automatically learns relationships using data
- Deep learning refers to large neural networks
- These neural networks have millions of parameters and hundreds of layers (i.e., they are structurally deep)
- Most important: Deep learning is not magic!
Say we want to build a model to predict the likelihood of a patient having a heart attack (MI) based on blood pressure (BP) and BMI
A neural net is a modular way to build a classifier
Inputs
Output
Probability of MI
The neuron is the basic functional unit of a neural network
A neuron does two things, and only two things
1) Weighted sum of inputs (one weight for BP, one weight for BMI)
2) Nonlinear transformation of that sum
The nonlinear function is known as the activation function
Sigmoid
Hyperbolic Tangent
Summary: A neuron produces a single number that is a nonlinear transformation of its input connections
This simple formula allows for an amazing amount of expressiveness
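The two steps above can be sketched in a few lines of Python. This is only an illustration: the input values, the weights, and the optional `bias` term are made-up assumptions, not numbers from the slides.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias=0.0, activation=sigmoid):
    """Step 1: weighted sum of the inputs. Step 2: nonlinear transformation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias  # bias is an assumption
    return activation(z)

# Hypothetical, already-scaled inputs: [blood pressure, BMI]
patient = [0.8, 0.3]
weights = [1.5, -0.5]  # made-up weights, one per input

p_sigmoid = neuron(patient, weights)                     # a single number in (0, 1)
p_tanh = neuron(patient, weights, activation=math.tanh)  # a single number in (-1, 1)
```

Swapping `sigmoid` for `math.tanh` changes only the second step; the weighted sum stays the same.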
Neural nets are organized into layers
Inputs
Input Layer
1st Hidden Layer
A single hidden unit
2nd Hidden Layer
Output Layer
Output
Probability of MI
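A layered network is just neurons feeding neurons. The sketch below assumes a tiny network (2 inputs, two hidden layers, 1 output) with made-up weights and biases; the layer sizes and every number are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """One layer: each unit takes a weighted sum of all inputs, then applies sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weight_rows, biases)
    ]

def forward(inputs, layers):
    """Feed the inputs through each layer in order."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Made-up weights: 2 inputs -> 3 hidden units -> 2 hidden units -> 1 output
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # 1st hidden layer
    ([[0.7, -0.5, 0.2], [0.3, 0.6, -0.4]], [0.05, -0.05]),       # 2nd hidden layer
    ([[1.2, -0.9]], [0.0]),                                      # output layer
]

# Hypothetical scaled inputs: [blood pressure, BMI]
prob_mi = forward([0.8, 0.3], network)[0]
```

Each row of weights corresponds to a single hidden unit, so adding a unit means adding a row.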
Finding the best values for the weights = learning
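One common way to find good weights is gradient descent; the slides do not name a specific method, so treat this as one possible sketch. It fits a single sigmoid neuron to a synthetic toy dataset. The data, learning rate, and step count are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def log_loss(weights, bias, data):
    """Average negative log-likelihood of the labels under the model."""
    total = 0.0
    for x, y in data:
        p = predict(weights, bias, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(data, steps=2000, lr=0.5):
    """Plain batch gradient descent starting from all-zero weights."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            error = predict(weights, bias, x) - y  # d(loss)/d(weighted sum)
            for i, xi in enumerate(x):
                grad_w[i] += error * xi / len(data)
            grad_b += error / len(data)
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Synthetic toy data: ([scaled BP, scaled BMI], had an MI?)
data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.6], 1),
        ([0.2, 0.3], 0), ([0.3, 0.1], 0), ([0.1, 0.2], 0)]

loss_before = log_loss([0.0, 0.0], 0.0, data)
weights, bias = train(data)
loss_after = log_loss(weights, bias, data)
```

Learning here means nothing more than nudging the weights until the loss on the data goes down.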
Go to: https://playground.tensorflow.org
One of the very first ideas in machine learning and artificial intelligence
Are today's neural nets any different than their predecessors?
"[The perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." - The New York Times, 1958, reporting on Frank Rosenblatt's perceptron
Rosenblatt's Perceptron, 1957
Minsky and Papert show that a single-layer perceptron can't even solve the XOR problem
Kills research on neural nets for the next 15-20 years
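The XOR limitation is easy to check numerically. The sketch below assumes a simple threshold unit and a coarse grid of candidate weights (both illustrative choices): no single unit on the grid reproduces XOR, while a hand-wired two-layer network does.

```python
def step(z):
    return 1 if z > 0 else 0

def perceptron(x1, x2, w1, w2, b):
    """A single threshold unit: fires when the weighted sum is positive."""
    return step(w1 * x1 + w2 * x2 + b)

def xor_two_layer(x1, x2):
    """Hand-picked weights (an assumption): XOR = OR and not AND."""
    h_or = perceptron(x1, x2, 1, 1, -0.5)    # fires if at least one input is 1
    h_and = perceptron(x1, x2, 1, 1, -1.5)   # fires only if both inputs are 1
    return perceptron(h_or, h_and, 1, -1, -0.5)

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Coarse grid search over weights and bias in [-4, 4] in steps of 0.5
grid = [i / 2 for i in range(-8, 9)]
single_layer_solves_xor = any(
    all(perceptron(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items())
    for w1 in grid for w2 in grid for b in grid
)
two_layer_solves_xor = all(
    xor_two_layer(x1, x2) == y for (x1, x2), y in XOR.items()
)
```

The grid search is only a demonstration, not a proof; the general result follows from XOR not being linearly separable.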
Rumelhart, Hinton, and Williams show us how to train multilayered neural networks
Unsupervised pre-training of "deep belief nets" allowed for larger and deeper models
Image credit: https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
ImageNet Database
Large Scale Visual Recognition Challenge (ILSVRC)
A pivotal event occurred at the 2012 ILSVRC, which brought together 3 critical ingredients:
In 2011, a misclassification rate of 25% was near state of the art on ILSVRC
In 2012, Geoff Hinton and two graduate students, Alex Krizhevsky and Ilya Sutskever, entered ILSVRC with one of the first deep neural networks trained on GPUs, now known as "AlexNet"
Result: A top-5 error rate of about 16%, roughly 10 percentage points lower than the second-place entry (about 26%).
The computer vision world immediately took notice
The AlexNet paper has received ~30,000 citations since being published in 2012!
Several key advancements have enabled the modern deep learning revolution
Advent of massively parallel computing by GPUs.
21 TFLOPS of computing power. This would have been the fastest supercomputer on Earth around the year 2000!
You can rent one on the Amazon cloud for $3/hour!
Methodological advancements have made deeper networks easier to train
Architecture
Optimizers
Activation Functions
Robust frameworks and abstractions make iteration faster and less error prone
Automatic differentiation allows easy prototyping
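To give a feel for what automatic differentiation does, here is a toy forward-mode version using dual numbers. Deep learning frameworks typically use reverse mode (backpropagation) instead, so this is a simplified stand-in; the `Dual` class and the example neuron are hypothetical.

```python
import math

class Dual:
    """A number that carries its value and its derivative together."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)
    __rmul__ = __mul__

def sigmoid(d):
    s = 1.0 / (1.0 + math.exp(-d.value))
    return Dual(s, s * (1.0 - s) * d.deriv)  # chain rule through the sigmoid

def neuron_output(w1):
    # Hypothetical neuron: inputs 0.8 and 0.3, second weight fixed at -0.5
    return sigmoid(w1 * 0.8 + (-0.5) * 0.3)

# A seed derivative of 1.0 means "differentiate with respect to w1"
out = neuron_output(Dual(1.5, 1.0))
# out.value is the neuron's output; out.deriv is d(output)/d(w1)
```

No derivative was written by hand for `neuron_output`; it falls out of the arithmetic rules, which is what makes prototyping new models easy.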
Go to: https://github.com/beamandrew/HSPH_lecture
Dates back to the late 1980s
Images are just 2D arrays of numbers
Goal is to build f(image) = 1
CNNs look at small connected groups of pixels using "filters"
Image credit: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
Images have a local correlation structure
Nearby pixels are likely to be more similar than pixels that are far away
CNNs exploit this through convolutions of small image patches
Example convolution
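A convolution can be written out by hand on a tiny image. The sketch below assumes no padding and a stride of 1, and (as deep learning libraries usually do) slides the filter without flipping it. The 4x4 image and the edge filter are made-up examples.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image and take a weighted sum at each position."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# Toy 4x4 image: dark left half (0), bright right half (1)
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# 2x2 filter that responds to a dark-to-bright vertical edge
edge_filter = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, edge_filter)
# feature_map == [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The filter responds only in the middle column, exactly where the edge sits.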
Pooling provides spatial invariance
Image credit: http://cs231n.github.io/convolutional-networks/
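A small sketch of why pooling gives some spatial invariance, assuming 2x2 max pooling with non-overlapping windows. The two feature maps are made-up and differ only by a one-pixel shift inside each window.

```python
def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]

original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
    [0, 0, 0, 0],
]

# The same two features, each moved by one pixel within its 2x2 window
shifted = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 5, 0],
    [0, 0, 0, 0],
]

pooled_original = max_pool(original)
pooled_shifted = max_pool(shifted)
# Both pooled maps are [[9, 0], [0, 5]]
```

The invariance is only local: a shift that crosses a window boundary would change the pooled output.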
Convolution + pooling + activation = CNN
Image credit: http://cs231n.github.io/convolutional-networks/
CNN formula is relatively simple
Image credit: http://cs231n.github.io/convolutional-networks/
Data augmentation mimics the image generative process
Image credit: http://slideplayer.com/slide/8370683/
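Data augmentation can be as simple as random flips and shifts applied to each training image. The sketch below assumes only those two transformations and a 50% chance for each; the helper names and the probabilities are illustrative.

```python
import random

def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def shift_right(image, pixels=1, fill=0):
    """Move the image to the right, filling the vacated columns."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]

def augment(image, rng):
    """Apply each transformation with probability 0.5 (an assumed setting)."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    if rng.random() < 0.5:
        image = shift_right(image)
    return image

image = [
    [1, 2, 3],
    [4, 5, 6],
]

rng = random.Random(0)  # fixed seed so the run is repeatable
augmented = [augment(image, rng) for _ in range(4)]
```

Every augmented copy keeps the original shape and label, so the network sees more variety without any new data collection.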
CNNs exploit strong prior information about images
Barrier to entry for deep learning is actually low
... but a few things might stand in your way:
The field moves fast, so staying up to date can be challenging
http://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html