DEEP LEARNING FOR MEDICAL DATA
Andrew Beam, PhD
Department of Biomedical Informatics
Harvard School of Public Health
October 25th, 2017

twitter: @AndrewLBeam
TALK OUTLINE

- Why deep learning?
- What is deep learning?
- What is a neural network?
- When does deep learning work well?
- Deep learning case study
WHY DEEP LEARNING?
DEEP LEARNING HAS MASTERED GO





Source: https://deepmind.com/blog/alphago-zero-learning-scratch/

Human data no longer needed
DEEP LEARNING CAN DIAGNOSE PATIENTS





Medical imaging is being revolutionized
DEEP LEARNING FOR TB DRUG RESISTANCE



Inferring drug-resistance status in tuberculosis from sequence data using deep learning
Joint work with: Michael Chen, Maha Farhat
Highly accurate prediction of resistance to 11 drugs from genetic sequence
DEEP LEARNING CAN DIAGNOSE PATIENTS

What does this patient have?
A six-year-old boy has a high fever that has lasted for three days. He has extremely red eyes and a rash on the main part of his body in addition to a swollen and red strawberry tongue. Remaining symptoms include swollen lymph nodes in the neck and irritability.
Image credit: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
We are evaluating this system on questions from the US medical licensing exam!
EVERYONE IS USING DEEP LEARNING



Essentially the same model is behind ALL of these examples!
WHAT IS DEEP LEARNING?

WHAT IS DEEP LEARNING?

Deep learning is a specific kind of machine learning
- Machine learning automatically learns relationships using data
- Deep learning refers to large neural networks
- These neural networks have millions of parameters and hundreds of layers (i.e. they are structurally deep)
- Most important: Deep learning is not magic!
WHAT IS A NEURAL NET?

NEURAL NETWORK STRUCTURE

Say we want to build a model to predict the likelihood of a patient having a heart attack (MI) based on blood pressure (BP) and BMI
NEURAL NET STRUCTURE

A neural net is a modular way to build a classifier
Inputs
Output
Probability of MI
WHAT IS AN ARTIFICIAL NEURON?

The neuron is the basic functional unit of a neural network
Inputs
Output
Probability of MI
A neuron does two things, and only two things
1) Weighted sum of inputs (one weight for each input)
2) Nonlinear transformation
WHAT IS AN ARTIFICIAL NEURON?

The nonlinear transformation is known as the activation function
- Sigmoid
- Hyperbolic Tangent
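
As a minimal sketch, the two activation functions named on this slide can be written in plain Python. The function names and the sample inputs are illustrative choices, not taken from the slides.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def hyperbolic_tangent(z):
    """Squash any real number into the open interval (-1, 1)."""
    return math.tanh(z)

# Compare the two functions on a few sample inputs
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={hyperbolic_tangent(z):.3f}")
```

Both functions are smooth and bounded, which is what lets a neuron turn an unbounded weighted sum into a well-behaved output.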

WHAT IS AN ARTIFICIAL NEURON?

Summary: A neuron produces a single number that is a nonlinear transformation of its input connections
Activation function applied to the weighted sum of inputs = a number
This simple formula allows for an amazing amount of expressiveness
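
The two steps of a neuron can be sketched in a few lines of Python. This is an illustration under stated assumptions: the input values, the weights, and the bias term are all hypothetical (the slides do not mention a bias), and sigmoid is used as the activation function.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    # Step 1: weighted sum of inputs (the bias term is an added assumption)
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step 2: nonlinear transformation
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical standardized inputs: blood pressure and BMI as z-scores
bp, bmi = 1.2, 0.5
# Hypothetical weights; in practice these are learned from data
output = neuron([bp, bmi], weights=[0.8, 0.4], bias=-1.0)
print(f"neuron output = {output:.3f}")
```

Whatever the inputs, the result is one number between 0 and 1, which is why a single sigmoid neuron can already be read as a probability.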
NEURAL NETWORK STRUCTURE

Neural nets are organized into layers
- Input Layer: the inputs
- 1st Hidden Layer (each node is a single hidden unit)
- 2nd Hidden Layer
- Output Layer: the output, the probability of MI
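
The layered structure can be sketched as a forward pass in plain Python. The layer sizes (2 inputs, 3 hidden units, 2 hidden units, 1 output), every weight and bias, and the names `layer` and `predict` are hypothetical choices made for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: every unit takes a weighted sum of all inputs, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 hidden units -> 1 output
hidden1_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden1_b = [0.0, 0.1, -0.1]
hidden2_w = [[0.7, -0.5, 0.2], [-0.4, 0.3, 0.9]]
hidden2_b = [0.05, -0.05]
output_w = [[1.2, -0.7]]
output_b = [0.0]

def predict(bp, bmi):
    """Pass the inputs through each layer in turn; returns a value in (0, 1)."""
    h1 = layer([bp, bmi], hidden1_w, hidden1_b)
    h2 = layer(h1, hidden2_w, hidden2_b)
    return layer(h2, output_w, output_b)[0]

print(f"P(MI) with untrained weights: {predict(1.2, 0.5):.3f}")
```

Each layer is the same building block applied to the previous layer's outputs, which is the sense in which a neural net is modular.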

HOW NEURAL NETS LEARN

Finding the best values for the weights = learning
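
One common way to find good weights is gradient descent: repeatedly nudge each weight in the direction that lowers the prediction error. The sketch below trains a single sigmoid neuron this way. The toy dataset, the cross-entropy loss, the learning rate, and the step count are all assumptions chosen for illustration, not details from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: (standardized BP, standardized BMI) -> MI (1) or no MI (0)
data = [((-1.0, -0.5), 0), ((-0.5, -1.0), 0), ((0.8, 1.0), 1), ((1.2, 0.4), 1)]

def loss(w, b):
    """Average cross-entropy between predictions and labels."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(steps=200, learning_rate=0.5):
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            # Prediction minus label is the loss gradient w.r.t. the weighted sum
            error = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += error * x[0] / len(data)
            grad_w[1] += error * x[1] / len(data)
            grad_b += error / len(data)
        # Move each weight a small step against its gradient
        w = [w[0] - learning_rate * grad_w[0], w[1] - learning_rate * grad_w[1]]
        b -= learning_rate * grad_b
    return w, b

initial_loss = loss([0.0, 0.0], 0.0)
w, b = train()
print(f"loss before: {initial_loss:.3f}, after: {loss(w, b):.3f}")
```

Deep networks are trained on the same principle, with the gradients for every layer computed by backpropagation.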
MODERN NEURAL NETS

MODERN DEEP LEARNING

Neural networks are one of the oldest ideas in machine learning and AI
- Date back to 1940s
- Long history of "hype" cycles - boom and bust
- Were *not* the state-of-the-art machine learning technique for most of their existence
- Why are they popular now?
MODERN DEEP LEARNING
Several key advancements have enabled the modern deep learning revolution
GPUs enable training of huge neural nets on extremely large datasets




MODERN DEEP LEARNING
Several key advancements have enabled the modern deep learning revolution
Transfer Learning
- Train big model on large dataset
- Refine model on smaller dataset
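
A toy sketch of the refine step, under heavy assumptions: `PRETRAINED_W` stands in for a layer already trained on a large dataset (its values are made up), the small dataset is hypothetical, and only a new output neuron is trained while the pretrained layer stays frozen. Freezing everything but the last layer is one common simplification; updating all layers at a low learning rate is another option.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Stand-in for a hidden layer already trained on a large dataset (made-up weights)
PRETRAINED_W = [[1.5, -0.5], [-0.5, 1.5]]

def features(x):
    """Frozen feature extractor: these weights are never updated below."""
    return [math.tanh(w[0] * x[0] + w[1] * x[1]) for w in PRETRAINED_W]

# Hypothetical small dataset for the new task
small_data = [((1.0, 0.2), 1), ((0.8, -0.1), 1), ((0.1, 1.0), 0), ((-0.2, 0.9), 0)]

def refine(steps=300, learning_rate=0.5):
    """Train only a new output neuron on top of the frozen features."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        for x, y in small_data:
            f = features(x)
            error = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            w = [w[0] - learning_rate * error * f[0],
                 w[1] - learning_rate * error * f[1]]
            b -= learning_rate * error
    return w, b

w, b = refine()
predictions = [
    round(sigmoid(sum(wi * fi for wi, fi in zip(w, features(x))) + b))
    for x, _ in small_data
]
print(predictions)
```

Because the feature extractor is reused as-is, only three numbers (two weights and a bias) have to be learned from the small dataset.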

MODERN DEEP LEARNING
Several key advancements have enabled the modern deep learning revolution
Methodological advancements have made deeper networks easier to train
- Architecture
- Optimizers
- Activation Functions

MODERN DEEP LEARNING
Several key advancements have enabled the modern deep learning revolution
Easy-to-use frameworks dramatically lower the barrier to entry
Automatic differentiation allows easy prototyping
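
To give a feel for what automatic differentiation does, here is a minimal sketch using dual numbers (forward-mode differentiation). Deep learning frameworks typically use reverse-mode differentiation (backpropagation) instead, so this is a simpler stand-in for the idea rather than how any particular framework works. The `Dual` class and all numeric values are hypothetical.

```python
import math

class Dual:
    """A number that carries its value and its derivative together."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def sigmoid(z):
    """Sigmoid on a Dual, using the chain rule s'(z) = s(z) * (1 - s(z))."""
    s = 1.0 / (1.0 + math.exp(-z.value))
    return Dual(s, s * (1.0 - s) * z.deriv)

def neuron(w1, w2, x1=1.2, x2=0.5):
    return sigmoid(w1 * x1 + w2 * x2)

# Seed deriv=1.0 on w1 to ask for d(output)/d(w1)
out = neuron(Dual(0.8, 1.0), Dual(0.4))
print(f"output = {out.value:.4f}, d(output)/d(w1) = {out.deriv:.4f}")
```

The model is written as ordinary arithmetic and the derivative comes along for free, which is what makes prototyping new architectures quick.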



CONVOLUTIONAL NEURAL NETWORKS

CONVOLUTIONAL NEURAL NETS (CNNs)

Dates back to the late 1980s
- Responsible for Deep Learning revival
- Invented in 1989 by Yann LeCun at Bell Labs - "Lenet"
- Integrated into handwriting recognition systems in the 90s
- Huge flurry of activity after the Alexnet paper


THE BREAKTHROUGH (2012)

Imagenet Database
- Millions of labeled images
- Objects in images fall into 1 of a possible 1,000 categories
- Relatively high-resolution
- Bounding boxes giving exact location of object - useful for both classification and localization
Large Scale Visual Recognition Challenge (ILSVRC)
- Annual Imagenet Challenge starting in 2010
- Successor to smaller PASCAL VOC challenge
- Many tracks including classification and localization
- Standardized training and test set. Competitors upload predictions for test set and are automatically scored
THE BREAKTHROUGH (2012)

A pivotal event in 2012 laid out the blueprint for a successful deep learning model
- Massive amounts of labeled images
- Training with GPUs
- Methodological innovations that enabled training deeper networks while minimizing overfitting
THE BREAKTHROUGH (2012)

In 2011, a misclassification rate of 25% was near state of the art on ILSVRC
In 2012, Geoff Hinton and two graduate students, Alex Krizhevsky and Ilya Sutskever, entered ILSVRC with one of the first deep neural networks trained on GPUs, now known as "Alexnet"
Result: An error rate of 16%, roughly 10 percentage points better than the second-place entry.
The computer vision world immediately took notice
THE ILSVRC AFTERMATH (2012-2014)


The Alexnet paper has received ~16,000 citations since being published in 2012!
WHY CNNS WORK

Most algorithms expect "tabular" data

| y | X1 | X2 | X3 | X4 |
|---|---|---|---|---|
| 0 | 7 | 52 | 17 | 654 |
| 0 | 23 | 2752 | 4 | 1 |
| 1 | 786 | 27 | 0 | 5 |
| 0 | 354 | 7527 | 89 | 68 |

The problem with tabular data
What is this a picture of?
Tabular data throws away too much information!
CONVOLUTIONAL NEURAL NETS

Images are just 2D arrays of numbers

Goal is to build f(image) = 1
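
A minimal sketch of what "2D array of numbers" means in practice, and of what is lost when that array is flattened into a single tabular row. The 4x4 image and the helper `flat_index` are hypothetical.

```python
# A hypothetical 4x4 grayscale "image": each number is a pixel intensity (0-255)
image = [
    [0,   0,   0,   0],
    [0, 255, 255,   0],
    [0, 255, 255,   0],
    [0,   0,   0,   0],
]
height, width = len(image), len(image[0])

# Flatten the image into one tabular row, as a generic algorithm would expect
row = [pixel for image_row in image for pixel in image_row]

def flat_index(r, c):
    """Position of pixel (r, c) in the flattened row."""
    return r * width + c

# Pixels (1, 1) and (2, 1) touch in the image, yet sit `width` columns apart in the row
gap = flat_index(2, 1) - flat_index(1, 1)
print(f"{len(row)} columns; vertical neighbours are {gap} columns apart")
```

The pixel values survive flattening, but the table no longer records which pixels were next to each other.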
CONVOLUTIONAL NEURAL NETS

CNNs look at small connected groups of pixels using "filters"

Image credit: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
Images have a local correlation structure
Nearby pixels are likely to be more similar than pixels that are far away
CNNs exploit this through convolutions of small image patches
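
The filter operation can be sketched in plain Python. The function below slides a small square filter over every image patch and sums the elementwise products, with no padding and a stride of 1. The image, the filter values, and the name `convolve2d` are hypothetical; in a real CNN the filter values are learned.

```python
def convolve2d(image, kernel):
    """Slide a square filter over every patch and sum the elementwise products.

    No padding and stride 1, so the output is smaller than the input.
    As in most deep learning libraries, the filter is not flipped.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x4 image: dark on the left, bright on the right
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical 2x2 filter that responds to a dark-to-bright vertical edge
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row prints [0, 18, 0]
```

The output is large only where the patch contains the edge, so the same four filter weights detect that pattern anywhere in the image.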
WHY DO CNNs WORK SO WELL?

- Based on solid image priors
- Exploit properties of images, e.g. local correlations and invariances
- Mimic generative distribution with augmentation to reduce overfitting
- Results in end-to-end visual recognition system trained with SGD on GPUs: pixels in -> classifications out
CNNs exploit strong prior information about images
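
Augmentation can be sketched as generating label-preserving variants of each training image. The two transformations below (a horizontal flip and a one-pixel shift) are illustrative choices, and the 3x3 image and its label are hypothetical.

```python
def flip_horizontal(image):
    """Mirror the image left to right; the label stays the same."""
    return [list(reversed(row)) for row in image]

def shift_right(image, fill=0):
    """Shift every row one pixel to the right, padding the left edge with `fill`."""
    return [[fill] + row[:-1] for row in image]

# Hypothetical 3x3 image with label 1
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
label = 1

# Each transformed copy is a new training example with the original label
augmented = [
    (image, label),
    (flip_horizontal(image), label),
    (shift_right(image), label),
]
print(f"{len(augmented)} training examples from 1 original image")
```

Training on the variants teaches the network that the label should not change under these transformations, which is one way to encode an invariance.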
CASE STUDY

DIABETIC RETINOPATHY

In 2016 Google built a deep learning model to automatically diagnose patients with diabetic retinopathy
- Input data: Labeled fundus photographs of the eye
- Large database of 128,000 images they created for this project
- Hired ophthalmologists to annotate the images ($$$)
DIABETIC RETINOPATHY

Why did this work so well?
- Huge dataset of over 100,000 images
- High quality annotations - each image was rated by 3-7 ophthalmologists
- Transfer learning - neural network was originally trained on Imagenet!
- For the cost of a GPU (~$1,000) it's possible to read 240 million images/day with accuracy on par with the best ophthalmologists!
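
To put the throughput figure in perspective, a quick back-of-the-envelope calculation converts it to images per second. The only inputs are the 240 million images/day and 128,000-image figures quoted in these slides; the variable names are illustrative.

```python
images_per_day = 240_000_000   # figure quoted in the slide
seconds_per_day = 24 * 60 * 60

images_per_second = images_per_day / seconds_per_day
print(f"{images_per_second:,.0f} images per second")

# Time to read the project's full dataset of 128,000 images at that rate
dataset_size = 128_000
print(f"{dataset_size / images_per_second:.0f} seconds to read the whole dataset")
```

At that rate, the entire annotated dataset would be read in under a minute.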
DIABETIC RETINOPATHY

Implications
- Many subsequent studies have followed this formula
- Ingredients: Deep learning + high quality database of ~100,000 medical images + transfer learning
- Many medical imaging tasks in radiology, pathology, dermatology, and ophthalmology can be fully automated in a similar manner
- Similar results emerging from non-image data
How will this technology change medical practice, reimbursement, and other policies?
AUTOMATIC X-RAY READING


https://arxiv.org/pdf/1711.05225.pdf
SUMMARY & CONCLUSIONS

CONCLUSIONS

- Deep learning models are like legos, but you need to know what blocks you have and how they fit together
- Could potentially impact many fields -> understand concepts so you have deep learning "insurance"
- Impact on medical imaging seems inevitable
- Prereqs: Data (lots) + GPUs (more = better)
Additional Resources

http://beamandrew.github.io
