Andrew Beam, PhD
Instructor, Department of Biomedical Informatics
Harvard Medical School
July 13th, 2017
twitter: @AndrewLBeam
Deep Learning 101 companion series of blog posts:
http://beamandrew.github.io
Jupyter Notebooks:
https://github.com/beamandrew/deeplearning_101
One of the very first ideas in machine learning and artificial intelligence
Are today's neural nets any different than their predecessors?
"[The perceptron is] the embryo of an electronic computer that [the Navy] expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence." - Frank Rosenblatt, 1958
Please read Schmidhuber's perspective on the history of deep learning: http://www.scholarpedia.org/article/Deep_Learning
Warren McCulloch and Walter Pitts (1943)
Thresholded logic unit with hardcoded weights
Intended to mimic "integrate and fire" model of neurons
Rosenblatt's Perceptron, 1957
Minsky and Papert show that the perceptron can't even solve the XOR problem
Kills research on neural nets for the next 15-20 years
Rumelhart, Hinton, and Williams show us how to train multilayered neural networks
Neural Network for Computer Vision
Unsupervised pre-training of "deep belief nets" allowed for larger and deeper models
Image credit: https://www.toptal.com/machine-learning/an-introduction-to-deep-learning-from-perceptrons-to-deep-networks
Imagenet Database
Large Scale Visual Recognition Challenge (ILSVRC)
Pivotal event occurred at the 2012 ILSVRC, which brought together 3 critical ingredients:
In 2011, a misclassification rate of 25% was near state of the art on ILSVRC
In 2012, Geoff Hinton and two graduate students, Alex Krizhevsky and Ilya Sutskever, entered ILSVRC with one of the first deep neural networks trained on GPUs, now known as "AlexNet"
Result: An error rate of 16%, nearly half what the second-place entry was able to achieve.
The computer vision world immediately took notice
AlexNet paper has ~13,000 citations since being published in 2012!
Several key advancements have enabled the modern deep learning revolution
Availability of massive datasets with high-quality labels
Standardized benchmarks of progress and open source tools
Community acknowledgment that open data -> everyone gets better
Advent of massively parallel computing by GPUs. Enabled training of huge neural nets on extremely large datasets
Methodological advancements have made deeper networks easier to train
Architecture
Optimizers
Activation Functions
Robust frameworks and abstractions make iteration faster and less error prone
Automatic differentiation allows easy prototyping
This all leads to the following hypothesis
Deep Learning Hypothesis: The success of deep learning is largely a success of engineering.
Personal belief: Things are different with neural nets this time around
Fitting Analogy?
These advancements have been transferred to other fields
Doctors were crucial... in creating the labeled dataset!
Off the shelf, pre-trained deep neural network + 130,000 images = expert level diagnostic accuracy
Not just medicine, but genomics too
More here: https://github.com/gokceneraslan/awesome-deepbio
The field moves fast, staying up to date can be challenging
http://beamandrew.github.io/deeplearning/2017/02/23/deep_learning_101_part1.html
Barrier to entry for deep learning is actually low
... but a few things might stand in your way:
Pretend we just have two variables:
... and a class label:
... we construct a function to predict y from x:
... and turn this into a probability using the logistic function:
... and use Bernoulli negative log-likelihood as loss:
This is good old-fashioned logistic regression
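The equations on this slide did not survive in the text, so here is a minimal NumPy sketch of the model just described. It assumes the usual notation: two inputs x1 and x2, a binary label y, weights w and a bias b. Those names, the helper functions, and the toy numbers are illustrative assumptions, not taken from the original slide.

```python
import numpy as np

def linear_score(X, w, b):
    """Linear function of the inputs: f(x) = w1*x1 + w2*x2 + b."""
    return X @ w + b

def logistic(z):
    """Logistic (sigmoid) function: maps any real score into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_nll(y, p, eps=1e-12):
    """Mean Bernoulli negative log-likelihood (binary cross-entropy)."""
    p = np.clip(p, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

# Toy data, made up for illustration: four points with two variables each.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

w = np.zeros(2)  # hypothetical starting weights
b = 0.0          # hypothetical starting bias

p = logistic(linear_score(X, w, b))  # predicted P(y = 1 | x) for each point
loss = bernoulli_nll(y, p)           # how badly the current w, b fit the labels
```

With all parameters at zero, every predicted probability is 0.5, so the loss is simply log(2). Learning means finding w and b that push this number down.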
How do we learn the "best" values for the model parameters?
Gradient Descent
This, in essence, is the entire "learning" algorithm behind modern deep learning. Keep this in mind.
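As a rough illustration, gradient descent for the logistic regression model can be sketched in a few lines of NumPy. The learning rate, step count, toy data (a logical AND), and function names are assumptions chosen for the example, not values from the original slides.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(X, y, w, b):
    """Mean Bernoulli negative log-likelihood of labels y under the model."""
    p = np.clip(logistic(X @ w + b), 1e-12, 1.0 - 1e-12)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def gradient_descent(X, y, lr=0.5, steps=2000):
    """Repeatedly step the parameters against the gradient of the loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = logistic(X @ w + b)
        error = p - y                  # gradient of the loss w.r.t. each score
        grad_w = X.T @ error / len(y)  # gradient w.r.t. the weights
        grad_b = error.mean()          # gradient w.r.t. the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy, linearly separable data (logical AND), made up for illustration.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])

initial_loss = nll(X, y, np.zeros(2), 0.0)
w, b = gradient_descent(X, y)
final_loss = nll(X, y, w, b)
predictions = (logistic(X @ w + b) > 0.5).astype(float)
```

Deep learning frameworks run the same loop; the main differences are that the gradients come from automatic differentiation and that each step typically uses a mini-batch rather than the full dataset.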
With a small change, we can turn our logistic regression model into a neural net
MLPs learn a set of nonlinear features directly from data
"Feature learning" is the hallmark of deep learning approaches
Can add more layers to increase capacity of network
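To make the "small change" concrete, here is a hedged NumPy sketch of an MLP forward pass: a hidden layer computes nonlinear features, and logistic regression is applied to those features instead of the raw inputs. The layer sizes, the random initialization, and all names are assumptions for illustration only.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(X, W1, b1, w2, b2):
    """One hidden layer followed by a logistic output unit."""
    hidden = relu(X @ W1 + b1)         # nonlinear features (learned in a real model)
    return logistic(hidden @ w2 + b2)  # logistic regression on those features

rng = np.random.default_rng(0)
num_variables, num_hidden = 2, 8       # hypothetical sizes

# Untrained, randomly initialized parameters.
W1 = rng.normal(scale=0.5, size=(num_variables, num_hidden))
b1 = np.zeros(num_hidden)
w2 = rng.normal(scale=0.5, size=num_hidden)
b2 = 0.0

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
probs = mlp_forward(X, W1, b1, w2, b2)  # one probability per input row
```

Adding more layers just means repeating the `relu(... @ W + b)` step before the final logistic unit.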
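from keras.layers import Dense, Dropout
from keras.models import Sequential

# num_variables, X_train, y_train, X_val and y_val are assumed to be defined already
mlp = Sequential()
# Hidden layer: 128 ReLU units that learn nonlinear features of the inputs
mlp.add(Dense(units=128, input_dim=num_variables, activation='relu'))
# Dropout randomly zeroes half of the hidden activations during training
mlp.add(Dropout(0.5))
# Output layer: a single sigmoid unit giving P(y = 1 | x)
mlp.add(Dense(units=1, activation='sigmoid'))
mlp.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
mlp.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10, verbose=1)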
See this notebook for more details: https://github.com/beamandrew/deeplearning_101/blob/master/mlp_tutorial.ipynb
TensorFlow backend gives gradients, training procedure, GPU computation "for free"!
CNNs are the workhorse model in image recognition
Images are just 2D arrays of numbers
Goal is to build f(image) = 1
CNNs exploit strong prior information about images
If we just "flatten" the image into a vector, we throw away a ton of information
CNNs use the structural properties of images to improve performance.
What's a convolution?
In images, it just means that we're doing a dot-product over a small image patch
Example convolution
CNNs look at small connected groups of pixels using "filters"
Image credit: http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
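The "dot-product over a small image patch" idea can be sketched directly in NumPy. This is a naive loop for illustration, assuming a single-channel image, stride 1, and no padding; the image and filter values are toy numbers, and the function name is hypothetical.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the filter over the image, taking a dot product with each patch.

    As in most deep learning libraries, the filter is not flipped, so
    strictly speaking this is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # small connected group of pixels
            out[i, j] = np.sum(patch * kernel)  # dot product of patch and filter
    return out

image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
], dtype=float)

kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
], dtype=float)

feature_map = convolve2d_valid(image, kernel)
# feature_map is 3x3:
# [[4. 3. 4.]
#  [2. 4. 3.]
#  [2. 3. 4.]]
```

In a CNN the filter values are not hand-picked like this; they are parameters learned by gradient descent, and each layer applies many filters at once.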
Images have a local correlation structure
Nearby pixels are likely to be more similar than pixels that are far away
CNNs exploit this through convolutions of small image patches
Pooling provides spatial invariance
Image credit: http://cs231n.github.io/convolutional-networks/
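A small NumPy sketch of 2x2 max pooling shows where the spatial invariance comes from: moving a strong activation within a pooling window leaves the output unchanged. The function name and toy arrays are assumptions for illustration.

```python
import numpy as np

def max_pool2d(feature_map, pool_size=2):
    """Non-overlapping max pooling over pool_size x pool_size windows."""
    h, w = feature_map.shape
    h_out, w_out = h // pool_size, w // pool_size
    # Drop any rows/columns that do not fill a complete window.
    trimmed = feature_map[:h_out * pool_size, :w_out * pool_size]
    blocks = trimmed.reshape(h_out, pool_size, w_out, pool_size)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
], dtype=float)
pooled = max_pool2d(feature_map)  # [[6. 8.], [3. 4.]]

# The same activation at two nearby positions inside one pooling window.
a = np.zeros((4, 4))
a[0, 0] = 9.0
b = np.zeros((4, 4))
b[1, 1] = 9.0
same_after_pooling = np.array_equal(max_pool2d(a), max_pool2d(b))  # True
```

The invariance is only local: shifting the activation across a window boundary does change the pooled output.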
Convolution + activation + pooling = CNN
Image credit: http://cs231n.github.io/convolutional-networks/
CNN formula is relatively simple
Image credit: http://cs231n.github.io/convolutional-networks/
Data augmentation mimics the image generative process
Image credit: http://slideplayer.com/slide/8370683/
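One way to read this: the same object could have been photographed mirrored or slightly shifted, so we generate such variants ourselves and train on them with the original label. Below is a minimal NumPy sketch, assuming a single-channel image and only two transformations (horizontal flip and a small shift). The function name and parameters are hypothetical, and real pipelines usually add rotations, crops, and color changes.

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return a randomly flipped and shifted copy of a 2D image."""
    out = image.copy()
    if rng.random() < 0.5:
        out = np.fliplr(out)  # mirror left-right
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    # np.roll wraps pixels around the border; a real pipeline would more
    # likely pad or crop instead. Kept here for simplicity.
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a real image
augmented = augment(image, rng)                   # same label, new pixels
```

Each call produces a different variant, so the network effectively sees a larger training set than the one that was collected.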
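from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential

# X_train (images) and Y_train (one-hot labels for 10 classes) are assumed to be defined already.
# The input shape is not given on the slide; recent Keras versions infer it when fit is called.
cnn = Sequential()
# First stage: two 5x5 convolutions with 32 filters each, then 2x2 max pooling
cnn.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
cnn.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2)))
# Second stage: same pattern again
cnn.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
cnn.add(Conv2D(filters=32, kernel_size=(5, 5), activation='relu'))
cnn.add(MaxPooling2D(pool_size=(2, 2)))
# Flatten the feature maps and classify with fully connected layers
cnn.add(Flatten())
cnn.add(Dense(units=1024, activation='relu'))
cnn.add(Dense(units=10, activation='softmax'))
cnn.compile(loss='categorical_crossentropy', optimizer='adam')
cnn.fit(X_train, Y_train, batch_size=64)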
Again, TensorFlow backend gives gradients, training procedure, GPU computation "for free"!
CNNs can be tricked in strange ways
Image credit: https://openai.com/blog/adversarial-example-research/