Fernanda Mora
ITAM
DL is hot:
Facebook, Google and others
The price?
A network with 1 billion neural connections, trained on 16,000 computer cores for 3 days on YouTube images
Drug discovery: predict the biomolecular target of a compound and its toxic effects
Customer relationship management: approximate the value of possible direct-marketing actions over the customer state space
Energy forecasting: predict electric energy demand
What is deep learning?
Non-technical definition:
"Deep learning is a framework for training and modelling neural networks which recently have surpassed all conventional methods in many learning tasks, prominently image and voice recognition"
Batres-Estrada, 2015
Mid-technical definition:
"Set of algorithms that attempt to model high-level abstractions in data by using architectures of multiple non-linear transformations"
Is DL that new?
Three waves of neural network research:
Cybernetics: single neuron training
Connectionism: neural network training (1-2 hidden layers)
Deep learning (2006-): very deep networks training
Deep motivations:
AI, machine learning and representation learning
Example of a Support Vector Machine: separate data with hyperplane
H3 seems to be the best classifier. Why?
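A minimal sketch of why: a linear SVM explicitly picks the maximum-margin separating hyperplane, which is the property H3 has and the other candidates lack. The data below is synthetic and the setup is an assumption, not taken from the slide's figure.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)),   # class 0 cluster
               rng.normal(+2, 0.5, (20, 2))])  # class 1 cluster
y = np.array([0] * 20 + [1] * 20)

# The linear SVM returns the max-margin hyperplane: coef . x + intercept = 0
clf = svm.SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.coef_, clf.intercept_)
```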
Is the algorithm useful?
Can it be implemented? Does it run fast?
Does it learn?
Does it converge?
Overfitting with a neural network
[Plot: training error vs. validation error as a function of the number of training cycles]
Training error is optimistic!
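A minimal sketch of how the two curves are measured (scikit-learn on synthetic data; the model and sizes are assumptions): training error keeps falling, while validation error, the honest estimate, eventually turns back up.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(64,), random_state=0)
for epoch in range(200):                        # "training cycles"
    net.partial_fit(X_tr, y_tr, classes=np.unique(y))
    train_err = 1 - net.score(X_tr, y_tr)       # optimistic
    val_err = 1 - net.score(X_val, y_val)       # honest estimate
# Early stopping: keep the weights from the epoch with lowest val_err.
```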
Example
Kernel machines map data that is not linearly separable into a higher-dimensional space where it becomes separable.
The kernel finds the maximum-margin hyperplane in the transformed feature space.
Then why do we need them if PCA does the job?
Learning is involved
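A minimal sketch of the kernel trick on synthetic data (scikit-learn's SVC; the dataset choice is an assumption): concentric circles defeat any hyperplane in 2-D, but the RBF kernel's implicit non-linear mapping separates them.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # no separating line exists in 2-D
rbf = SVC(kernel="rbf").fit(X, y)        # kernel trick: separable implicitly

print(linear.score(X, y))  # roughly chance level (~0.5)
print(rbf.score(X, y))     # close to 1.0
```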
Is DL supervised or unsupervised?
Deep Learning is compatible with supervised, unsupervised and feature learning.
In fact, a common path when implementing a DL algorithm is: unsupervised, layer-wise pre-training followed by supervised fine-tuning (a sketch follows).
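A minimal sketch of that path, assuming scikit-learn's BernoulliRBM as the unsupervised feature learner and logistic regression as the supervised stage (the dataset and sizes are illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X = X / 16.0                      # scale pixels to [0, 1] for the RBM

model = Pipeline([
    # Stage 1: unsupervised feature learning (ignores the labels)
    ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05,
                         n_iter=10, random_state=0)),
    # Stage 2: supervised classifier on the learned features
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
print(model.score(X, y))
```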
Deep architectures: how and when deep is good
No information
Noisy information, not useful
Useful information, but raw
A 1-layer network won't be able to extract the features!
[Figure: feature hierarchy learned by a deep network]
Output (object identity)
3rd layer (object parts)
2nd layer (corners and contours)
1st layer (edges)
Input layer (pixels)
The mountain-descent analogy:
Person: the backpropagation algorithm
Path: the weights chosen to minimize the error
Steepness: slope of the mountain (differentiation)
Direction to go: gradient of the error surface
Frequency of measurement: learning rate
Calculates the gradient of a loss function with respect to all the weights in the network
Goal: train a multi-layered neural network so that it learns the internal representations needed to represent any arbitrary mapping from input to output
Output: optimized weights
Objective: choose the weights W that minimize the error E(W)
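A minimal sketch of backpropagation plus gradient descent for a one-hidden-layer network (plain NumPy; the loss, sizes, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # inputs
t = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # toy targets

W1 = rng.normal(scale=0.1, size=(3, 5)); b1 = np.zeros(5)
W2 = rng.normal(scale=0.1, size=(5, 1)); b2 = np.zeros(1)
lr = 0.1                                       # learning rate

for step in range(1000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    y = h @ W2 + b2
    err = y - t                        # dE/dy for E = 0.5 * mean||y - t||^2

    # Backward pass: chain rule gives dE/dW for every weight
    dW2 = h.T @ err / len(X)
    db2 = err.mean(axis=0)
    dh = err @ W2.T * (1 - h ** 2)     # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(axis=0)

    # Gradient descent: step downhill on the error surface
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```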
DBNs can be viewed as a composition of simple, unsupervised networks such as restricted Boltzmann machines (RBMs) or autoencoders (previously seen), where each sub-network's hidden layer serves as the visible layer for the next.
The training algorithm for DBNs proceeds greedily, layer by layer. Let X be a matrix of inputs, regarded as a set of feature vectors: train an RBM on X, transform X into that RBM's hidden activations, and use those activations as the input to the next sub-network, repeating until every layer has been trained (a sketch follows).
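A minimal sketch of this greedy, layer-wise procedure, with scikit-learn's BernoulliRBM standing in for each sub-network (the dataset and layer sizes are assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

X, _ = load_digits(return_X_y=True)
X = X / 16.0                          # RBM expects values in [0, 1]

layer_sizes = [100, 50, 25]
rbms, data = [], X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, n_iter=10, random_state=0)
    rbm.fit(data)                 # unsupervised training of this layer
    data = rbm.transform(data)    # hidden layer becomes the next visible layer
    rbms.append(rbm)

print(data.shape)                 # (n_samples, 25): top-level features
```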
Deep neural networks
Deep belief networks
Convolutional deep belief networks
Deep Boltzmann machines
Deep Q-networks
Adam Gibson, creator of Deeplearning4j, a commercial-grade, open-source deep-learning library intended to be used in commercial settings:
"My advice is to use it for media (image, video, audio, text) and for time-series analysis (e.g., sensor data feeds)"
Adam Gibson, at a conference on Deep Learning held in 2014:
When someone in the audience asked about using it for fraud detection, Adam replied with something like, "I don't even want to see you here."
Available DL libraries
Examples and code
[Figure: interaction between instances of Apply (blue), Variable (red), Op (green), and Type (purple) in a Theano computation graph]
Theano identifies and replaces certain patterns in the graph with other specialized patterns that produce the same results but are either faster or more stable
Complete tutorial for Theano: link
Downside: learning curve :(
Getting started with Theano: very basic examples (see the sketch below)
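A very basic sketch in the spirit of the official tutorial: define symbolic variables, compile a function, and let Theano differentiate for you.

```python
import theano
import theano.tensor as T

x = T.dscalar('x')               # symbolic double-precision scalar
y = T.dscalar('y')
z = x + y                        # symbolic expression; nothing computed yet

f = theano.function([x, y], z)   # Theano compiles (and optimizes) the graph
print(f(2, 3))                   # 5.0

# Theano can also differentiate symbolically, which is what makes
# writing backpropagation by hand unnecessary:
g = T.grad(x ** 2, x)
fg = theano.function([x], g)
print(fg(4))                     # 8.0
```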
Paper: Ruslan Salakhutdinov and Hugo Larochelle, Efficient Learning of Deep Boltzmann Machines.
NORB data set:
5 object categories, 5 different objects within each category.
Classification error rate on NORB:
Deep Boltzmann Machines: 10.8%
Support Vector Machines: 11.6%
Logistic Regression: 22.5%
K-Nearest Neighbors: 18.4%
Exploring Strategies for Training Deep Neural Networks.
Hugo Larochelle, Yoshua Bengio, Jérôme Louradour, Pascal Lamblin; JMLR 10(Jan):1-40, 2009.
MNIST data set
Deep takeaways
Reading list
Links to software
Datasets
Research groups and labs
Job listings
Tutorials and demos
Thank you!