Introduction to RNN and LSTM
Shubham Dokania
@shubhamdokania
Common Architectures
- Neural models:
  - Artificial Neural Network
  - Convolutional Neural Network
- Random Forests/Decision Trees
- Naive Bayes (Bayesian methods)
- And more...
DATA: Sequences
- Examples:
  - Stock market data
  - Text (books, dialogue corpora)
  - Music
  - Handwriting/art strokes, etc.
Problem?
- The models above don't handle sequences well:
  - No context information.
  - No time dependency.
- Sequences are temporal in nature.
Any solutions?
- Convert temporal data to spatial.
- Markov models: model the data probabilistically using Markov chains.
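One common way to "convert temporal data to spatial" is a sliding window: each fixed-length window of past values becomes one feature vector, and the value that follows it becomes the target, so a standard non-sequential model can be trained on the pairs. A minimal sketch, assuming a 1-D numeric series and a fixed window length (`make_windows` is a hypothetical helper):

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (X, y) pairs: each row of X holds `window`
    consecutive values and y holds the value that comes right after them."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    X = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:]
    return X, y

if __name__ == "__main__":
    X, y = make_windows([1, 2, 3, 4, 5, 6], window=3)
    print(X.tolist())  # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
    print(y.tolist())  # [4.0, 5.0, 6.0]
```

The catch is that the window length is fixed up front: anything further back than `window` steps is invisible to the model.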
Problem in MC models
Markov chain (MC) models have a fixed-size context (the order of the chain).
A fixed-size context means no long-term contextual information is preserved: anything older than the chain's order is forgotten.
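A small character-level sketch makes the fixed context concrete: an order-k Markov chain predicts the next character from the last `order` characters only, so two histories that differ only further back get the same prediction. `fit_markov` and `predict_next` are hypothetical helpers, and "pick the most frequent successor" is just one simple choice of predictor.

```python
from collections import Counter, defaultdict

def fit_markov(text, order):
    """Count which character follows each length-`order` context in `text`."""
    if order < 1:
        raise ValueError("order must be at least 1")
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        context = text[i - order:i]
        counts[context][text[i]] += 1
    return counts

def predict_next(counts, history, order):
    """Return the most frequent successor of the last `order` characters of
    `history`, or None if that context was never seen. Older characters are ignored."""
    if order < 1:
        raise ValueError("order must be at least 1")
    context = history[-order:]
    if context not in counts:  # membership test does not insert into the defaultdict
        return None
    return counts[context].most_common(1)[0][0]

if __name__ == "__main__":
    counts = fit_markov("abcabdabc", order=2)
    print(predict_next(counts, "ab", order=2))       # c
    print(predict_next(counts, "dddddab", order=2))  # c (the leading d's fall outside the context)
```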
Memory (context) issue
Solution?
Enter recurrent architectures
Recurrent Neural Network
Recurrent nodes
- Contain feedback loops.
- Analogous to sequential digital circuits.
- Propagate information through time.
Unrolling in time
Information is passed through time via the hidden state.
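A minimal NumPy sketch of unrolling, assuming a vanilla RNN with a tanh activation (the names `rnn_forward`, `W_xh`, `W_hh` and `b_h` are illustrative): the loop applies the same weights at every step, and the hidden state `h` is the only thing carried from one step to the next.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Unroll a vanilla RNN over a sequence.

    xs: inputs, shape (T, D), one row per time step
    h0: initial hidden state, shape (H,)
    Returns the hidden state at every step, shape (T, H).
    """
    h = h0
    hs = []
    for x_t in xs:
        # Same weights at every step; only the hidden state changes.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hs.append(h)
    return np.stack(hs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, H = 5, 3, 4
    xs = rng.normal(size=(T, D))
    W_xh = rng.normal(scale=0.5, size=(H, D))
    W_hh = rng.normal(scale=0.5, size=(H, H))
    b_h = np.zeros(H)
    hs = rnn_forward(xs, W_xh, W_hh, b_h, np.zeros(H))
    print(hs.shape)  # (5, 4)
```

Changing the first input changes every later hidden state, which is exactly the "information passed through time" described above.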
Working of RNN
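A common formulation of one vanilla RNN step, with input $x_t$, hidden state $h_t$ and output $y_t$ (the weight and bias names are assumed notation):

```latex
\begin{aligned}
h_t &= \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right) \\
y_t &= W_{hy}\, h_t + b_y
\end{aligned}
```

The same $W_{xh}$, $W_{hh}$ and $W_{hy}$ are shared across all time steps; unrolling applies this step once per element of the sequence.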
Issues with RNN and improvements
Vanishing gradients
- Occurs during backpropagation through time (BPTT).
- As sequence length increases, the gradient shrinks toward zero (vanishes) or blows up (explodes)!
- Caused by the repeated product of Jacobian matrices in the chain rule!
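To see the effect numerically: for $h_t = \tanh(W_{hh} h_{t-1} + \dots)$, each backward step multiplies the gradient by $W_{hh}^\top$ (and by the tanh derivative). The sketch below drops the tanh factor and uses $W_{hh} = s\,Q$ with $Q$ a random orthogonal matrix, so every step rescales the gradient norm by exactly $s$; the function name, sizes and seed are illustrative assumptions.

```python
import numpy as np

def backprop_norms(scale, steps, size=8, seed=0):
    """Norm of a unit gradient pushed back `steps` times through W_hh = scale * Q,
    where Q is a random orthogonal matrix. The tanh' factor is omitted."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(size, size)))  # Q is orthogonal
    W_hh = scale * Q
    grad = rng.normal(size=size)
    grad /= np.linalg.norm(grad)  # start from a unit-norm gradient
    norms = []
    for _ in range(steps):
        grad = W_hh.T @ grad  # one chain-rule step back in time
        norms.append(np.linalg.norm(grad))
    return np.array(norms)

if __name__ == "__main__":
    print(backprop_norms(0.9, 50)[-1])  # about 0.9**50, roughly 0.005: vanishes
    print(backprop_norms(1.1, 50)[-1])  # about 1.1**50, roughly 117: explodes
```

Since $|\tanh'| \le 1$, including the tanh derivative can only shrink the product further, which makes vanishing the more common failure with tanh units.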
Fixing vanishing gradients
- Better initialization: set the recurrent weight matrix W to the identity.
- Use ReLU as the activation.
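The two fixes combined (identity recurrent weights plus ReLU) are sometimes called an IRNN. A minimal sketch, assuming small random input weights and zero biases (the helper names and the 0.01 input-weight scale are illustrative choices):

```python
import numpy as np

def init_irnn(input_size, hidden_size, seed=0):
    """Identity recurrent weights, small random input weights, zero bias."""
    rng = np.random.default_rng(seed)
    W_xh = rng.normal(scale=0.01, size=(hidden_size, input_size))  # assumed scale
    W_hh = np.eye(hidden_size)  # recurrent matrix starts as the identity
    b_h = np.zeros(hidden_size)
    return W_xh, W_hh, b_h

def irnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step using ReLU instead of tanh."""
    return np.maximum(0.0, W_xh @ x_t + W_hh @ h_prev + b_h)

if __name__ == "__main__":
    W_xh, W_hh, b_h = init_irnn(input_size=3, hidden_size=4)
    h = np.array([1.0, 2.0, 3.0, 4.0])
    for _ in range(100):
        # With zero input, the identity matrix copies the state forward unchanged.
        h = irnn_step(np.zeros(3), h, W_xh, W_hh, b_h)
    print(h)  # [1. 2. 3. 4.]
```

For active (positive) units the step's Jacobian is exactly the identity at initialization, so along that path the gradient neither shrinks nor grows.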
Context (long-term) loss
Although possible in theory, RNNs are in practice very bad at capturing long-term dependencies in data.
Increasing the hidden state size helps, but there's a limit: memory (space complexity) grows with it!
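To see where the limit comes from, count the parameters of one vanilla RNN layer with input size D and hidden size H: the recurrent matrix alone is H x H, so memory grows quadratically with the hidden size. A quick sketch (`rnn_param_count` is a hypothetical helper; the output layer is not counted):

```python
def rnn_param_count(input_size, hidden_size):
    """Weights and biases in one vanilla RNN layer:
    W_xh (H x D) + W_hh (H x H) + b_h (H). Output layer not counted."""
    D, H = input_size, hidden_size
    return H * D + H * H + H

if __name__ == "__main__":
    for H in (128, 256, 512, 1024):
        print(H, rnn_param_count(input_size=100, hidden_size=H))
```

With D = 100 this gives 29,312, 91,392, 313,856 and 1,152,000 parameters: once the H² term dominates, each doubling of H roughly quadruples the count.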
Solution?
LSTMs
LSTM: Long Short-Term Memory
LSTM node
- Specifically designed to overcome the long-term dependency problem.
- Has shown large performance improvements over vanilla RNNs on many sequence tasks!
Notation
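A widely used notation for the LSTM equations (an assumed convention, not necessarily the symbols of the slide's figure):

```latex
\begin{aligned}
x_t &: \text{input at time } t \\
h_t &: \text{hidden (output) state at time } t \\
C_t &: \text{context (cell state) vector at time } t,\ \tilde{C}_t \text{ its candidate update} \\
[h_{t-1}, x_t] &: \text{concatenation of the previous hidden state and the current input} \\
\sigma(z) &: \text{logistic sigmoid } 1/(1+e^{-z}),\ \text{applied element-wise} \\
\odot &: \text{element-wise (Hadamard) product}
\end{aligned}
```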
Core improvements
- A context (cell state) vector carried along with the hidden state.
- Use of an addition (input) gate and a forget gate!
LSTM forget gate
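In the standard formulation (symbol names assumed), the forget gate reads the previous hidden state and the current input and outputs, for each element of the context vector, a value between 0 (erase) and 1 (keep):

```latex
f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)
```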
LSTM addition gate
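The addition gate (more commonly called the input gate) decides how much of a new candidate context $\tilde{C}_t$ gets written. In the standard formulation, with assumed symbol names:

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right)
\end{aligned}
```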
LSTM context update
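The context update combines both gates: the forget gate $f_t$ scales the old context and the addition gate $i_t$ scales the candidate $\tilde{C}_t$ (standard formulation, assumed notation):

```latex
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
```

Because the update is additive, the direct path from $C_{t-1}$ to $C_t$ has Jacobian $\operatorname{diag}(f_t)$: an element-wise scaling rather than a repeated multiplication by a weight matrix as in the vanilla RNN. When $f_t$ stays close to 1, gradients can flow across many time steps along this path (paths through the gates and $h_{t-1}$ are ignored in this simplified view).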
LSTM output state
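The output state is usually computed as $o_t = \sigma(W_o\,[h_{t-1}, x_t] + b_o)$ and $h_t = o_t \odot \tanh(C_t)$. Putting the forget gate, addition gate, context update and output state together gives one LSTM step. A minimal NumPy sketch of the standard formulation, assuming the four weight matrices are stacked into one matrix `W` (the names `lstm_step`, `W`, `b` and the gate ordering are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W has shape (4H, H + D) and b has shape (4H,); the rows are stacked in the
    assumed order: forget, addition/input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # all four pre-activations at once
    f = sigmoid(z[0:H])                  # forget gate
    i = sigmoid(z[H:2 * H])              # addition (input) gate
    c_tilde = np.tanh(z[2 * H:3 * H])    # candidate context
    o = sigmoid(z[3 * H:4 * H])          # output gate
    c = f * c_prev + i * c_tilde         # context update
    h = o * np.tanh(c)                   # output (hidden) state
    return h, c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, H, T = 3, 4, 6
    W = rng.normal(scale=0.1, size=(4 * H, H + D))
    b = np.zeros(4 * H)
    h, c = np.zeros(H), np.zeros(H)
    for x_t in rng.normal(size=(T, D)):
        h, c = lstm_step(x_t, h, c, W, b)
    print(h.shape, c.shape)  # (4,) (4,)
```

Setting the forget-gate bias high and the addition-gate bias low makes the cell copy its context forward almost unchanged, which is the behaviour the gates were designed to allow.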
Some examples
Jupyter notebooks
RNN and LSTMs: Introduction
A presentation for CoSysLab@IIITD