Introduction to RNN and LSTM
- Common machine learning models:
- Artificial Neural Network
- Convolutional Neural Network
- Random Forests/Decision Trees
- Naive Bayes (Bayesian methods)
- And more...
Sequential data
- Stock market data
- Text (books, dialogue corpora)
- Handwriting/art strokes, etc.
- Previous models can't be applied directly to sequences.
- No notion of context.
- No notion of time dependency.
- Sequences are temporal in nature.
- Convert temporal data to spatial data.
- Markov models: model the data probabilistically using Markov chains.
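Both options can be sketched in a few lines of Python. This is a minimal illustration, not the slides' method: `sliding_windows` and `fit_markov_chain` are hypothetical helper names, and the Markov chain is estimated by plain transition counting (maximum likelihood, no smoothing).

```python
from collections import Counter, defaultdict

def sliding_windows(seq, k):
    """Turn a sequence into fixed-size (window, next_item) pairs.

    Each window of k consecutive items becomes one fixed-length input,
    i.e. the temporal data is laid out 'spatially' for a non-sequential model.
    """
    return [(tuple(seq[i:i + k]), seq[i + k]) for i in range(len(seq) - k)]

def fit_markov_chain(seq, order=1):
    """Estimate P(next item | previous `order` items) by counting transitions."""
    counts = defaultdict(Counter)
    for context, nxt in sliding_windows(seq, order):
        counts[context][nxt] += 1
    model = {}
    for context, c in counts.items():
        total = sum(c.values())
        model[context] = {sym: n / total for sym, n in c.items()}
    return model

text = "abababac"
print(sliding_windows(text, 2)[:2])   # [(('a', 'b'), 'a'), (('b', 'a'), 'b')]
model = fit_markov_chain(text, order=1)
print(model[("a",)])                  # {'b': 0.75, 'c': 0.25}
```

The same window-of-k idea underlies both: the window is the fixed-length input for a spatial model, and it is also the conditioning context of an order-k Markov chain.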
Problem with MC models
Markov chain (MC) models have a fixed-size context (the order of the chain).
A fixed-size context means long-term contextual information outside that window is not preserved.
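A small sketch of why a fixed order loses long-term information (the helper names are hypothetical): an order-k chain conditions only on the last k symbols, so two histories that differ earlier than that look identical to it, and widening the window multiplies the number of contexts that need probabilities.

```python
def markov_context(history, order):
    """An order-k Markov chain sees only the last `order` symbols (order >= 1)."""
    return tuple(history[-order:])

def num_contexts(vocab_size, order):
    """Number of distinct contexts an order-k chain may need probabilities for."""
    return vocab_size ** order

# Two histories that differ only in their first symbol (the long-term information).
h1, h2 = list("xaaaa"), list("yaaaa")

print(markov_context(h1, 2), markov_context(h2, 2))    # ('a', 'a') ('a', 'a')
print(markov_context(h1, 2) == markov_context(h2, 2))  # True: same prediction forced
print(markov_context(h1, 5) == markov_context(h2, 5))  # False: only a wider window helps

# Widening the window is costly, e.g. with 27 symbols (letters + space).
for k in (1, 2, 5):
    print(k, num_contexts(27, k))  # 27, 729, 14348907
```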
Memory (context) issue
Enter recurrent architectures
Recurrent neural networks (RNNs)
- Contain feedback loops.
- Analogous to sequential digital circuits.
- Propagate information through time.
Unrolling in time
Information is passed via hidden states through time.
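A common way to write this (Elman-style notation assumed here, not taken from the slides): the same weights W_{xh} and W_{hh} are applied at every step, and unrolling three steps from an initial state h_0 shows how x_1 still reaches h_3 through the chain of hidden states.

```latex
\begin{aligned}
h_t &= \tanh\left(W_{hh}\, h_{t-1} + W_{xh}\, x_t + b_h\right), \qquad y_t = W_{hy}\, h_t + b_y \\
h_3 &= \tanh\Big(W_{hh} \tanh\big(W_{hh} \tanh(W_{hh} h_0 + W_{xh} x_1 + b_h) + W_{xh} x_2 + b_h\big) + W_{xh} x_3 + b_h\Big)
\end{aligned}
```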
How an RNN works
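As a stand-in for the slide's diagram, here is a minimal pure-Python forward pass for a vanilla RNN, assuming the common update h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h). The weights are random and untrained; `rnn_forward` and the sizes are illustrative choices.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Run a vanilla RNN over the inputs xs and return every hidden state.

    The same W_xh, W_hh, b_h are reused at each time step; the loop is
    exactly the 'unrolled' network.
    """
    h = h0
    states = []
    for x in xs:
        pre = [a + b + c for a, b, c in zip(matvec(W_hh, h), matvec(W_xh, x), b_h)]
        h = [math.tanh(p) for p in pre]   # new hidden state carries the context forward
        states.append(h)
    return states

random.seed(0)
hidden, inp = 3, 2
W_xh = [[random.uniform(-0.5, 0.5) for _ in range(inp)] for _ in range(hidden)]
W_hh = [[random.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
b_h = [0.0] * hidden
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

states = rnn_forward(xs, W_xh, W_hh, b_h, h0=[0.0] * hidden)
for t, h in enumerate(states, start=1):
    print(t, [round(v, 3) for v in h])
```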
Issues with RNNs: vanishing and exploding gradients
- Occur due to backpropagation through time (BPTT).
- As sequence length increases, the gradient shrinks toward zero or blows up!
- This happens because the chain rule multiplies many consecutive Jacobian matrices together.
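To see the effect numerically, the sketch below tracks dh_T/dh_0 for a hypothetical scalar RNN h_t = f(w h_{t-1} + x). By the chain rule the derivative is a product of T local factors w f'(pre_t): with tanh and |w| < 1 it collapses toward zero, while with a linear activation and |w| > 1 it grows as |w|^T. The weights are arbitrary demo values.

```python
import math

def grad_through_time(w, T, x=0.0, h0=0.5, activation="tanh"):
    """Return d h_T / d h_0 for the scalar RNN h_t = f(w * h_{t-1} + x).

    The chain rule makes this a product of T local factors w * f'(pre_t).
    """
    h, grad = h0, 1.0
    for _ in range(T):
        pre = w * h + x
        if activation == "tanh":
            h = math.tanh(pre)
            local = w * (1.0 - h * h)   # d tanh(u)/du = 1 - tanh(u)^2
        elif activation == "linear":
            h = pre
            local = w                   # identity activation: f'(u) = 1
        else:
            raise ValueError(f"unknown activation: {activation}")
        grad *= local
    return grad

for T in (5, 20, 50):
    vanishing = grad_through_time(0.9, T)                       # tanh, |w| < 1
    exploding = grad_through_time(1.5, T, activation="linear")  # linear, |w| > 1
    print(f"T={T:2d}  tanh/w=0.9: {vanishing:.3e}   linear/w=1.5: {exploding:.3e}")
```

In a real RNN the scalar w becomes the matrix W_hh, and the size of the product is governed roughly by its largest singular values instead of |w|.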
Fixing vanishing gradients
- Initialize the recurrent weight matrix W to the identity.
- Use ReLU as the activation function.
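A minimal sketch of why these two fixes help together (a scheme often called IRNN). With h_t = ReLU(W h_{t-1} + x_t), each step's Jacobian is diag(ReLU'(pre_t)) W; if W is the identity and the units stay active, every factor is the identity, so the product over many steps neither shrinks nor grows. The helper names, the positive inputs, and the 0.5·I comparison are illustrative assumptions.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matmul(A, B):
    """Plain matrix product A @ B for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def relu_rnn_jacobian(W, xs, h0):
    """Run h_t = relu(W h_{t-1} + x_t) and accumulate J = d h_T / d h_0.

    Each step's local Jacobian is diag(relu'(pre_t)) @ W, and the chain rule
    multiplies these from the left: J <- (D_t W) J.
    """
    n = len(h0)
    h, J = h0, identity(n)
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W, h), x)]
        h = [max(0.0, p) for p in pre]
        D = [[(1.0 if pre[i] > 0 else 0.0) if i == j else 0.0 for j in range(n)]
             for i in range(n)]
        J = matmul(matmul(D, W), J)
    return h, J

xs = [[0.1, 0.2]] * 100   # positive inputs keep every ReLU unit active
h0 = [0.5, 0.5]

_, J_identity = relu_rnn_jacobian(identity(2), xs, h0)
print(J_identity)         # [[1.0, 0.0], [0.0, 1.0]]: gradient preserved over 100 steps

_, J_scaled = relu_rnn_jacobian([[0.5, 0.0], [0.0, 0.5]], xs, h0)
print(J_scaled[0][0])     # 0.5 ** 100 (about 7.9e-31): the gradient vanishes
```

The identity result relies on the units staying active; a unit whose pre-activation goes negative contributes a zero row, so this is a mitigation rather than a guarantee.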
Context (long-term) loss
Although it is possible in theory, in practice RNNs are poor at capturing long-term dependencies in data.
Increasing the hidden-state size helps, but memory cost grows with it, so there is a practical limit!
Long short-term memory (LSTM)
- Specifically designed to overcome the long-term dependency problem.
- Has shown large performance improvements over plain RNNs!
- Maintains a context (cell) vector alongside the hidden state.
- Uses an addition (input) gate and a forget gate!
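One LSTM time step can be sketched in pure Python. This assumes the widely used formulation in which every gate is computed from the concatenation [h_{t-1}, x_t] and the "addition" gate is the input gate i_t scaling a tanh candidate; the slides' exact equations are not reproduced here, and `lstm_step`/`init_params` are hypothetical names with random, untrained weights.

```python
import math
import random

def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def affine(W, b, v):
    """Compute W @ v + b for W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; returns the new hidden state h and context (cell) vector c."""
    z = h_prev + x  # list concatenation = [h_{t-1}, x_t]
    f = [sigmoid(u) for u in affine(params["W_f"], params["b_f"], z)]          # forget gate
    i = [sigmoid(u) for u in affine(params["W_i"], params["b_i"], z)]          # addition (input) gate
    c_tilde = [math.tanh(u) for u in affine(params["W_c"], params["b_c"], z)]  # candidate context
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]   # context update
    o = [sigmoid(u) for u in affine(params["W_o"], params["b_o"], z)]          # output gate
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                           # output state
    return h, c

def init_params(hidden, inp, seed=0):
    """Small random weights and zero biases for the f, i, c, o transforms."""
    rng = random.Random(seed)
    params = {}
    for g in "fico":
        params[f"W_{g}"] = [[rng.uniform(-0.1, 0.1) for _ in range(hidden + inp)]
                            for _ in range(hidden)]
        params[f"b_{g}"] = [0.0] * hidden
    return params

params = init_params(hidden=3, inp=2)
h, c = [0.0] * 3, [0.0] * 3
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h, c = lstm_step(x, h, c, params)
print([round(v, 3) for v in h], [round(v, 3) for v in c])
```

Because the context is updated additively (f_t ⊙ c_{t-1} + i_t ⊙ c̃_t), a forget gate near 1 lets information pass through many steps nearly unchanged, which is what helps with long-term dependencies.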
LSTM forget gate
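In the common formulation (assumed here; σ is the logistic sigmoid and [h_{t-1}, x_t] the concatenation of the previous hidden state and the current input), the forget gate decides, per component, how much of the previous context c_{t-1} to keep, with n the hidden size:

```latex
f_t = \sigma\left(W_f \, [h_{t-1}, x_t] + b_f\right) \in (0, 1)^{n}
```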
LSTM addition gate
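Under the same common formulation (assumed; σ is the logistic sigmoid and [h_{t-1}, x_t] the concatenation of previous hidden state and current input), the addition gate, usually called the input gate i_t, decides how much of a new candidate context \tilde{c}_t to write:

```latex
\begin{aligned}
i_t &= \sigma\left(W_i \, [h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c \, [h_{t-1}, x_t] + b_c\right)
\end{aligned}
```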
LSTM context update
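Assuming the common formulation, the new context combines the retained old context and the gated candidate elementwise (⊙ is the elementwise product, f_t and i_t are the forget- and addition-gate activations, and \tilde{c}_t is the tanh candidate). For example, with f_t = 1 and i_t = 0 the context is copied unchanged, c_t = c_{t-1}.

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```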
LSTM output state
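In the same assumed formulation (σ the logistic sigmoid, [h_{t-1}, x_t] the concatenated previous hidden state and input, ⊙ the elementwise product), an output gate o_t filters a squashed copy of the context c_t to produce the new hidden state:

```latex
\begin{aligned}
o_t &= \sigma\left(W_o \, [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```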
RNN and LSTMs: Introduction
By Shubham Dokania