Introduction to RNN and LSTM

Shubham Dokania

@shubhamdokania

Common Architectures

Neural Models:
- Artificial Neural Network
- Convolutional Neural Network
Random Forests/Decision Trees
Naive Bayes (Bayesian methods)
And more...

DATA: Sequences

Examples:
- Stock market data
- Text (Book, dialogues corpus)
- Music
- Handwriting/Art strokes etc.

Problem?

The previous models can't be applied directly to sequences.
They carry no context information from earlier inputs.
They have no notion of time-dependency.
Sequences are temporal in nature.

Any Solutions?

Convert temporal data to spatial data (e.g. fixed-size windows over the sequence).
Markov model: model the data as a probabilistic model using Markov chains.
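One way to picture the first option is a sliding window that turns a sequence into fixed-size rows any ordinary model could consume. This is only a minimal sketch: the function name `sliding_windows`, the window size, and the toy `prices` list are assumptions made for illustration, not part of the original slides.

```python
import numpy as np

def sliding_windows(series, window):
    """Turn a 1-D sequence into (inputs, targets) pairs of fixed-size windows."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window
    # Each row holds `window` consecutive values; the target is the value that follows.
    inputs = np.stack([series[i:i + window] for i in range(n)])
    targets = series[window:]
    return inputs, targets

prices = [10, 11, 13, 12, 15, 16]  # hypothetical toy data
X, y = sliding_windows(prices, window=3)
```

Each row of `X` is now a "spatial" input of length 3, but anything older than 3 steps is invisible to the model.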

Problem in MC Models

Markov chain (MC) models have a fixed-size context (the order of the chain).

Fixed-size context means information older than the chain's order is discarded, so long-term context cannot be preserved.
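The fixed-context limitation can be seen in a small count-based Markov chain. This is a sketch under assumptions: the helper names `fit_markov_chain` and `next_token_probs` and the toy sentence are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter, defaultdict

def fit_markov_chain(tokens, order=1):
    """Count next-token frequencies for each context of `order` previous tokens."""
    counts = defaultdict(Counter)
    for i in range(order, len(tokens)):
        context = tuple(tokens[i - order:i])
        counts[context][tokens[i]] += 1
    return counts

def next_token_probs(counts, context):
    """Normalize the counts for one context into probabilities."""
    seen = counts.get(tuple(context), Counter())
    total = sum(seen.values())
    return {tok: c / total for tok, c in seen.items()} if total else {}

tokens = "the cat sat on the mat the cat ran".split()  # hypothetical toy corpus
model = fit_markov_chain(tokens, order=1)
probs = next_token_probs(model, ["the"])
```

With `order=1` the prediction after "the" depends only on that single word; whatever came earlier in the sentence has no influence. Raising `order` widens the context, but the number of distinct contexts grows quickly with it.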

Memory (context) issue

Solution?

Enter Recurrent Architectures

Recurrent Neural Network

Recurrent nodes

Contain Feedback loops.
Analogous to sequential digital circuits.
Propagate information through time.

Unroll in Time

Information is passed via hidden states through time.
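Unrolling can be sketched as a plain loop over time steps in which the same weights are reused and only the hidden state changes. The weight names (`W_xh`, `W_hh`, `b_h`), the `tanh` activation, and the sizes below are assumptions for illustration; the slides do not specify them.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Run a vanilla RNN over a sequence, returning the hidden state at every step."""
    h = h0
    hidden_states = []
    for x_t in xs:  # one iteration per time step: this loop is the "unrolled" network
        # The same weights are applied at every step; only h carries information forward.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return np.stack(hidden_states)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 8, 5  # hypothetical sizes
xs = rng.normal(size=(seq_len, input_size))
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
hs = rnn_forward(xs, W_xh, W_hh, b_h, h0=np.zeros(hidden_size))
```

`hs[t]` is the hidden state after seeing inputs `xs[0]` through `xs[t]`, so it is the only channel through which earlier inputs can affect later steps.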

Working of RNN

Issues with RNN

and Improvements

Vanishing Gradients

Arises during backpropagation through time (BPTT).
As sequence length increases, the gradient value diminishes or explodes!
Caused by the repeated multiplication of Jacobian matrices in the chain rule!
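The effect of the repeated multiplication can be illustrated numerically. The sketch below assumes a simplified linear recurrence (no activation), so every Jacobian equals the recurrent matrix itself; the function name `jacobian_product_norm` and the scaling factors 0.9 and 1.1 are hypothetical choices for the demonstration.

```python
import numpy as np

def jacobian_product_norm(W_hh, steps):
    """Spectral norm of the product of `steps` identical Jacobians (linear RNN case)."""
    J = np.eye(W_hh.shape[0])
    for _ in range(steps):
        J = W_hh @ J  # one more factor from the chain rule per time step
    return np.linalg.norm(J, 2)

rng = np.random.default_rng(0)
# Orthogonal matrix: all singular values are 1, so the scale factor alone sets the norm.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))

shrinking = jacobian_product_norm(0.9 * Q, steps=50)  # behaves like 0.9**50
growing = jacobian_product_norm(1.1 * Q, steps=50)    # behaves like 1.1**50
```

A factor only slightly below 1 drives the gradient toward zero over 50 steps, while a factor slightly above 1 makes it blow up, which is the vanish-or-explode behaviour described above.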

Fixing Vanishing Gradients

Better initialization: set the recurrent weight matrix W to the identity.
Use ReLU as the activation function.
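A rough intuition for why these two fixes work together: with an identity recurrent matrix and ReLU, a non-negative hidden state is copied unchanged from step to step when there is no new input, so the signal neither shrinks nor grows. The sketch below assumes zero input and a hand-picked starting state, purely for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

hidden_size = 4
W_hh = np.eye(hidden_size)          # identity initialization of the recurrent weights
h = np.array([0.5, 0.0, 2.0, 1.0])  # hypothetical non-negative hidden state

for _ in range(100):
    # No input term here: we only look at how the recurrence treats the state itself.
    h = relu(W_hh @ h)
```

After 100 steps `h` is still the starting vector. Training will move `W_hh` away from the identity, so this describes the starting point rather than a guarantee.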

Context (Long-Term) Loss

Although theoretically possible, RNNs are in practice poor at capturing long-term dependencies in data.

Can be improved by increasing the hidden state size, but memory cost grows with it, which limits how far this can go!

SOLUTION?

LSTMs

LSTM

Long Short-Term Memory

LSTM Node

Specifically designed to overcome the long-term dependency problem.
Has shown large performance improvements over plain RNNs on many sequence tasks!

Notation

Core Improvements

A context vector (the cell state) carried along with the hidden state.
Use of an addition (input) gate and a forget gate!
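The two improvements can be sketched as a single LSTM step in the commonly used formulation. The slides do not give equations, so the gate layout, the stacked weight matrix `W`, and all names below are assumptions; note that this common form also includes an output gate, which the slide does not mention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W has shape (4*H, H+D) and b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:H])          # forget gate: how much of the old context to keep
    i = sigmoid(z[H:2 * H])      # input (addition) gate: how much new content to add
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much context to expose as h
    g = np.tanh(z[3 * H:4 * H])  # candidate content to add to the context
    c = f * c_prev + i * g       # context vector is updated additively
    h = o * np.tanh(c)           # hidden state is a gated view of the context
    return h, c

rng = np.random.default_rng(0)
D, H = 3, 5  # hypothetical input and hidden sizes
W = rng.normal(scale=0.1, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(6, D)):
    h, c = lstm_step(x_t, h, c, W, b)
```

Because `c` is updated by addition rather than by repeated matrix multiplication, a forget gate close to 1 lets information (and gradients) pass through many steps largely intact.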