Introduction to RNN and LSTM

Shubham Dokania

@shubhamdokania

Common Architectures

  • Neural Models:
    • Artificial Neural Network
    • Convolutional Neural Network
  • Random Forests/Decision Trees
  • Naive Bayes (Bayesian methods)
  • And more...

DATA: Sequences

  • Examples:
    • Stock market data
    • Text (Book, dialogues corpus)
    • Music
    • Handwriting/art strokes, etc.

Problem?

  • The models above can't be applied directly to sequences.
  • They carry no context information.
  • They have no notion of time dependency.
  • Sequences are temporal in nature.

Any Solutions?

  • Convert temporal data to spatial data.
  • Markov model: model the data probabilistically using Markov chains (see the sketch below).
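
A minimal sketch of the Markov-chain idea for text, assuming a word-level, order-1 chain; the function names and toy corpus are illustrative, not from the slides:

    import random
    from collections import defaultdict

    def build_chain(words):
        """Count word -> next-word transitions (an order-1 Markov chain)."""
        chain = defaultdict(list)
        for current, nxt in zip(words, words[1:]):
            chain[current].append(nxt)
        return chain

    def sample(chain, start, length=10):
        """Generate a sequence by repeatedly sampling a plausible next word."""
        out = [start]
        for _ in range(length - 1):
            candidates = chain.get(out[-1])
            if not candidates:
                break
            out.append(random.choice(candidates))
        return out

    corpus = "the cat sat on the mat and the cat slept".split()
    print(sample(build_chain(corpus), "the"))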

Problem in MC models

Markov chain (MC) models have a fixed-size context (the order of the chain).

Fixed-size context means no preservation of long-term contextual information.

Memory (context) issue

Solution?

enter recurrent architectures 

recurrent neural network

Recurrent nodes

  • Contain feedback loops.
  • Analogous to sequential digital circuits.
  • Propagate information through time.

unroll in time

Information is passed via hidden states through time.

Working of rnn
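
A minimal sketch of a vanilla RNN forward pass, assuming a tanh activation; the shapes and names below are illustrative, not the slides' notation:

    import numpy as np

    def rnn_forward(xs, W_xh, W_hh, b_h, h0):
        """Unroll the recurrence h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)."""
        h, hs = h0, []
        for x in xs:                      # one step per element of the sequence
            h = np.tanh(W_xh @ x + W_hh @ h + b_h)
            hs.append(h)
        return hs                         # hidden state at every time step

    rng = np.random.default_rng(0)
    input_dim, hidden_dim, seq_len = 3, 5, 4
    xs = [rng.normal(size=input_dim) for _ in range(seq_len)]
    W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
    W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
    hs = rnn_forward(xs, W_xh, W_hh, np.zeros(hidden_dim), np.zeros(hidden_dim))
    print(len(hs), hs[-1].shape)          # 4 (5,)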

issues with rnn

and improvements

vanishing gradients

  • Occurs due to backpropagation through time.
  • As the sequence length increases, the gradient value diminishes or explodes!
  • Caused by the product of consecutive Jacobian matrices in the chain rule (illustrated below).
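
A rough numerical illustration (not the slides' own example): backpropagating through many time steps multiplies many Jacobian-like factors, so the gradient norm shrinks or blows up roughly geometrically with sequence length.

    import numpy as np

    rng = np.random.default_rng(0)
    hidden_dim, steps = 5, 50

    for scale in (0.5, 1.5):              # "small" vs "large" recurrent weights
        W_hh = scale * rng.normal(size=(hidden_dim, hidden_dim)) / np.sqrt(hidden_dim)
        grad = np.ones(hidden_dim)
        for _ in range(steps):            # crude stand-in for the chain-rule product
            grad = W_hh.T @ grad
        print(scale, np.linalg.norm(grad))  # typically tiny for 0.5, huge for 1.5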

fixing vanishing gradients

  • Better initialization: set the recurrent weight matrix W to the identity.
  • Use ReLU as the activation (see the sketch below).
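
A small sketch of this fix (in the spirit of the IRNN idea), assuming the recurrent weights start at the identity and ReLU replaces tanh; names are illustrative:

    import numpy as np

    def irnn_step(x, h_prev, W_xh, W_hh, b_h):
        """One recurrent step with ReLU instead of tanh."""
        return np.maximum(0.0, W_xh @ x + W_hh @ h_prev + b_h)

    hidden_dim, input_dim = 5, 3
    W_hh = np.eye(hidden_dim)             # identity initialization of recurrent weights
    W_xh = np.random.default_rng(0).normal(scale=0.01, size=(hidden_dim, input_dim))
    h = irnn_step(np.ones(input_dim), np.zeros(hidden_dim), W_xh, W_hh, np.zeros(hidden_dim))
    print(h)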

context (long-term) loss

Although it is theoretically possible, RNNs are in practice very poor at capturing long-term dependencies in data.

This can be improved by increasing the hidden-state size, but space complexity quickly becomes the limit!

SOLUTION?

LSTMs

lstm

long short-term memory

lstm node

  • Specifically designed to overcome the long-term dependency problem.
  • Has shown tremendous performance improvements!

Notation

core improvements

  • A context vector maintained alongside the hidden state.
  • Use of an addition (input) gate and a forget gate (see the cell sketch below)!
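
A minimal sketch of one LSTM cell step, combining the gates described on the following slides; the stacked-weight layout and names here are assumptions, not the slides' notation:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step; W maps [h_prev; x] to the four gate pre-activations."""
        z = W @ np.concatenate([h_prev, x]) + b
        H = h_prev.shape[0]
        f = sigmoid(z[0:H])               # forget gate
        i = sigmoid(z[H:2 * H])           # addition (input) gate
        g = np.tanh(z[2 * H:3 * H])       # candidate context values
        o = sigmoid(z[3 * H:4 * H])       # output gate
        c = f * c_prev + i * g            # context update: forget old, add new
        h = o * np.tanh(c)                # output / hidden state
        return h, c

    rng = np.random.default_rng(0)
    hidden_dim, input_dim = 4, 3
    W = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim + input_dim))
    b = np.zeros(4 * hidden_dim)
    h = c = np.zeros(hidden_dim)
    for x in [rng.normal(size=input_dim) for _ in range(5)]:
        h, c = lstm_step(x, h, c, W, b)
    print(h.shape, c.shape)               # (4,) (4,)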

lstm forget gate
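
In the standard formulation (notation assumed here, not taken from the original slide), the forget gate decides how much of the previous context to keep:

    f_t = σ(W_f · [h_{t-1}, x_t] + b_f)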

lstm addition gate
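
In the same standard formulation, the addition (input) gate and the candidate values decide what new information could be written into the context:

    i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
    c̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c)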

lstm context update
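
The context vector is then updated by forgetting part of the old context and adding the gated candidate (⊙ is element-wise multiplication):

    c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t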

lstm output state
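
Finally, the output gate decides how much of the (squashed) context is exposed as the new hidden state:

    o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
    h_t = o_t ⊙ tanh(c_t)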

some examples

jupyter notebooks
