# Introduction to RNN and LSTM

Shubham Dokania

@shubhamdokania

## Common Architectures

- Neural Models:
  - Artificial Neural Network
  - Convolutional Neural Network
- Random Forests/Decision Trees
- Naive Bayes (Bayesian methods)
- And more...

## DATA: Sequences

- Examples:
  - Stock market data
  - Text (books, dialogue corpora)
  - Music
  - Handwriting/art strokes, etc.

## Problem?

- The models above can't be used directly on sequences.
- They keep no context information between inputs.
- They have no notion of time dependency.
- Sequences are temporal in nature.

## Any Solutions?

- Convert temporal data to spatial data.
- Markov model: model the data probabilistically using Markov chains.
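One common way to turn temporal data into spatial data is a fixed-width sliding window. The sketch below is only an illustration of that idea: the helper name `sliding_windows`, the window width, and the toy price series are assumptions, not taken from the slides.

```python
import numpy as np

def sliding_windows(series, window):
    """Turn a 1-D sequence into fixed-width input rows and next-value targets."""
    series = np.asarray(series, dtype=float)
    n = len(series) - window
    # Row i holds series[i : i + window]; its target is the value that follows.
    inputs = np.stack([series[i:i + window] for i in range(n)])
    targets = series[window:]
    return inputs, targets

# Hypothetical toy series, for illustration only.
prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
X, y = sliding_windows(prices, window=3)
print(X.shape, y)
```

Each row of `X` can now be fed to a non-sequential model, but anything older than `window` steps is invisible to it.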

## Problem in MC models

Markov chain (MC) models have a fixed-size context (the order of the chain).

Fixed-size context == long-term contextual information is not preserved.
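A minimal sketch of an order-`k` Markov chain makes the fixed context visible: the next-token distribution is looked up from the last `k` tokens only. The function name `fit_markov_chain` and the toy sentence are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def fit_markov_chain(tokens, order=1):
    """Estimate next-token probabilities conditioned on the last `order` tokens."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        counts[context][tokens[i + order]] += 1
    # Normalise the counts for each context into probabilities.
    return {
        context: {tok: c / sum(nxt.values()) for tok, c in nxt.items()}
        for context, nxt in counts.items()
    }

# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ran".split()
model = fit_markov_chain(tokens, order=1)
print(model[("the",)])
```

Whatever came before the last `order` tokens has no influence on the prediction, and raising `order` makes the table of contexts grow quickly.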

## Memory (context) issue

## Solution?

### enter recurrent architectures

# recurrent neural network

## Recurrent nodes

- Contain Feedback loops.
- Analogous to sequential digital circuits.
- Propagate information through time.

## unroll in time

Information is passed via hidden states through time.
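Unrolling can be sketched as a loop that reuses the same weights at every time step and hands the hidden state forward. This is a minimal vanilla RNN with a tanh activation; the weight names (`W_xh`, `W_hh`, `b_h`) and the sizes are assumptions, not notation from the slides.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return the hidden state at each step."""
    h = np.zeros(W_hh.shape[0])
    hidden_states = []
    for x in xs:
        # The new state depends on the current input and the previous state.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hidden_states.append(h)
    return np.stack(hidden_states)

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 8, 5  # illustrative sizes
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)
xs = rng.normal(size=(seq_len, input_size))
hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(hs.shape)
```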

## Working of RNN

# issues with RNN

## and improvements

## vanishing gradients

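- Occurs during backpropagation through time (BPTT).
- As the sequence length increases, the gradient either shrinks toward zero (vanishes) or grows without bound (explodes).
- Cause: the chain rule multiplies one Jacobian matrix per time step, so the product scales roughly exponentially with sequence length.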
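The effect of the repeated Jacobian product can be sketched numerically. The example assumes a linear recurrence (the activation derivative is ignored) and a scaled orthogonal recurrent matrix, so that each backward step rescales the gradient norm by exactly the chosen factor; both are simplifications for illustration.

```python
import numpy as np

def backprop_norms(W, steps):
    """Gradient norm after each backward step through a linear recurrence h_t = W h_{t-1}."""
    grad = np.ones(W.shape[0]) / np.sqrt(W.shape[0])  # unit-norm starting gradient
    norms = []
    for _ in range(steps):
        grad = W.T @ grad  # chain rule: one Jacobian factor per time step
        norms.append(np.linalg.norm(grad))
    return norms

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # orthogonal matrix
shrinking = backprop_norms(0.9 * Q, steps=50)   # factor below 1: vanishes
growing = backprop_norms(1.1 * Q, steps=50)     # factor above 1: explodes
print(shrinking[-1], growing[-1])
```

With a saturating activation such as tanh, whose derivative is at most 1, the shrinking case only gets worse.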

## fixing vanishing gradients

- Better initialization: set the recurrent weight matrix W to the identity.
- Use ReLU as the activation function.
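The two fixes can be combined in a few lines. In this sketch the recurrent matrix starts as the identity and the activation is ReLU, so with zero input a non-negative hidden state is copied forward unchanged. The helper names and the small input-weight scale are assumptions.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def init_identity_rnn(input_size, hidden_size, rng):
    """Small random input weights, identity recurrent weights, zero bias."""
    W_xh = rng.normal(scale=0.01, size=(hidden_size, input_size))
    W_hh = np.eye(hidden_size)
    b_h = np.zeros(hidden_size)
    return W_xh, W_hh, b_h

def step(x, h, W_xh, W_hh, b_h):
    return relu(W_xh @ x + W_hh @ h + b_h)

rng = np.random.default_rng(0)
W_xh, W_hh, b_h = init_identity_rnn(3, 5, rng)
h = np.array([0.5, 1.0, 0.0, 2.0, 0.25])
h_next = step(np.zeros(3), h, W_xh, W_hh, b_h)
print(h_next)
```

At initialization the recurrent Jacobian is the identity wherever the ReLU is active, so gradients pass through time steps without being rescaled.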

## context (long-term) loss

Although it is theoretically possible, plain RNNs struggle in practice to capture long-term dependencies in data.

Increasing the hidden state size helps, but memory cost limits how far that can go!

SOLUTION?

**LSTMs**

# LSTM

## long short-term memory

## LSTM node

- Specifically designed to overcome the long-term dependency problem.
- Has shown large performance improvements over plain RNNs on sequence tasks!

## Notation

## core improvements

- A context vector carried alongside the hidden state.
- Use of **Addition** gate and **Forget** gate!

## LSTM forget gate
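In the standard LSTM formulation, the forget gate decides how much of the previous context to keep. The notation is assumed here rather than taken from the slides: $x_t$ is the input, $h_{t-1}$ the previous hidden state, $[h_{t-1}, x_t]$ their concatenation, $W_f$ and $b_f$ the gate's weights and bias, and $\sigma$ the logistic sigmoid.

$$
f_t = \sigma\left(W_f \, [h_{t-1}, x_t] + b_f\right)
$$

Each entry of $f_t$ lies in $(0, 1)$: values near 0 discard the matching entry of the context, values near 1 keep it.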

## LSTM addition gate
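The addition gate (usually called the input gate) decides how much new information is written into the context. In the standard formulation, with assumed notation ($x_t$ input, $h_{t-1}$ previous hidden state, $\sigma$ the logistic sigmoid, and $W_i, b_i, W_C, b_C$ learned parameters), it has two parts: a gate $i_t$ and a candidate context $\tilde{C}_t$.

$$
i_t = \sigma\left(W_i \, [h_{t-1}, x_t] + b_i\right), \qquad
\tilde{C}_t = \tanh\left(W_C \, [h_{t-1}, x_t] + b_C\right)
$$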

## LSTM context update
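In the standard formulation the new context is an element-wise blend of the old context and the candidate. The notation is assumed: $C_{t-1}$ is the previous context, $f_t$ the forget gate output, $i_t$ the addition gate output, $\tilde{C}_t$ the candidate context, and $\odot$ element-wise multiplication.

$$
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
$$

Because the update is additive, the gradient flowing from $C_t$ back to $C_{t-1}$ is scaled by $f_t$ rather than by a repeated weight-matrix product, which is what helps with long-term dependencies.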

## LSTM output state
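The output state is the context squashed by tanh and filtered by an output gate. The sketch below runs one full LSTM step in NumPy and ends with that computation. It follows the standard formulation; the stacked weight layout (`W` holding all four gates) and the names are assumptions, not the slides' notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W: (4 * hidden, hidden + input), b: (4 * hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])  # forget gate
    i = sigmoid(z[1 * hidden:2 * hidden])  # addition (input) gate
    g = np.tanh(z[2 * hidden:3 * hidden])  # candidate context
    o = sigmoid(z[3 * hidden:4 * hidden])  # output gate
    c = f * c_prev + i * g                 # context update
    h = o * np.tanh(c)                     # output (hidden) state
    return h, c

rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4  # illustrative sizes
W = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x in rng.normal(size=(6, input_size)):
    h, c = lstm_step(x, h, c, W, b)
print(h, c)
```

Both `h` and `c` are carried to the next step: `c` is the long-term context, `h` is what the rest of the network sees.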

# some examples

## jupyter notebooks

#### RNN and LSTMs: Introduction

By Shubham Dokania


A presentation for CoSysLab@IIITD
