Shubham Dokania

@shubhamdokania

- Neural Models:
  - Artificial Neural Network
  - Convolutional Neural Network

- Random Forests/Decision Trees
- Naive Bayes (Bayesian methods)
- And more...

- Examples:
  - Stock market data
  - Text (books, dialogue corpora)
  - Music
  - Handwriting/art strokes, etc.

- The previous models can't be used directly on sequences.
- They keep no context information between inputs.
- They have no notion of time-dependency.
- Sequences are temporal in nature.

- Convert temporal data to spatial data.
- Markov model: model the data probabilistically using Markov chains.
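One common way to read "convert temporal data to spatial" is a sliding window: each fixed-length slice of the sequence becomes an ordinary feature vector, and the value that follows it becomes the target. The sketch below assumes that reading; the function name `sliding_windows` and the sample `prices` list are made up for illustration.

```python
def sliding_windows(sequence, window_size):
    """Return (window, next_value) pairs taken from a sequence."""
    pairs = []
    for start in range(len(sequence) - window_size):
        window = sequence[start:start + window_size]   # fixed-size "spatial" input
        target = sequence[start + window_size]         # value to predict
        pairs.append((window, target))
    return pairs


# Hypothetical toy series standing in for stock prices.
prices = [10, 11, 13, 12, 15, 16]
examples = sliding_windows(prices, window_size=3)
for window, target in examples:
    print(window, "->", target)
```

Each pair can then be fed to any of the non-sequential models listed earlier, at the cost of discarding everything outside the window.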

Markov chain (MC) models have a fixed-size context (the order of the chain).

A fixed-size context means long-term contextual information is not preserved.
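To make the fixed-context limitation concrete, here is a minimal sketch of an order-k Markov chain over words. The helper names `build_chain` and `generate` and the toy sentence are assumptions for illustration only. Note that the next word is chosen from the last `order` tokens alone; anything earlier is invisible to the model.

```python
import random
from collections import defaultdict


def build_chain(tokens, order=1):
    """Map each context (tuple of `order` tokens) to the tokens seen after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        chain[context].append(tokens[i + order])
    return chain


def generate(chain, seed, length, rng):
    """Extend `seed` by up to `length` tokens, sampling from the chain."""
    output = list(seed)
    order = len(seed)
    for _ in range(length):
        context = tuple(output[-order:])   # only the last `order` tokens matter
        candidates = chain.get(context)
        if not candidates:                 # unseen context: stop early
            break
        output.append(rng.choice(candidates))
    return output


# Hypothetical toy corpus.
text = "the cat sat on the mat and the cat ran".split()
chain = build_chain(text, order=1)
sample = generate(chain, seed=("the",), length=5, rng=random.Random(0))
print(chain[("the",)])
print(" ".join(sample))
```

Raising `order` widens the context, but the number of possible contexts grows quickly and the window is still fixed.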

- Recurrent Neural Networks (RNNs) contain feedback loops.
- Analogous to sequential digital circuits.
- Propagate information through time.

Information is passed via hidden states through time.
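As a rough sketch of how the hidden state carries information forward, the snippet below implements one step of a plain (Elman-style) RNN, assuming the common update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The weight values and names (`W_xh`, `W_hh`, `b_h`) are arbitrary placeholders, not values from the talk.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    from_input = matvec(W_xh, x_t)
    from_state = matvec(W_hh, h_prev)      # feedback from the previous step
    return [math.tanh(a + b + c)
            for a, b, c in zip(from_input, from_state, b_h)]


# Hypothetical parameters: 1 input feature, 2 hidden units.
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.0]

h = [0.0, 0.0]                             # initial hidden state
states = []
for x in [1.0, 0.5, -1.0]:                 # a short input sequence
    h = rnn_step([x], h, W_xh, W_hh, b_h)
    states.append(h)
print(states)
```

The same weights are reused at every time step; only `h` changes, which is what lets earlier inputs influence later outputs.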

- The vanishing/exploding gradient problem occurs due to backpropagation through time.
- As sequence length increases, the gradient value diminishes or explodes!
- This happens because the chain rule multiplies together one derivative (Jacobian) matrix per time step!
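The effect of the repeated multiplication can be seen in a deliberately simplified scalar case. Assuming a linear recurrence h_t = w * h_{t-1}, the gradient of the last state with respect to the first is w multiplied by itself once per step. The function name `gradient_through_time` is illustrative.

```python
def gradient_through_time(w, steps):
    """Gradient of h_T with respect to h_0 for the recurrence h_t = w * h_{t-1}."""
    grad = 1.0
    for _ in range(steps):
        grad *= w          # one chain-rule factor per time step
    return grad


for w in (0.9, 1.1):
    print(w, gradient_through_time(w, steps=50))
```

A factor slightly below 1 shrinks the gradient towards zero over 50 steps, while a factor slightly above 1 blows it up. With matrices, the same role is played by the magnitude of the recurrent Jacobian.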

- Better initialization: set W to the identity matrix.
- Use ReLU as the activation function.
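A small sketch of why these two choices work together, under the assumption that W refers to the recurrent (hidden-to-hidden) weight matrix: with W set to the identity and ReLU as the activation, a non-negative hidden state is copied forward unchanged when there is no new input, so information does not decay by default. All names here are illustrative.

```python
def identity(n):
    """Return the n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def relu(vector):
    return [max(0.0, x) for x in vector]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def step(h_prev, W_hh, input_term):
    """h_t = ReLU(W_hh h_prev + input_term)."""
    pre_activation = [a + b for a, b in zip(matvec(W_hh, h_prev), input_term)]
    return relu(pre_activation)


W_hh = identity(3)               # recurrent weights initialised to identity
h = [0.2, 0.0, 1.5]              # some non-negative hidden state
for _ in range(100):             # 100 steps with zero input
    h = step(h, W_hh, [0.0, 0.0, 0.0])
print(h)
```

Training then moves W away from the identity only as far as the data requires.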

Although it is theoretically possible, RNNs in practice are very bad at capturing long-term dependencies in data.

This can be improved by increasing the hidden state size, but memory requirements limit how far that can go!

SOLUTION?

**LSTMs**

- Specifically designed to overcome the long-term dependency problem.
- Have shown tremendous performance improvements!

- A context vector is maintained along with the hidden state.
- Use of an **Addition** gate and a **Forget** gate!
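The sketch below shows one LSTM step using the commonly published formulation, which is an assumption here since the slides do not give equations. In that formulation the context vector is usually called the cell state `c`, the additive update is controlled by an input gate `i` (the role the slide calls the addition gate), and there is also an output gate `o`. All parameter names and values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for a matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; returns the new hidden state and cell (context) state."""
    z = h_prev + x_t                                     # concatenate [h_prev, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]    # input (addition) gate
    g = [math.tanh(a) for a in affine(p["W_g"], p["b_g"], z)]  # candidate values
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]    # output gate
    # Additive update: keep part of the old context, add part of the new.
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


hidden, inputs = 2, 1


def constant_matrix(value):
    return [[value] * (hidden + inputs) for _ in range(hidden)]


# Hypothetical parameters chosen so the forget gate is near 1 and the
# input gate is near 0, i.e. the cell should mostly remember its state.
params = {
    "W_f": constant_matrix(0.0), "b_f": [5.0] * hidden,
    "W_i": constant_matrix(0.0), "b_i": [-5.0] * hidden,
    "W_g": constant_matrix(0.1), "b_g": [0.0] * hidden,
    "W_o": constant_matrix(0.0), "b_o": [0.0] * hidden,
}

h, c = [0.0, 0.0], [1.0, -1.0]
for x in [0.3, -0.7, 0.9]:
    h, c = lstm_step([x], h, c, params)
print(c)
```

Because the cell state is updated by addition rather than by repeated matrix multiplication, a forget gate close to 1 lets the context pass through many steps nearly unchanged, which is the property the plain RNN lacks.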