Recurrent Neural Networks

Sequence Processing Tasks

  • Sequence tagging 
    • POS tagging
    • Named Entity Recognition
    • Sentence Compression
  • Sequence Transduction
    • Machine Translation
    • Speech Recognition
    • Dialogue

Recurrent Neural Networks

[Figure: an RNN unrolled over five time steps. Inputs \(x_1, \dots, x_5\) feed the states \(s_1, \dots, s_5\) through \(U\); each state feeds the next through \(W\) and produces an output \(\hat{y}_1, \dots, \hat{y}_5\) through \(V\).]

  • \( s_i = \sigma(U x_i + W s_{i-1} + b)\)
  • \( \hat{y}_i = o(V s_i + c) \)
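A minimal NumPy sketch of this recurrence, unrolled over a five-token input. The sizes (state dimension, vocabulary, number of tags) are illustrative assumptions, not from the slides; \(\sigma\) is taken to be \(\tanh\) here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, n_tags = 4, 10, 3          # illustrative sizes, not from the slides

U = rng.normal(size=(d, vocab))      # input weights
W = rng.normal(size=(d, d))          # recurrent weights
V = rng.normal(size=(n_tags, d))     # output weights
b, c = np.zeros(d), np.zeros(n_tags)

def step(s_prev, x):
    """s_i = sigma(U x_i + W s_{i-1} + b); here sigma = tanh."""
    return np.tanh(U @ x + W @ s_prev + b)

s = np.zeros(d)                      # s_0
xs = np.eye(vocab)[[1, 3, 5, 7, 2]]  # one-hot encodings of a 5-token input
for x in xs:
    s = step(s, x)
    y_hat = V @ s + c                # output layer o(V s_i + c), pre-activation
```

Note that the same \(U\), \(W\), and \(V\) are reused at every time step; only the state \(s\) changes.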

\( s_i = RNN(s_{i-1}, x_i)\)

[Example: starting from \(s_0\), the network tags "Find a cheap Chinese restaurant" as VB DT JJ JJ NN.]

Recurrent Neural Networks

[Figure: a single RNN cell — the input \(x_i\) enters through \(U\), the previous state \(s_{i-1}\) through \(W\), producing the state \(s_i\) and the output \(\hat{y}_i\) through \(V \in \mathbb{R}^{36 \times d}\).]

  • \( \hat{y}_i = \mathrm{softmax}(Vs_i + c) \)
  • \( \mathrm{softmax}(a)_i = \frac{e^{a_i}}{\sum_j e^{a_j}}\)
  • \( \mathscr{L}_i = -\sum_c y_{ci} \log \hat{y}_{ci}\)
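These two definitions can be sketched directly; the example logits and one-hot target below are made up for illustration.

```python
import numpy as np

def softmax(a):
    """softmax(a)_i = exp(a_i) / sum_j exp(a_j), shifted by max(a) for stability."""
    e = np.exp(a - a.max())
    return e / e.sum()

def cross_entropy(y, y_hat):
    """L_i = -sum_c y_c * log(y_hat_c) for a one-hot target y."""
    return -np.sum(y * np.log(y_hat))

logits = np.array([2.0, 1.0, 0.1])   # V s_i + c for three classes
probs = softmax(logits)
target = np.array([1.0, 0.0, 0.0])   # true class is index 0
loss = cross_entropy(target, probs)
```

For a one-hot target the sum collapses to a single term, so the loss is just the negative log-probability the model assigns to the correct class.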

HMM vs RNN

HMMs are simpler than RNNs

  • 36 Hidden States (POS Tags)
  • 5000 words in vocabulary
  • HMM Parameters:
    • Transition Parameters: 36*36 = 1296
    • Emission Parameters: 36*5000 = 180000
    • Total  = 181296
  • RNN Parameters:
    • \(U\) = 5000*300  = 1500000
    • \(W\) = 300*300 = 90000
    • \(V\) = 300*36 = 10800
    • Total = 1600800

HMMs have fewer parameters, and hence require less training data
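The counts above follow directly from the shapes of the parameter tables; a quick arithmetic check:

```python
# Sizes from the slides: 36 POS tags, 5000-word vocabulary, hidden size d = 300.
n_tags, vocab, d = 36, 5000, 300

# HMM: a transition table over tag pairs and an emission table over (tag, word).
hmm_transition = n_tags * n_tags          # 36 * 36   = 1296
hmm_emission   = n_tags * vocab           # 36 * 5000 = 180000
hmm_total      = hmm_transition + hmm_emission

# RNN: input, recurrent, and output weight matrices (biases omitted, as on the slide).
rnn_U = vocab * d                         # 5000 * 300 = 1500000
rnn_W = d * d                             # 300 * 300  = 90000
rnn_V = d * n_tags                        # 300 * 36   = 10800
rnn_total = rnn_U + rnn_W + rnn_V
```

Almost all of the RNN's parameters sit in \(U\), i.e. in the word embeddings; the recurrent and output weights are comparatively small.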

HMM vs RNN

  • HMMs make a Markov Assumption

 

\( P(y_t | y_{t-1},...,y_2,y_1) = P(y_t | y_{t-1}) \)

  • RNNs condition on the entire history

 

\( P(w_6 \mid w_5, w_4, \dots, w_1)\)

[Figure: an RNN unrolled over the input "Find me a cheap Chinese". The states \(s_1, \dots, s_5\) are chained through \(W\) from \(s_0\), and only the final state \(s_5\) produces an output \(\hat{y}_5\) through \(V\), predicting the next word.]

HMM vs RNN

  • HMMs are generative models

[Figure: an HMM as a generative model — hidden tags VB, DT, JJ emit the words "Find", "a", "cheap".]

\(P(x) = \sum_y P(x,y) \)

\( = \sum_y P(x|y)P(y)\)

\( = \sum_y \prod_t P(x_t|y_t) \prod_t P(y_t|y_{t-1})\)
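This marginal can be computed by brute force exactly as the last line factorizes it, summing over every hidden path. The toy HMM below (2 states, 3 symbols, made-up probabilities) also runs the standard forward algorithm, which computes the same \(P(x)\) without enumerating paths.

```python
import itertools
import numpy as np

# Toy HMM with 2 hidden states and 3 observation symbols; all numbers made up.
pi = np.array([0.6, 0.4])              # initial distribution P(y_1)
A  = np.array([[0.7, 0.3],
               [0.2, 0.8]])            # A[i, j] = P(y_t = j | y_{t-1} = i)
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])       # B[i, k] = P(x_t = k | y_t = i)

x = [0, 2, 1]                          # an observed sequence

# Brute force: P(x) = sum_y prod_t P(x_t|y_t) prod_t P(y_t|y_{t-1})
p_x = 0.0
for y in itertools.product(range(2), repeat=len(x)):
    p = pi[y[0]] * B[y[0], x[0]]
    for t in range(1, len(x)):
        p *= A[y[t-1], y[t]] * B[y[t], x[t]]
    p_x += p

# Forward algorithm: same marginal in O(T * n^2) instead of O(n^T).
alpha = pi * B[:, x[0]]
for t in range(1, len(x)):
    alpha = (alpha @ A) * B[:, x[t]]   # alpha[j] = P(x_1..x_t, y_t = j)
```

The two computations agree; the forward recursion is what makes training and decoding HMMs tractable.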

HMM vs RNN

  • RNNs are primarily discriminative models

[Figure: the same unrolled RNN — inputs \(x_1, \dots, x_5\) enter through \(U\), states \(s_1, \dots, s_5\) are chained through \(W\) from \(s_0\), and a single output \(\hat{y}_5\) is produced from the final state through \(V\).]

\( P(\hat{y}_5 \mid x_5, x_4, \dots, x_1)\)
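A sketch of this discriminative readout: the whole input is folded into the final state \(s_5\), and a single softmax over \(V s_5\) gives the conditional distribution. All sizes and token indices below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, vocab, n_classes = 4, 8, 5        # illustrative sizes, not from the slides

U = rng.normal(size=(d, vocab))
W = rng.normal(size=(d, d))
V = rng.normal(size=(n_classes, d))

s = np.zeros(d)                      # s_0
for token in [2, 0, 5, 1, 7]:        # indices of x_1, ..., x_5
    x = np.eye(vocab)[token]
    s = np.tanh(U @ x + W @ s)       # s carries the entire input history

logits = V @ s                       # read out only the final state s_5
p = np.exp(logits - logits.max())
p = p / p.sum()                      # P(y_hat_5 | x_5, ..., x_1)
```

Unlike the HMM, nothing here models \(P(x)\): the network maps inputs straight to a distribution over outputs.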


By Suman Banerjee