Ilya Sutskever
Oriol Vinyals
Quoc V. Le
{ilyasu, vinyals, qvl}@google.com
10757025 陳威廷
10757011 吳家豪
10757019 楊敘
CoRR, September 2014
\(h_t=sigm(W^{hx}x_t+W^{hh}h_{t-1})\)
\(y_t=W^{yh}h_t\)
Figure: an RNN unrolled over time, mapping inputs \(x_1, x_2, \ldots, x_T\) through hidden states \(h_1, h_2, \ldots\) to outputs \(y_1, y_2, \ldots, y_T\).
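The two recurrence equations for \(h_t\) and \(y_t\) can be run directly as a loop over the input sequence. The sketch below is a minimal NumPy version. It assumes \(h_0 = 0\), no bias terms (matching the equations as written), and random weights with arbitrary toy dimensions. `rnn_forward` and the shapes are illustrative choices, not taken from the paper.

```python
import numpy as np

def sigm(z):
    """Logistic sigmoid, the 'sigm' nonlinearity in the recurrence."""
    return 1.0 / (1.0 + np.exp(-z))

def rnn_forward(xs, W_hx, W_hh, W_yh, h0=None):
    """Run h_t = sigm(W_hx x_t + W_hh h_{t-1}) and y_t = W_yh h_t over a sequence.

    xs has shape (T, input_dim). Returns hidden states of shape (T, hidden_dim)
    and outputs of shape (T, output_dim). h0 defaults to zeros (an assumption).
    """
    h = np.zeros(W_hh.shape[0]) if h0 is None else h0
    hs, ys = [], []
    for x_t in xs:
        h = sigm(W_hx @ x_t + W_hh @ h)  # hidden-state update
        hs.append(h)
        ys.append(W_yh @ h)              # linear readout, no bias
    return np.stack(hs), np.stack(ys)

# Hypothetical toy sizes: T=5 steps, 3-dim inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 5, 3, 4, 2
xs = rng.normal(size=(T, d_in))
W_hx = rng.normal(size=(d_h, d_in))
W_hh = rng.normal(size=(d_h, d_h))
W_yh = rng.normal(size=(d_out, d_h))
hs, ys = rnn_forward(xs, W_hx, W_hh, W_yh)
print(hs.shape, ys.shape)  # (5, 4) (5, 2)
```

Because every output \(y_t\) is produced at the same step as input \(x_t\), this plain RNN assumes input and output sequences of equal length and known alignment, which is the limitation the encoder-decoder setup removes.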
\(p(y_1, ..., y_{T'} | x_1, ..., x_T) = \displaystyle\prod_{t=1}^{T'}p(y_t|v, y_1, ..., y_{t-1})\)
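Here \(v\) is the fixed-dimensional representation of the input sentence given by the encoder's last hidden state, and the factorization is just the chain rule applied one output token at a time. The sketch below scores a target sequence by summing per-step log probabilities. The decoder is a hypothetical toy (`next_token_probs`, a softmax over a linear function of \(v\) and the previous token only), standing in for the LSTM decoder; \(v\) is drawn at random instead of coming from an encoder.

```python
import numpy as np

VOCAB_SIZE = 6
BOS = VOCAB_SIZE  # hypothetical start symbol: an extra "previous token" slot used at t = 1

rng = np.random.default_rng(1)
W = rng.normal(size=(VOCAB_SIZE, 4))               # maps the context vector v to logits
U = rng.normal(size=(VOCAB_SIZE, VOCAB_SIZE + 1))  # one logit column per previous token (incl. BOS)

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def next_token_probs(v, prefix):
    """Toy stand-in for p(y_t | v, y_1..y_{t-1}); it only looks at the last token."""
    prev = prefix[-1] if prefix else BOS
    return softmax(W @ v + U[:, prev])

def sequence_log_prob(v, ys):
    """log p(y_1..y_T' | x) = sum over t of log p(y_t | v, y_1..y_{t-1})."""
    return sum(np.log(next_token_probs(v, ys[:t])[y]) for t, y in enumerate(ys))

v = rng.normal(size=4)  # hypothetical summary vector of the source sentence
print(sequence_log_prob(v, [3, 1, 5]))
```

Since each factor is a proper distribution over the next token, the product is a proper distribution over sequences of a given length; a real decoder also emits an end-of-sentence symbol so that \(T'\) can vary.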
\(1/|\mathcal{D}|\displaystyle\sum_{(T,S)\in\mathcal{D}}\log p(T|S)\)
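Training maximizes this average log probability of the correct translation \(T\) given the source \(S\) over the training set \(\mathcal{D}\), which is the same as minimizing the average negative log-likelihood. The sketch below only evaluates the objective. The word-by-word translation table (`table`, `toy_log_prob`) is a hypothetical stand-in for the LSTM's \(\log p(T|S)\), and the two sentence pairs are made up for illustration.

```python
import math

def average_log_likelihood(dataset, log_prob):
    """(1/|D|) * sum over (T, S) in D of log p(T | S)."""
    return sum(log_prob(T, S) for T, S in dataset) / len(dataset)

# Hypothetical toy model: one French word per English word, looked up in a table.
table = {
    "the": {"le": 0.9, "la": 0.1},
    "cat": {"chat": 0.8, "chien": 0.2},
}

def toy_log_prob(target, source):
    return sum(math.log(table[s][t]) for t, s in zip(target, source))

# Each pair is (T, S): French target first, English source second.
D = [(["le", "chat"], ["the", "cat"]),
     (["la", "chien"], ["the", "cat"])]
avg = average_log_likelihood(D, toy_log_prob)
print(avg)
```

A real trainer would take gradients of this quantity with respect to the model parameters; the toy table has no parameters to update.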
\(\hat{T}=\arg\displaystyle\max_T p(T|S)\)
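The exact \(\arg\max\) over all translations is intractable, so the paper approximates it with a left-to-right beam search that keeps a small number \(B\) of partial hypotheses and moves a hypothesis to the finished set once it emits the end-of-sentence symbol. The sketch below follows that idea. The `next_log_probs` callable and the tiny `TABLE` model are hypothetical; they are chosen so that greedy decoding (\(B = 1\)) misses the best sequence while \(B = 2\) finds it.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<EOS>"

def beam_search(next_log_probs: Callable[[List[str]], Dict[str, float]],
                beam_size: int, max_len: int) -> Tuple[List[str], float]:
    """Approximate argmax_T log p(T | S) by keeping the beam_size best prefixes."""
    beam: List[Tuple[List[str], float]] = [([], 0.0)]  # (prefix, cumulative log prob)
    finished: List[Tuple[List[str], float]] = []
    for _ in range(max_len):
        candidates = [(prefix + [tok], score + lp)
                      for prefix, score in beam
                      for tok, lp in next_log_probs(prefix).items()]
        # Keep the B most probable extensions (stable sort: ties keep insertion order).
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == EOS:
                finished.append((prefix, score))  # complete: leave the beam
            else:
                beam.append((prefix, score))
        if not beam:
            break
    pool = finished if finished else beam  # fall back if max_len was reached
    return max(pool, key=lambda c: c[1])

# Hypothetical toy model: p(a)=0.6, p(b)=0.4; after "a" two options at 0.5 each;
# after "b" a single option at 1.0; every other prefix ends with EOS.
TABLE = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"x": 1.0},
}

def toy_next_log_probs(prefix):
    probs = TABLE.get(tuple(prefix), {EOS: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

print(beam_search(toy_next_log_probs, beam_size=1, max_len=5)[0])  # ['a', 'x', '<EOS>']
print(beam_search(toy_next_log_probs, beam_size=2, max_len=5)[0])  # ['b', 'x', '<EOS>']
```

Greedy search commits to "a" (probability 0.6) and ends with total probability 0.3, while a beam of 2 keeps "b" alive and reaches 0.4.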
Figure: source tokens \(c, b, a\) and target tokens \(\alpha, \beta, \gamma\) (the source sentence is fed to the LSTM in reverse order).
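Reversing the source leaves the average distance between corresponding source and target words unchanged, but puts the first source words right next to the first target words, which the paper credits with making optimization easier. The sketch below counts that distance in the concatenated source-then-target stream. It assumes equal-length sentences where source word \(i\) corresponds to target word \(i\) (as \(a \leftrightarrow \alpha\), \(b \leftrightarrow \beta\), \(c \leftrightarrow \gamma\)); `word_distances` is a hypothetical helper.

```python
def word_distances(n, reverse):
    """Steps between source word i and target word i when the encoder reads the
    source (positions 0..n-1) and the decoder then emits the target (positions n..2n-1)."""
    dists = []
    for i in range(n):
        src_pos = (n - 1 - i) if reverse else i
        tgt_pos = n + i
        dists.append(tgt_pos - src_pos)
    return dists

source = ["a", "b", "c"]
print(source[::-1])                          # ['c', 'b', 'a']
print(word_distances(3, reverse=False))      # [3, 3, 3]
print(word_distances(3, reverse=True))       # [1, 3, 5]
```

In general the forward order gives distance \(n\) for every word, while the reversed order gives \(2i + 1\), whose average over \(i = 0, \ldots, n-1\) is also \(n\): only the minimal time lag shrinks.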
| Method | Test BLEU Score (NTST14) |
|---|---|
| Bahdanau et al. | 28.45 |
| Baseline System | 33.30 |
| Single forward LSTM, beam size 12 | 26.17 |
| Single reversed LSTM, beam size 12 | 30.59 |
| Ensemble of 5 reversed LSTMs, beam size 1 | 33.00 |
| Ensemble of 2 reversed LSTMs, beam size 12 | 33.27 |
| Ensemble of 5 reversed LSTMs, beam size 2 | 34.50 |
| Ensemble of 5 reversed LSTMs, beam size 12 | 34.81 |
Table 1: The performance of the LSTM on WMT'14 English to French test set (ntst14).
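The ensemble rows combine several independently trained LSTMs at decoding time. One common way to do this, assumed here rather than taken from the slides, is to average the members' next-token distributions at each step and feed that average into the beam search. The member probabilities below are hypothetical.

```python
import numpy as np

def ensemble_next_token_probs(member_probs):
    """Average next-token distributions from several models (assumed combination rule).

    member_probs has shape (num_models, vocab_size); each row sums to 1.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    return member_probs.mean(axis=0)

# Three hypothetical ensemble members over a 4-word vocabulary.
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.50, 0.30, 0.10, 0.10],
]
avg = ensemble_next_token_probs(probs)
print(avg)
print(avg.argmax())  # 0
```

The average is still a valid distribution, so the same beam search can run on it unchanged; this is one reason a small beam over an ensemble can compete with a wide beam over a single model.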
| Method | Test BLEU Score (NTST14) |
|---|---|
| Baseline System | 33.30 |
| Cho et al. | 34.54 |
| State of the art | 37.0 |
| Rescoring the baseline 1000-best with a single forward LSTM | 35.61 |
| Rescoring the baseline 1000-best with a single reversed LSTM | 35.85 |
| Rescoring the baseline 1000-best with an ensemble of 5 reversed LSTMs | 36.5 |
| Oracle Rescoring of the Baseline 1000-best lists | ~45 |
Table 2: Methods that use neural networks together with an SMT system.
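For the rescoring rows, the paper computes the LSTM log probability of every hypothesis in the baseline SMT system's 1000-best list and takes an even average of the baseline score and the LSTM score. The sketch below re-ranks a two-item list that way. The hypotheses, their baseline scores, and the `lstm_scores` lookup are hypothetical values standing in for real system outputs.

```python
def rescore_nbest(nbest, lstm_log_prob):
    """Pick the hypothesis with the best even average of baseline and LSTM scores.

    nbest: list of (hypothesis, baseline_score) pairs.
    lstm_log_prob: callable returning log p(hypothesis | source) under the LSTM.
    """
    rescored = [(hyp, 0.5 * base + 0.5 * lstm_log_prob(hyp)) for hyp, base in nbest]
    return max(rescored, key=lambda pair: pair[1])

# Hypothetical 2-best list for the source "the black cat".
nbest = [("le chat noir", -4.0), ("le noir chat", -3.8)]
lstm_scores = {"le chat noir": -2.0, "le noir chat": -6.0}

best = rescore_nbest(nbest, lstm_scores.__getitem__)
print(best)  # ('le chat noir', -3.0)
```

The baseline alone prefers "le noir chat" (-3.8 vs -4.0), but the LSTM strongly disfavors that word order, so the averaged score flips the choice. The oracle row instead picks, from the same list, whichever hypothesis scores best against the reference, which bounds what any rescoring can achieve.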
| Method | Test BLEU Score (NTST14) |
|---|---|
| Bahdanau et al. | 28.45 |
| Baseline System | 33.30 |
| Cho et al. | 34.54 |
| State of the art | 37.00 |
| Single forward LSTM, beam size 12 | 26.17 |
| Single reversed LSTM, beam size 12 | 30.59 |
| Ensemble of 5 reversed LSTMs, beam size 1 | 33.00 |
| Ensemble of 2 reversed LSTMs, beam size 12 | 33.27 |
| Ensemble of 5 reversed LSTMs, beam size 2 | 34.50 |
| Ensemble of 5 reversed LSTMs, beam size 12 | 34.81 |
| Rescoring the baseline 1000-best with a single forward LSTM | 35.61 |
| Rescoring the baseline 1000-best with a single reversed LSTM | 35.85 |
| Rescoring the baseline 1000-best with an ensemble of 5 reversed LSTMs | 36.5 |
| Oracle Rescoring of the Baseline 1000-best lists | ~45 |
Figure 2: The 2-dimensional PCA projection of the LSTM hidden states that are obtained after processing the phrases in the figures.
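A 2-dimensional PCA projection like this one centers the hidden-state vectors and projects them onto the two directions of largest variance. The sketch below does this with an SVD in NumPy. The six random 1000-dimensional vectors are placeholders for the LSTM states collected after each phrase; `pca_2d` is a hypothetical helper.

```python
import numpy as np

def pca_2d(H):
    """Project rows of H (one hidden state per phrase) onto the top two principal components."""
    H = np.asarray(H, dtype=float)
    centered = H - H.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(6, 1000))  # hypothetical: 6 phrases, 1000-dim states
coords = pca_2d(hidden_states)
print(coords.shape)  # (6, 2)
```

Each row of `coords` is one point in the plot; with real encoder states, phrases with similar meaning land near each other.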
Figure 3: The left plot shows performance on the test sentences sorted by length. The right plot shows performance on the test sentences sorted by their average word frequency rank.
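The two x-axes in Figure 3 come from sorting the test set by a per-sentence key: its length, or the average frequency rank of its words (rank 1 = most frequent word, so a higher average means rarer words). The sketch below computes both orderings on a tiny made-up corpus. The alphabetical tie-breaking and the rank given to unseen words (`unk_rank`) are assumptions for illustration.

```python
from collections import Counter

def frequency_ranks(corpus):
    """Rank words by corpus frequency: the most frequent word gets rank 1."""
    counts = Counter(w for sent in corpus for w in sent)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    return {w: r for r, w in enumerate(ordered, start=1)}

def average_frequency_rank(sentence, ranks, unk_rank):
    """Mean rank of the sentence's words; unseen words get unk_rank (an assumption)."""
    return sum(ranks.get(w, unk_rank) for w in sentence) / len(sentence)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat"]]
ranks = frequency_ranks(corpus)
unk = len(ranks) + 1

test_sentences = [["the", "cat"], ["the", "dog", "sat"], ["a", "dog"]]
by_length = sorted(test_sentences, key=len)
by_rarity = sorted(test_sentences, key=lambda s: average_frequency_rank(s, ranks, unk))
print(by_length)  # [['the', 'cat'], ['a', 'dog'], ['the', 'dog', 'sat']]
print(by_rarity)  # [['the', 'cat'], ['the', 'dog', 'sat'], ['a', 'dog']]
```

Scoring the sorted sentences in buckets then gives a curve of performance against length or against word rarity.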