SQuAD: The Stanford Question Answering Dataset

Update #3: Convolutional model

August 2nd, 2016

Overview

Model
Implementation
Next steps

Model

Key idea

Convolutional neural network model for reranking pairs of short texts (query-doc, question-answer)
2 submodels:
- Learn optimal vector representation of Q-D
- Learn a similarity function between Q-D vectors

Paper: Severyn, Aliaksei, and Alessandro Moschitti. "Learning to rank short text pairs with convolutional deep neural networks." Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2015.

Problem formulation

Given:

- Queries: q_i \in Q
- Candidate documents: D_i=\{ d_{i1}, d_{i2},..., d_{in}\}
- Relevancy judgements: J_i=\{ y_{i1}, y_{i2},..., y_{in}\}

Learn:

h(w, \psi(q_i, D_i)) \rightarrow R

such that relevant sentences appear first

Binary classifier:

h(w, \psi(q_i, d_{ij})) \rightarrow y_{ij}
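The pointwise formulation above can be illustrated with a minimal sketch: score each (query, candidate) pair independently, then sort candidates by score so that relevant ones appear first. The function names and the word-overlap scorer are hypothetical stand-ins for the learned function h, not part of the model in the paper.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and sort by descending score."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def overlap_score(query: str, doc: str) -> float:
    """Toy stand-in for h: fraction of query words that occur in the candidate."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


candidates = [
    "Hamlet was written by Shakespeare",
    "Paris is in France",
    "Shakespeare wrote Hamlet around 1600",
]
ranked = rerank("who wrote hamlet", candidates, overlap_score)
for doc, s in ranked:
    print(f"{s:.2f}  {doc}")
```

Replacing `overlap_score` with a trained classifier's probability of the "relevant" class gives the reranker described on this slide.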

1. Learn representation of Q-D

For each sentence we have:
- Word embeddings: skip-gram trained on a Wikipedia dump + AQUAINT corpus, dim=50, window=5, freq>=5
- Sentence matrix fed into a convolution layer: wide, ReLU, filter width m=5, 100 filters
- Max pooling
- Output: vector x
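The convolution step can be sketched in plain Python, assuming the sentence is a list of word vectors and each filter spans m consecutive words. This is an illustration of wide convolution, ReLU, and max pooling only; the function name and the toy 2-dimensional vectors are hypothetical, and real training would use a tensor library.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def wide_conv_relu_maxpool(
    sentence: Matrix, filters: List[Matrix], biases: Vector
) -> Vector:
    """Return one max-pooled feature per filter for a sentence matrix.

    sentence: n word vectors of equal dimension.
    filters:  each filter is m rows (filter width) of the same dimension.
    """
    dim = len(sentence[0])
    pooled = []
    for filt, bias in zip(filters, biases):
        m = len(filt)
        # Wide convolution: zero-pad m-1 positions on both sides,
        # so the feature map has n + m - 1 entries.
        pad = [[0.0] * dim for _ in range(m - 1)]
        padded = pad + sentence + pad
        feature_map = []
        for i in range(len(padded) - m + 1):
            total = bias
            for k in range(m):
                for j in range(dim):
                    total += filt[k][j] * padded[i + k][j]
            feature_map.append(max(0.0, total))  # ReLU
        pooled.append(max(feature_map))  # max pooling over positions
    return pooled


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
x = wide_conv_relu_maxpool(sentence, filters, biases=[0.0, 0.0])
print(x)
```

With the slide's settings the sentence vectors would have dimension 50 and there would be 100 filters of width 5, giving a 100-dimensional vector x per sentence.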

2. Learn similarity between Q-D

- Similarity between the Q and D vectors: x^T_q M x_d
- Joint representation: x_{join}
- Nonlinear func applied to x_{join}
- Output probability: P(Y=j | x_{join})
- Output: R(q,d)
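The similarity step can be sketched as follows. Several details here are assumptions rather than facts stated on the slide: x_join is taken to be the concatenation of x_q, the similarity score, and x_d; the nonlinear function is taken to be a ReLU hidden layer; and all weights are toy values chosen by hand.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def bilinear_similarity(x_q: Vector, M: Matrix, x_d: Vector) -> float:
    """Compute x_q^T M x_d."""
    m_xd = [sum(M[i][j] * x_d[j] for j in range(len(x_d))) for i in range(len(x_q))]
    return sum(x_q[i] * m_xd[i] for i in range(len(x_q)))


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_pair(
    x_q: Vector, x_d: Vector, M: Matrix,
    W_h: Matrix, b_h: Vector, W_s: Matrix, b_s: Vector,
) -> Vector:
    """Return P(Y=j | x_join) for each class j."""
    sim = bilinear_similarity(x_q, M, x_d)
    x_join = x_q + [sim] + x_d  # assumed layout of the joint vector
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, x_join)) + b)  # assumed ReLU
        for row, b in zip(W_h, b_h)
    ]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W_s, b_s)]
    return softmax(logits)


x_q, x_d = [1.0, 2.0], [3.0, 4.0]
M = [[1.0, 0.0], [0.0, 1.0]]  # identity: similarity reduces to a dot product
sim = bilinear_similarity(x_q, M, x_d)
probs = score_pair(
    x_q, x_d, M,
    W_h=[[0.1] * 5], b_h=[0.0],
    W_s=[[1.0], [-1.0]], b_s=[0.0, 0.0],
)
print(sim, probs)
```

The probability of the "relevant" class can then serve as the score used to order candidates.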

Training details

Minimize cross-entropy loss function
Parameters: \theta= \{F_q, b_q, F_d, b_d; M; w_h, b_h, w_s, b_s \}
SGD with backpropagation
Regularization to mitigate overfitting
Data: TREC (answer sentence selection, microblog retrieval)
MRR and MAP to evaluate the models
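MRR and MAP, the two evaluation metrics named in the training details, can be computed from the relevance labels of each ranked list. The sketch below assumes each query's candidates are already sorted by model score and that labels are binary (1 = relevant); the function names are illustrative.

```python
from typing import List


def reciprocal_rank(labels: List[int]) -> float:
    """1 / rank of the first relevant item, or 0 if none is relevant."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def average_precision(labels: List[int]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_reciprocal_rank(rankings: List[List[int]]) -> float:
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def mean_average_precision(rankings: List[List[int]]) -> float:
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two queries; each list holds labels in ranked order.
rankings = [[0, 1, 0, 1], [1, 0, 0]]
print(mean_reciprocal_rank(rankings))
print(mean_average_precision(rankings))
```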

Implementation

Model for SQuAD's data

- Queries: SQuAD's questions, q_i \in Q
- Candidate answers: sentences in paragraphs, D_i=\{ d_{i1}, d_{i2},..., d_{in}\}
- Relevancy judgements: J_i=\{ y_{i1}, y_{i2},..., y_{in}\} (we can use Jaccard/PMI)
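One way the Jaccard option could produce relevancy judgements is sketched below. It assumes a reference text per question (for example the gold answer or the question itself) and marks the paragraph sentence with the highest Jaccard overlap as relevant. The labeling rule, the whitespace tokenizer, and the function names are all hypothetical choices for illustration.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def label_sentences(reference: str, sentences: List[str]) -> List[int]:
    """Label the best-matching sentence(s) 1 and all others 0."""
    scores = [jaccard(reference, s) for s in sentences]
    best = max(scores) if scores else 0.0
    return [1 if best > 0 and score == best else 0 for score in scores]


sentences = [
    "Hamlet is a tragedy",
    "William Shakespeare wrote Hamlet around 1600",
    "It is set in Denmark",
]
labels = label_sentences("Shakespeare wrote Hamlet", sentences)
print(labels)
```

A threshold on the score, or PMI in place of Jaccard, would fit the same interface.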

Our implementation

- Embeddings: word (GloVe/hybrid) or SE trained with SQuAD's data
- Additional features: topic information

Next steps

3. Convolutional networks model

By Sophie Germain

Carnegie Mellon University
