SQuAD: The Stanford Question Answering Dataset

Update #3: Convolutional model

August 2nd, 2016

Overview

  • Model
  • Implementation
  • Next steps

Model

Key idea

  • Convolutional neural network model for reranking pairs of short texts (query-doc, question-answer)
  • Two submodels:
    • Learn optimal vector representation of Q-D
    • Learn a similarity function between Q-D vectors

Paper: Severyn, Aliaksei, and Alessandro Moschitti. "Learning to rank short text pairs with convolutional deep neural networks." Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 2015.

Problem formulation

Given:

  • Queries: q_i \in Q
  • Candidate documents: D_i = \{ d_{i1}, d_{i2}, ..., d_{in} \}
  • Relevancy judgements: J_i = \{ y_{i1}, y_{i2}, ..., y_{in} \}

Learn:

h(w, \psi(q_i, D_i)) \to R

such that relevant sentences appear first

Binary classifier:

h(w, \psi(q_i, d_{ij})) \to y_{ij}
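The pointwise formulation above can be illustrated with a minimal sketch: each (query, candidate) pair is scored independently and the candidates are then sorted by score. The names `rerank` and `overlap_score` are hypothetical, and the word-overlap scorer is only a stand-in for the learned classifier h, not the model from the paper.

```python
from typing import Callable, List, Tuple


def rerank(query: str,
           candidates: List[str],
           score: Callable[[str, str], float]) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair on its own, then sort descending."""
    scored = [(doc, score(query, doc)) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def overlap_score(query: str, doc: str) -> float:
    """Placeholder for h: fraction of query words that occur in the candidate."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


candidates = [
    "Hamlet is a tragedy",
    "Shakespeare wrote Hamlet",
    "Paris is in France",
]
ranking = rerank("who wrote hamlet", candidates, overlap_score)
for doc, s in ranking:
    print(f"{s:.2f}  {doc}")
```

Any scorer with the same signature, such as a trained network, could be passed in place of `overlap_score`.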

1. Learn representation of Q-D

For each sentence we have:

  • Word embeddings: skip-gram trained on a Wikipedia dump + AQUAINT corpus, dim=50, window=5, freq>=5
  • Convolution: wide, ReLU, filter width m=5, 100 filters
  • Max pooling
  • Output: vector x, fed into submodel 2
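As a rough illustration of the sentence submodel, the sketch below applies a wide convolution, ReLU, and max pooling to a tiny sentence matrix in plain Python. The function name `wide_conv_relu_maxpool`, the toy embeddings, and the filter values are assumptions made for the example; the slides use 50-dimensional embeddings, m=5, and 100 filters.

```python
from typing import List

Vector = List[float]


def wide_conv_relu_maxpool(sentence: List[Vector],
                           filters: List[List[Vector]],
                           biases: List[float]) -> Vector:
    """Return one max-pooled feature per filter for a sentence of word vectors."""
    dim = len(sentence[0])
    x = []
    for F, b in zip(filters, biases):
        m = len(F)  # filter width in words
        # Wide convolution: zero-pad m - 1 positions on both sides,
        # so the feature map has len(sentence) + m - 1 entries.
        pad = [[0.0] * dim for _ in range(m - 1)]
        padded = pad + sentence + pad
        feature_map = []
        for i in range(len(padded) - m + 1):
            z = b + sum(F[k][j] * padded[i + k][j]
                        for k in range(m) for j in range(dim))
            feature_map.append(max(0.0, z))  # ReLU
        x.append(max(feature_map))  # max pooling over positions
    return x


# Toy input: 3 words with 2-dimensional embeddings, 2 filters of width 2.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, -1.0], [-1.0, -1.0]],
]
biases = [0.0, 0.5]
x = wide_conv_relu_maxpool(sentence, filters, biases)
print(x)
```

The output vector has one entry per filter regardless of sentence length, which is what lets questions and sentences of different lengths be compared.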

 

2. Learn similarity between Q-D

  • Similarity between Q-D vectors: x^T_q M x_d
  • Joined vector: x_{join}
  • Nonlinear func
  • P(Y=j | x_{join})
  • R(q,d)
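The matching submodel can be sketched as follows, assuming the joined vector is the concatenation of x_q, the similarity score, and x_d, followed by one hidden layer and a softmax. The use of tanh as the nonlinear function, the helper names, and all weight values are assumptions for illustration only.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Matrix-vector product with plain lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def bilinear_similarity(x_q: Vector, M: Matrix, x_d: Vector) -> float:
    """Compute x_q^T M x_d."""
    return sum(q * v for q, v in zip(x_q, matvec(M, x_d)))


def softmax(z: Vector) -> Vector:
    shift = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_probs(x_q: Vector, x_d: Vector, M: Matrix,
                    W_h: Matrix, b_h: Vector,
                    W_s: Matrix, b_s: Vector) -> Vector:
    """Join layer -> hidden layer (tanh, an assumed choice) -> softmax."""
    sim = bilinear_similarity(x_q, M, x_d)
    x_join = x_q + [sim] + x_d
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W_h, x_join), b_h)]
    logits = [a + b for a, b in zip(matvec(W_s, hidden), b_s)]
    return softmax(logits)


# Toy values; in the model M, W_h, b_h, W_s, b_s are learned.
x_q, x_d = [1.0, 2.0], [3.0, 4.0]
M = [[1.0, 0.0], [0.0, 1.0]]
W_h = [[0.1] * 5, [-0.1] * 5]
b_h = [0.0, 0.0]
W_s = [[1.0, -1.0], [-1.0, 1.0]]
b_s = [0.0, 0.0]

sim = bilinear_similarity(x_q, M, x_d)
probs = relevance_probs(x_q, x_d, M, W_h, b_h, W_s, b_s)
print(sim, probs)
```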

Training details

  • Minimize cross-entropy loss function
  • Parameters: \theta = \{F_q, b_q, F_d, b_d; M; w_h, b_h, w_s, b_s\}
  • SGD with backpropagation
  • Regularization to mitigate overfitting
  • Data: TREC (answer sentence selection, microblog retrieval)
  • MRR and MAP to evaluate the models
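For reference, the two evaluation metrics can be computed from ranked lists of binary relevance labels as in the sketch below. The function names and the toy rankings are illustrative; each inner list holds the labels of one query's candidates in ranked order.

```python
from typing import List


def reciprocal_rank(labels: List[int]) -> float:
    """1 / rank of the first relevant item, or 0 if none is relevant."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def average_precision(labels: List[int]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    precisions = []
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_reciprocal_rank(rankings: List[List[int]]) -> float:
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def mean_average_precision(rankings: List[List[int]]) -> float:
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two toy queries: relevance labels of their candidates, best-ranked first.
rankings = [[0, 1, 0, 1], [1, 0, 0]]
mrr_value = mean_reciprocal_rank(rankings)
map_value = mean_average_precision(rankings)
print(mrr_value, map_value)
```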

Implementation

Model for SQuAD's data

  • Queries: q_i \in Q (SQuAD's questions)
  • Candidate answers: D_i = \{ d_{i1}, d_{i2}, ..., d_{in} \} (sentences in paragraphs)
  • Relevancy judgements: J_i = \{ y_{i1}, y_{i2}, ..., y_{in} \} (we can use Jaccard/PMI)
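One possible way to derive relevancy judgements with Jaccard similarity is sketched below: each sentence is compared with a reference text by token overlap and labeled relevant when the overlap passes a threshold. Which text serves as the reference, the whitespace tokenization, and the 0.2 threshold are all assumptions for illustration, not decisions stated in the slides.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase token sets of two strings."""
    set_a = set(a.lower().split())
    set_b = set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def label_sentences(reference: str,
                    sentences: List[str],
                    threshold: float = 0.2) -> List[int]:
    """Label a sentence 1 if its overlap with the reference reaches the threshold."""
    return [1 if jaccard(reference, s) >= threshold else 0 for s in sentences]


sentences = [
    "The Norman conquest began in 1066",
    "William ruled England",
]
labels = label_sentences("the norman conquest", sentences)
print(labels)
```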

Our implementation

  • Embeddings: word (GloVe/hybrid) or SE trained with SQuAD's data
  • Additional features: topic information

Next steps

Copy of 3. Convolutional networks model

By Luis Roman

Carnegie Mellon University
