The limits of shallow approaches on MCTest

Previous work on Question Answering

  • SAT-style, Zweig and Burges 2012
  • DeepRead, Hirschman et al. 1999
  • DeepSelection, Yu et al. 2014
  • QANTA, Iyyer et al. 2014

MCTest

  • Multiple choice
  • Two question types
  • Open-domain
  • Common sense
  • Children's stories
  • Fictional settings

Single

Multiple

Inviting Giraffes to parties    (160test Q33)

The blue ball said hello  (160dev Q7)

Owls having socks  (160dev Q10)

MC160    TRAIN  DEV  TEST    Quality check by hand
MC500    TRAIN  DEV  TEST    Quality check by algorithm

Project Goal

  • Limit of shallow approaches
  • Exploring Rule-based system
  • Improve upon original baseline

Results

               MC160    MC500
SW+D +RTE      69.3%    63.3%
RBS +RTE       73.5%    64.2%
Difference     +4%      +1%

+70% First

SHALLOW METHODS

It was Jessie Bear's birthday. She was having a party.  She asked her two best friends to come to the party.  She made a big cake, and hung up some balloons

 

  • A) Jessie Bear
  • B) no one
  • C) Lion
  • D) Tiger

LEMMATISATION, STOPWORDS, COREFERENCE

Lemmas: be, have, ask, she, friend, make, hang

1) Who was having a birthday?

Jessie birthday. Jessie party. Jessie ask two friend come party. Jessie make big cake hang balloon

QA combining, matching, scoring:

  • A) Jessie + birthday     2
  • B) no one + birthday     1
  • C) Lion + birthday       1
  • D) Tiger + birthday      1

P' = \mathrm{lemmatize}(\mathrm{tokenize}(P))
Q_i' = \mathrm{lemmatize}(\mathrm{tokenize}(Q_i))
A_{ij}' = \mathrm{lemmatize}(\mathrm{tokenize}(A_{ij}))
S_{ij} = (P' \cap (Q_i' \cup A_{ij}')) \setminus X

P: story passage
Q_i: question i
A_{ij}: answer j for question i
S_{ij}: words matched for question i and answer j
X: stopwords

\mathrm{Matching} = (0, 3, 4, 5, 0, 1, 3, 2, \ldots, 0)^\top
\mathrm{True} = (0, 0, 0, 1, 0, 0, 0, 1, \ldots, 0)^\top
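The matching step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presenters' implementation: `LEMMAS` and `STOPWORDS` are tiny hand-written stand-ins for a real lemmatizer and stopword list, the function names are hypothetical, and the answers are written the way the Jessie Bear slide shows them after preprocessing.

```python
import re

# Toy stand-ins (assumptions): a real system would use a full lemmatizer
# and stopword list rather than these hand-written tables.
LEMMAS = {"was": "be", "having": "have", "asked": "ask", "friends": "friend",
          "made": "make", "hung": "hang", "balloons": "balloon"}
STOPWORDS = {"a", "the", "be", "have", "it", "she", "her", "to", "and",
             "up", "some", "who", "s"}


def preprocess(text):
    """Lowercase, tokenize and lemmatize; returns a set of lemmas."""
    return {LEMMAS.get(tok, tok) for tok in re.findall(r"[a-z]+", text.lower())}


def matched_words(passage, question, answer):
    """S_ij = (P' & (Q_i' | A_ij')) - X"""
    p = preprocess(passage)
    qa = preprocess(question) | preprocess(answer)
    return (p & qa) - STOPWORDS


PASSAGE = ("It was Jessie Bear's birthday. She was having a party. "
           "She asked her two best friends to come to the party. "
           "She made a big cake, and hung up some balloons")
QUESTION = "Who was having a birthday?"
ANSWERS = ["Jessie", "no one", "Lion", "Tiger"]

# One matching score per candidate answer: the number of matched words.
scores = [len(matched_words(PASSAGE, QUESTION, a)) for a in ANSWERS]
print(scores)  # [2, 1, 1, 1]
```

The highest-scoring answer (A, matching "Jessie" and "birthday") is the one the system would pick.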

Word Matching (WM)

                        MC160     MC500
Single                  67.88%    62.76%
Multi                   50.31%    46.72%
All                     58.43%    53.97%
All + co-reference      +3.26%    +1.49%

on train+dev sets

What did John do at the beach?

John was at the beach. It was a very warm day.

He decided to go for a swim.

  • -went for a swim

Sentence selection

window of up to 3 sentences
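One way to picture sentence selection is a sliding window over consecutive sentences. The sketch below is an assumption about how such a selector could work, not the method used in the talk: it scores each window of one to three sentences by raw word overlap with a query string and keeps the best one. The names `tokens` and `select_window` are hypothetical.

```python
import re


def tokens(text):
    """Lowercased word set of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_window(sentences, query, max_len=3):
    """Return the window of up to max_len consecutive sentences that
    shares the most words with the query (ties keep the earlier, shorter one)."""
    q = tokens(query)
    best, best_score = [], -1
    for size in range(1, max_len + 1):
        for start in range(len(sentences) - size + 1):
            window = sentences[start:start + size]
            score = len(q & tokens(" ".join(window)))
            if score > best_score:
                best, best_score = window, score
    return best


story = ["John was at the beach.",
         "It was a very warm day.",
         "He decided to go for a swim."]
question = "What did John do at the beach?"

on_q = select_window(story, question)                         # query = question only
on_qa = select_window(story, question + " went for a swim")   # query = question + answer
```

In this toy run, selecting on the question alone keeps only the first sentence, while selecting on question plus answer widens the window to all three sentences, which is where the answer "went for a swim" is supported.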

Hypernymy

Peter the puppy.

Who is the animal?

puppy (1.0) -> dog (0.5) -> animal (0.3)

Word Matching would score    0

Hypernym would score             0.3

animal
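The hypernym example can be sketched as a lookup along a hypernym chain. This is a toy illustration: the `HYPERNYMS` table is hand-written (a real system would presumably query a lexical database such as WordNet), the weights 1.0, 0.5 and 0.3 are the ones shown on the slide, and `hypernym_score` is a hypothetical name.

```python
# Hand-written hypernym chain (assumption): nearest hypernym first.
HYPERNYMS = {"puppy": ["dog", "animal"]}

# Weight by distance along the chain, taken from the slide:
# the word itself, its hypernym, its hypernym's hypernym.
WEIGHTS = [1.0, 0.5, 0.3]


def hypernym_score(passage_word, query_word):
    """Score a query word against a passage word and its hypernyms."""
    chain = [passage_word] + HYPERNYMS.get(passage_word, [])
    for distance, word in enumerate(chain[:len(WEIGHTS)]):
        if word == query_word:
            return WEIGHTS[distance]
    return 0.0


print(hypernym_score("puppy", "animal"))  # 0.3
```

Plain word matching corresponds to using only the first weight, which is why it scores 0 for "animal" against "puppy".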

                    MC160     MC500
All                 58.43%    53.97%
All + hypernym      +1.55%    +1.48%

Word Matching

Rule-based systems

Implementation

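def applyTransformations(Story, question):
  # matchesRuleX / applyTransformationX are placeholders:
  # one (question pattern test, story rewrite) pair per rule

  if matchesRuleA(question):
    Story = applyTransformationA(Story)

  if matchesRuleB(question):
    Story = applyTransformationB(Story)

  if matchesRuleC(question):
    Story = applyTransformationC(Story)
  ...

  return Story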

Applying a series of transformations to the story when a question matches patterns

Rules we explored

Syntactic pattern matching

  • Negation
  • Why questions
  • Character subject
  • Narrative
  • Temporal
  • Implicative

Negation rule

Which food was not eaten?

Solution

Hence, negate the weights of word tokens

100% accurate
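A minimal sketch of the negation rule, assuming every matched token has weight 1 so that negating token weights is the same as negating the match count. The cue list `NEGATION_CUES` and the function names are hypothetical; the actual pattern used to detect negated questions is not given in the slides.

```python
import re

# Hypothetical trigger list (assumption): the slides do not give the real pattern.
NEGATION_CUES = {"not"}


def is_negated(question):
    """True if the question contains a negation cue."""
    return bool(NEGATION_CUES & set(re.findall(r"[a-z]+", question.lower())))


def score_answers(question, match_counts):
    """Flip the sign of every answer's match count for negated questions."""
    sign = -1 if is_negated(question) else 1
    return [sign * count for count in match_counts]


def best_answer(scores):
    """Index of the highest-scoring answer."""
    return max(range(len(scores)), key=scores.__getitem__)


counts = [2, 0, 3, 1]  # made-up match counts for answers A-D
negated = score_answers("Which food was not eaten?", counts)
plain = score_answers("Which food was eaten?", counts)
```

With the signs flipped, the answer that matches the story least (index 1 here) wins, which is the desired behaviour for "Which food was not eaten?".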

Character-subject rule

Why did Jon go to the park?

Solution

Hence, we introduce coreference to accurately locate the character

Result (on training set)

MC160    MC500
70.3%    59.6%

Analysis

Using this system we

  • can analyze the performance of a lexical system
  • can understand the limitations of a lexical system

Limitations

What two characters are in this book?

This is a story of a girl and what kind of animal?

What is the name of the boy in the story?

A lexical system has no understanding of narrative or characters.

Learning a Scoring function

SVM

  • WM+Coref
  • WM+Hypernym
  • WM+Coref Selection on Q
  • WM+Coref Selection on QA
  • WM+Hypernym Selection on QA

Score(P_i, Q_{ij}, A_{ijk}) \in [0, 1]

Platt Scaling
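Platt scaling maps a raw SVM margin to a probability through a fitted sigmoid, which gives a score between 0 and 1. The sketch below only applies the sigmoid; the parameters `a` and `b` are placeholders (assumptions) that would normally be fitted on held-out data, and the margins are made-up numbers.

```python
import math


def platt_probability(margin, a=-1.0, b=0.0):
    """Platt scaling: P(correct | margin) = 1 / (1 + exp(a * margin + b)).

    a and b are placeholder values here; in practice they are fitted
    on held-out data, with a typically negative.
    """
    return 1.0 / (1.0 + math.exp(a * margin + b))


# Made-up SVM margins for the four candidate answers of one question.
margins = [-1.2, 0.4, 2.0, -0.3]
probabilities = [platt_probability(m) for m in margins]
best = max(range(len(probabilities)), key=probabilities.__getitem__)
```

Because the sigmoid is monotonic when `a` is negative, the ranking of answers is unchanged; the benefit is that scores become comparable across questions.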

Shallow methods

                    MC160    MC500
SW+D                68.0%    59.9%
SVM (combined)      71.4%    60.2%
Difference          +3.4%    +0.3%

Textual Entailment

Augmented our Rule-based system with RTE (BIUTEE)

RTE Result

               MC160    MC500
SW+D +RTE      69.3%    63.3%
RBS +RTE       73.5%    64.2%
Difference     +4.2%    +0.9%

Conclusions

MC160 can be beaten by shallow methods

MC500 requires deeper understanding of natural language

Shallow methods have a limit: 74%

Future

  • More sophisticated Rule-based system
  • Natural Logic (Angeli and Manning 2014)
  • Deep Sentence Selection (Yu et al. 2014)

Questions?

:)

MCTest

By Nicola Greco
