Single
Multiple
Inviting Giraffes to parties (160test Q33)
The blue ball said hello (160dev Q7)
Owls having socks (160dev Q10)
TRAIN DEV TEST
TRAIN DEV TEST
Quality check by hand
Quality check by algorithm
69.3%
63.3%
73.5%
64.2%
4%
1%
It was Jessie Bear's birthday. She was having a party. She asked her two best friends to come to the party. She made a big cake, and hung up some balloons.
be
be
have
ask
she
friend
make
hang
be
have
LEMMATISATION
STOPWORDS
COREFERENCE
1) Who was having a birthday?
Jessie
Jessie
Jessie
Jessie
Jessie birthday.
party.
ask two friend come party.
make big cake hang balloon
birthday
birthday
birthday
birthday
2
1
1
1
QA combining
matching
Scoring
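The pipeline above (lemmatisation, stopword removal, coreference, then matching the combined question and answer against each sentence) can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: the stopword list, the lemma table and the hard-coded coreference map are hypothetical stand-ins for a real NLP toolkit, and it assumes the score of a sentence is the number of distinct question+answer tokens it contains.

```python
import re

# Hypothetical toy resources; a real system would use an NLP toolkit instead.
STOPWORDS = {"a", "an", "and", "having", "her", "it", "s", "some",
             "the", "to", "up", "was", "who"}
LEMMAS = {"asked": "ask", "balloons": "balloon", "friends": "friend",
          "hung": "hang", "made": "make"}
COREF = {"she": "jessie"}  # assumption: the pronoun is already resolved

STORY = ("It was Jessie Bear's birthday. She was having a party. "
         "She asked her two best friends to come to the party. "
         "She made a big cake, and hung up some balloons.")


def preprocess(text):
    """Lower-case, resolve pronouns, drop stopwords and lemmatise."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [COREF.get(t, t) for t in tokens]
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]


def sentence_scores(story, question, answer):
    """Count, per sentence, how many question+answer tokens it contains."""
    query = set(preprocess(question + " " + answer))
    sentences = [s for s in re.split(r"[.!?]", story) if s.strip()]
    return [len(query & set(preprocess(s))) for s in sentences]
```

With the question "Who was having a birthday?" and the candidate answer "Jessie", `sentence_scores(STORY, ...)` gives one count per sentence of the story. How the slides turn these per-sentence counts into a final answer score is not stated, so that step is left out.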
        MC160    MC500
Single  67.88%   62.76%
Multi   50.31%   46.72%
All     58.43%   53.97%
All     +3.26%   +1.49%
(on train+dev sets)
What did John do at the beach?
John was at the beach. It was a very warm day.
He decided to go for a swim.
Peter the puppy.
Who is the animal?
puppy (1.0) -> dog (0.5) -> animal (0.3)
Word Matching would score 0
Hypernym would score 0.3
animal
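The hypernym example above can be sketched as a walk up a hypernym chain with a weight that shrinks at each step. This is a minimal sketch: the `HYPERNYMS` table is a hypothetical toy replacement for a lexical resource such as WordNet, and the weights 1.0, 0.5 and 0.3 are simply the ones shown on the slide.

```python
# Hypothetical toy hypernym table (child -> parent), taken from the slide example.
HYPERNYMS = {"puppy": "dog", "dog": "animal"}

# Weight for an exact match, a one-step hypernym and a two-step hypernym.
WEIGHTS = [1.0, 0.5, 0.3]


def hypernym_score(story_word, query_word):
    """Return the weight at which query_word is reached from story_word."""
    word = story_word
    for weight in WEIGHTS:
        if word == query_word:
            return weight
        word = HYPERNYMS.get(word)  # move one level up the chain
        if word is None:
            break
    return 0.0
```

Plain word matching corresponds to using only the first weight, which is why "animal" scores 0 against "puppy" there but 0.3 here.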
        MC160    MC500
All     58.43%   53.97%
All     +1.55%   +1.48%
def applyTransformations(story, question):
    # Apply, in order, every transformation whose rule matches the question.
    if matchesRuleA(question):
        story = applyTransformationA(story)
    if matchesRuleB(question):
        story = applyTransformationB(story)
    if matchesRuleC(question):
        story = applyTransformationC(story)
    ...
    return story
Applying a series of transformations to the story when a question matches patterns
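A runnable variant of this idea keeps the rules in a table of (matcher, transformation) pairs. This is only a sketch: the two rules below (`is_count_question` with `normalise_numbers`, and `is_who_question` with `strip_possessives`) are hypothetical examples invented for illustration and are not the rules used in the talk.

```python
import re

NUMBER_WORDS = {"one": "1", "two": "2", "three": "3"}  # hypothetical toy table


def is_count_question(question):
    return "how many" in question.lower()


def normalise_numbers(story):
    """Replace spelled-out numbers with digits."""
    return " ".join(NUMBER_WORDS.get(w.lower(), w) for w in story.split())


def is_who_question(question):
    return question.lower().startswith("who")


def strip_possessives(story):
    """Drop possessive 's so that names match exactly."""
    return re.sub(r"'s\b", "", story)


# Each entry pairs a question matcher with the story transformation it triggers.
RULES = [
    (is_count_question, normalise_numbers),
    (is_who_question, strip_possessives),
]


def apply_transformations(story, question):
    """Apply, in order, every transformation whose rule matches the question."""
    for matches, transform in RULES:
        if matches(question):
            story = transform(story)
    return story
```

Keeping the rules in a list means a new pattern can be added without touching `apply_transformations` itself.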
Syntactic pattern matching
Which food was not eaten?
Hence, negate the weights of word tokens
100% accurate
Solution
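One way to read "negate the weights of word tokens" is sketched below: when the question contains "not", every question token gets weight -1 instead of +1, so the candidate whose context overlaps least with the question wins. This is an interpretation under stated assumptions, not the authors' code; the unit weights, the function names and the example contexts are all hypothetical.

```python
import re


def token_weights(question):
    """Give each question token weight +1, or -1 if the question is negated."""
    tokens = re.findall(r"[a-z]+", question.lower())
    sign = -1.0 if "not" in tokens else 1.0
    return {t: sign for t in tokens if t != "not"}


def score(context, weights):
    """Sum the weights of the context tokens that appear in the question."""
    tokens = re.findall(r"[a-z]+", context.lower())
    return sum(weights.get(t, 0.0) for t in tokens)


def pick_answer(question, contexts):
    """Return the candidate whose supporting context scores highest."""
    weights = token_weights(question)
    return max(contexts, key=lambda name: score(contexts[name], weights))


# Hypothetical candidates, each paired with the story sentence that mentions it.
CONTEXTS = {
    "pizza": "The pizza was eaten at lunch",
    "salad": "The salad stayed on the table",
}
```

For "Which food was not eaten?" the pizza context is penalised for matching "was" and "eaten", so the salad is selected; without the negation the same scorer prefers the pizza.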
Why did Jon go to the park?
Hence, we introduce coreference to accurately locate the character
Solution
70.3% 59.6%
on training set
Using this system we
of a lexical system
What two characters are in this book?
This is a story of a girl and what kind of animal?
What is the name of the boy in the story?
Lexical system has no understanding of narrative or characters.
[Diagram: selecting between WM+Coref and WM+Hypernym, with selection on Q and selection on QA]
Platt Scaling
                MC160    MC500
SW+D            68.0%    59.9%
SVM (combined)  71.4%    60.2%
Difference      3.4%     0.3%
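For reference, Platt scaling maps a raw classifier score to a probability by fitting a sigmoid to held-out scores and labels. The sketch below fits the two sigmoid parameters by plain gradient descent on the log loss. It is a simplified illustration: Platt's original method also smooths the target labels and uses a more robust optimiser, and the slides do not say how the calibrated scores are used, so the function names, learning rate and epoch count here are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (p - y) * s  # derivative of the log loss w.r.t. a
            grad_b += (p - y)      # derivative of the log loss w.r.t. b
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Turn a raw score into a probability with the fitted sigmoid."""
    return sigmoid(a * score + b)
```

Calibrated probabilities from different systems are on a common scale, which makes them easier to compare or combine than raw scores.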
Augmented our rule-based system (RBS) with the BIUTEE RTE system
            MC160    MC500
SW+D +RTE   69.3%    63.3%
RBS +RTE    73.5%    64.2%
Difference  4.2%     0.9%
MC160 can be beaten by shallow methods
MC500 requires deeper understanding of natural language
Shallow methods have a limit
74%
Future
Questions?
:)