Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, and Ido Dagan
Presented By
Dharitri Rathod
Lamia Alshahrani
Yuh Haur Chen
Open Information Extraction (OIE)
Open IE systems extract tuples of natural language expressions that represent the basic propositions asserted by a sentence.
Usages: They have been used for a wide variety of tasks, such as textual entailment, question answering, and knowledge base population.
2007
Idea of Open IE
2011
ReVerb
2018
2016
OpenIE4
OIE2016
QA-SRL
QAMR
Supervised OpenIE
“Barack Obama, a former US President, was born in Hawaii”
"Mercury filling, particularly prevalent in the USA, was banned in the EU, partly because it causes antibiotic resistance. "
Who did what to whom, when, and where?
"working love learning we on deep"
"we love working on deep learning" :)
LSTM
BI-LSTM
Confidence interable
Metrics
Precision = True Positives / (True Positives + False Positives)
Recall = True Positives / (True Positives + False Negatives)
F1 = 2 * Precision * Recall / (Precision + Recall)
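As a minimal sketch of the three metrics above, the following computes them from raw counts. The function name and the example counts are illustrative assumptions, not values from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts.

    Returns 0.0 for any metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, chosen only for illustration.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Note that F1 divides by the sum (precision + recall) as a whole, so a system with high precision but low recall still scores low.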
The sheriff standing against the wall spoke in a very soft voice
Matching function
(The Sheriff; spoke; in a soft voice)
(The sheriff standing against the wall; spoke; in a very soft voice)
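One way to picture a matching function is a lexical-overlap check between each element of a predicted tuple and the corresponding gold element. The sketch below is an assumption for illustration only: the slides do not specify the exact criterion, and the helper names and the 0.5 threshold are hypothetical.

```python
def tokens(text: str) -> list[str]:
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return [t.lower() for t in text.split()]


def element_matches(pred: str, gold: str, threshold: float = 0.5) -> bool:
    """True if enough of the predicted tokens also appear in the gold element."""
    pred_toks = tokens(pred)
    if not pred_toks:
        return False
    gold_toks = set(tokens(gold))
    overlap = sum(t in gold_toks for t in pred_toks)
    return overlap / len(pred_toks) >= threshold


def tuple_matches(pred: tuple, gold: tuple, threshold: float = 0.5) -> bool:
    """A predicted tuple matches if every element matches its gold counterpart."""
    return len(pred) == len(gold) and all(
        element_matches(p, g, threshold) for p, g in zip(pred, gold)
    )


gold = ("The sheriff standing against the wall", "spoke", "in a very soft voice")
predicted = ("The Sheriff", "spoke", "in a soft voice")
print(tuple_matches(predicted, gold))
```

Under this lenient criterion the two extractions of the sheriff sentence count as a match even though their argument boundaries differ.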
| OIE2016 | WEB, NYT, PENN |
|---|---|
| Penn Treebank gold syntactic trees | predicted trees |
Evaluation result-1
Rank 1 1 1 1 1 3 2 1
"Furthermore, on all of the test sets,
extending the training set significantly improves our model’s performance"
Test set: Only verb predicates | Include nominalizations
Evaluation result-2
Performance Analysis - unseen predicates
The unseen part contains 145 unique predicate lemmas in 148 extractions:
24% of the 590 unique predicate lemmas and 7% of the 1993 total extractions
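The shares quoted above can be checked with a few lines of arithmetic. The variable names are illustrative; the counts are the ones on the slide.

```python
# Counts taken from the slide.
unseen_lemmas, total_lemmas = 145, 590
unseen_extractions, total_extractions = 148, 1993

lemma_share = unseen_lemmas / total_lemmas
extraction_share = unseen_extractions / total_extractions

print(f"unseen lemmas: {lemma_share:.1%} of all unique predicate lemmas")
print(f"unseen extractions: {extraction_share:.1%} of all extractions")
```

The exact ratios are slightly above the whole-percent figures on the slide, which appear to be truncated rather than rounded.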
RnnOIE-aw
Performance Analysis - Runtime analysis
| | ClausIE | PropS | Open IE4 | RnnOIE |
|---|---|---|---|---|
| Xeon 2.3GHz CPU (sentences/sec) | 4.07 | 4.59 | 15.38 | 13.51 |
| Efficiency % (relative to Open IE4) | 26.5% | 29.8% | 100% | 87.8% |
Data set: 3200 sentences from OIE2016
Using GPU: RnnOIE gets 11.047 times faster (149.25 sentences/sec)
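The efficiency row and the GPU speedup follow directly from the throughput figures. The sketch below reproduces them; the dictionary and variable names are illustrative, and the numbers are the ones reported on the slide.

```python
# CPU throughput in sentences/sec, as reported on the slide.
cpu_rates = {"ClausIE": 4.07, "PropS": 4.59, "Open IE4": 15.38, "RnnOIE": 13.51}

# Efficiency is each system's throughput relative to the fastest CPU system.
fastest = max(cpu_rates.values())
efficiency = {name: rate / fastest for name, rate in cpu_rates.items()}

# GPU throughput for RnnOIE, as reported on the slide.
gpu_rate = 149.25
gpu_speedup = gpu_rate / cpu_rates["RnnOIE"]

for name, share in efficiency.items():
    print(f"{name}: {share:.1%}")
print(f"RnnOIE GPU speedup: {gpu_speedup:.3f}x")
```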
Performance Analysis - Error analysis
A random sample of 100 recall errors:
> 40 words (avg.: 29.4 words/sentence)
Formulating Open IE as a sequence tagging problem
bi-LSTM with BIO encoding and a confidence score
Extend OIE2016 with QAMR to train, via the QA-SRL paradigm
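To make the BIO encoding named in this summary concrete, here is a minimal sketch that turns one extraction into per-token tags. The label names (A0, P, A1), the tag format, and the span indices are assumptions made for illustration; the sentence is the Barack Obama example from earlier in the slides.

```python
def bio_encode(tokens: list[str], spans: list[tuple[str, int, int]]) -> list[str]:
    """Tag each token as Beginning, Inside, or Outside of a labeled span.

    spans holds (label, start, end) triples with end exclusive.
    """
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        for i in range(start, end):
            prefix = "B" if i == start else "I"
            tags[i] = f"{label}-{prefix}"
    return tags


sentence = "Barack Obama , a former US President , was born in Hawaii".split()

# Hypothetical extraction: (Barack Obama; was born; in Hawaii)
spans = [("A0", 0, 2), ("P", 8, 10), ("A1", 10, 12)]
tags = bio_encode(sentence, spans)

for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```

With tuples expressed this way, extraction becomes a per-token labeling task, which is what lets a bi-LSTM tagger be trained on it.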
| Pros | Cons |
|---|---|
| Supervised system | CPU runtime is slower than Open IE4 |
| Provides confidence scores for tuning the precision-recall tradeoff | It cannot answer some questions |
| Performs well compared to other systems | |