Open Information Extraction (Open IE) :is systems
extract tuples of natural language expressions that
represent the basic propositions asserted by a sentence.
Usages :They have been used for
a wide variety of tasks, such as textual entailment , question answering
, and knowledge base populations
Background :
Different open IE system and flavors:
Open IE development and lack of standard benchmark dataset make various Open IE systems tackling different facets of the same task.
Open IE Corpora:
They create and make available a new Open IE training corpus, All Words Open IE (AW-OIE), derived from QAMR
Text
Metrics
Precision =
True Positives / (True Positives + False Positives)
Recall =
True Positives / (True Positives + False Negatives)
F1 =
2* precision* recall/
precision + recall
The sheriff standing against the wall spoke in a very soft voice
Matching function
The Sheriff;
spoke; &
in a soft voice
The sheriff standing against the wall;
spoke;
in a very soft voice
| OIE2016 | WEB, NYT, PENN |
|---|---|
| Penn Treebank gold syntactic trees | predicted trees |
Evaluation result-1
Rank 1 1 1 1 1 3 2 1
Furthermore, on all of the test sets,
extending the training set significantly improves our model’s performance
Test set: Only | Include
verb predicates | nominalizations
Evaluation result-2
Performance Analysis - unseen predicates
The unseen part contains 145 unique predicate lemmas in 148 extractions,
24% out of the 590 unique predicate lemmas
&
7% out of the 1993 total extractions
RnnOIE-aw
Performance Analysis - Argument length and number
Performance Analysis - Runtime analysis
| ClausIE | PropS | Open IE4 | RnnOIE | |
|---|---|---|---|---|
| Xeon 2.3GHz CPU | 4.07 | 4.59 | 15.38 | 13.51 |
| Efficiency % | 26.5% | 29.8% | 100% | 87.8% |
Data set: 3200 sentences from OIE2016
Using GPU : RnnOIE get 11.047 times faster (149.25 sentences/sec)
(sentences/sec)
Performance Analysis - Error analysis
a random sample of 100 recall errors:
> 40 words
(Avg. :29.4 words/sentence )
Open IE
bi-LSTM
sequence tagging problem
formulating
BIO encoding
confidence score
extend
OIE2016
QAMR
train
QA-SRL paradigm