Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, and Ido Dagan
Presented By:
Dharitri Rathod
Lamia Alshahrani
Yuh Haur Chen
A system that extracts tuples of natural-language expressions representing the basic propositions asserted by a sentence
Open Information Extraction (OIE)
Usages
Idea of Open IE (Banko et al., 2007)
ReVerb (Fader et al., 2011)
OpenIE4
OIE2016 benchmark (Stanovsky and Dagan, 2016)
QA-SRL (He et al., 2015)
QAMR (Michael et al., 2018)
BIO tagging
Supervised Open IE
Each tuple is encoded with respect to a single predicate, where argument labels indicate their position in the tuple.
“Barack Obama, a former US President, was born in Hawaii”
Tuples:
(Barack Obama; was born in; Hawaii)
(a former US President; was born in; Hawaii)
Word label distribution (see the BIO-encoding sketch after this example):
was born in → Predicate
Hawaii → Argument1
Barack Obama → Argument0
a former US President → Argument0
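A minimal sketch (not the authors' code) of how one extraction could be BIO-encoded with respect to its predicate, assuming whitespace tokenization of the example sentence above; the helper name and span indices are illustrative only.

```python
# Minimal sketch of BIO-encoding one extraction with respect to its predicate,
# using the spans of the example sentence above.
def bio_encode(tokens, predicate_span, argument_spans):
    """Spans are (start, end) token indices, end exclusive;
    argument_spans is ordered (Argument0, Argument1, ...)."""
    labels = ["O"] * len(tokens)

    def tag(span, role):
        start, end = span
        labels[start] = "B-" + role
        for i in range(start + 1, end):
            labels[i] = "I-" + role

    tag(predicate_span, "P")
    for i, span in enumerate(argument_spans):
        tag(span, f"A{i}")
    return labels

tokens = ["Barack", "Obama", ",", "a", "former", "US", "President", ",",
          "was", "born", "in", "Hawaii"]
# Tuple: (Barack Obama; was born in; Hawaii)
print(bio_encode(tokens, predicate_span=(8, 11),
                 argument_spans=[(0, 2), (11, 12)]))
```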
"Mercury filling, particularly prevalent in the USA, was banned in the EU, partly because it causes antibiotic resistance. "
QA-SRL:

| Predicate | Question | Answer |
|---|---|---|
| banned (v) | What was banned? | Mercury filling |
| banned (v) | Where was something banned? | in the EU |
| prevalent (adj) | What was particularly prevalent in the USA? | |

QAMR:

| Predicate | Question | Answer |
|---|---|---|
| particularly prevalent | What was particularly prevalent in the USA? | Mercury filling |
Avoid questions which:
RnnOIE (QAMR) extractions (a rough conversion sketch follows below):
(mercury filling; was banned; in the EU)
(mercury filling; particularly prevalent; in the USA)
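One plausible way to assemble such a tuple from QA pairs about the same predicate is sketched below; the paper's actual conversion heuristics are more involved, so the function name and the who/what rule here are assumptions for illustration only.

```python
# Rough sketch (not the paper's heuristics): the answer to a who/what question
# becomes the first argument, the predicate phrase becomes the relation, and
# the remaining answers become further arguments.
def qa_pairs_to_tuple(predicate, qa_pairs):
    """qa_pairs: list of (question, answer) strings about `predicate`."""
    first_arg, other_args = None, []
    for question, answer in qa_pairs:
        q = question.lower()
        if first_arg is None and (q.startswith("who") or q.startswith("what")):
            first_arg = answer            # subject-like answer -> first slot
        else:
            other_args.append(answer)     # everything else -> later slots
    return (first_arg, predicate) + tuple(other_args)

print(qa_pairs_to_tuple("was banned",
                        [("What was banned?", "Mercury filling"),
                         ("Where was something banned?", "in the EU")]))
# ('Mercury filling', 'was banned', 'in the EU')
```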
RnnOIE: a supervised Open IE model (a minimal tagger sketch follows below)
"working love learning we on deep"
"we love working on deep learning" :)
LSTM
Bi-LSTM
[Figure: LSTM vs. Bi-LSTM over the example "Obama was ... in America"]
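A minimal sketch of a Bi-LSTM BIO tagger of this kind, assuming PyTorch; the class name, dimensions, and label inventory are illustrative, and the full RnnOIE model additionally conditions on the target predicate, which is omitted here.

```python
# Sketch: a bidirectional LSTM assigns a BIO label to every token.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_labels, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # The bidirectional LSTM reads the sentence in both directions,
        # so each token's state sees its left and right context.
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Per-token classifier over BIO labels (B-A0, I-A0, B-P, I-P, O, ...).
        self.classify = nn.Linear(2 * hidden_dim, num_labels)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices
        states, _ = self.lstm(self.embed(token_ids))
        return torch.log_softmax(self.classify(states), dim=-1)

# Toy usage: one 4-token sentence, 5 possible labels.
model = BiLSTMTagger(vocab_size=1000, num_labels=5)
log_probs = model(torch.tensor([[1, 2, 3, 4]]))
print(log_probs.shape)  # torch.Size([1, 4, 5])
```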
Extraction confidence: computed by multiplying the probabilities of the B and I labels in the extraction, which is useful for tuning the precision-recall tradeoff (sketch below)
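A small sketch of that confidence computation; the label probabilities below are made up for illustration.

```python
# Multiply the predicted probabilities of the B and I labels that make up
# the extraction, ignoring O-labelled tokens.
def extraction_confidence(label_probs, labels):
    """label_probs[i]: model probability of the label chosen at token i;
    labels[i]: that BIO label."""
    confidence = 1.0
    for prob, label in zip(label_probs, labels):
        if label.startswith(("B-", "I-")):
            confidence *= prob
    return confidence

labels = ["B-A0", "I-A0", "B-P", "I-P", "B-A1", "O"]
probs  = [0.95,   0.90,   0.98,  0.97,  0.85,   0.99]
print(extraction_confidence(probs, labels))  # ~0.69
```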
Metrics
Precision = True Positives / (True Positives + False Positives)
Recall = True Positives / (True Positives + False Negatives)
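A worked example of the two formulas with made-up counts:

```python
# Worked example of precision and recall with illustrative counts.
tp, fp, fn = 8, 2, 4
precision = tp / (tp + fp)   # 8 / 10 = 0.8
recall    = tp / (tp + fn)   # 8 / 12 ≈ 0.667
print(precision, recall)
```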
Metrics
Sentence: "The sheriff standing against the wall spoke in a very soft voice"
Matching function (a sketch follows below):
(The sheriff; spoke; in a soft voice)
(The sheriff standing against the wall; spoke; in a very soft voice)
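A hedged sketch of a lexical-overlap matching function in this spirit; the benchmark's actual matching criterion is more involved, and the overlap threshold here is an assumption.

```python
# Match two tuples part-by-part by token overlap; precision then counts matched
# predictions out of all predictions, recall counts matched gold tuples out of
# all gold tuples.
def part_tokens(part):
    return set(part.lower().split())

def tuples_match(predicted, gold, threshold=0.5):
    """Match if every predicted part shares enough of its tokens with the
    corresponding gold part (overlap measured against the predicted part)."""
    if len(predicted) != len(gold):
        return False
    for p, g in zip(predicted, gold):
        overlap = len(part_tokens(p) & part_tokens(g)) / max(len(part_tokens(p)), 1)
        if overlap < threshold:
            return False
    return True

predicted = ("The sheriff", "spoke", "in a soft voice")
gold = ("The sheriff standing against the wall", "spoke", "in a very soft voice")
print(tuples_match(predicted, gold))  # True: each part overlaps its gold counterpart
```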
Evaluation corpora:

| OIE2016 | WEB, NYT, PENN |
|---|---|
| Penn Treebank gold syntactic trees | predicted trees |
Evaluation results (1)
[Results table omitted; rank per test setting: 1, 1, 1, 1, 1, 3, 2, 1]
Test sets: each corpus is evaluated both with only verb predicates and with nominalizations included.
Evaluation results (2)
"Furthermore, on all of the test sets,
extending the training set significantly improves our model’s performance"
Performance Analysis - Runtime analysis
| | ClausIE | PropS | Open IE4 | RnnOIE |
|---|---|---|---|---|
| Xeon 2.3 GHz CPU (sentences/sec) | 4.07 | 4.59 | 15.38 | 13.51 |
| Efficiency (% of fastest) | 26.5% | 29.8% | 100% | 87.8% |

Data set: 3,200 sentences from OIE2016
Using a GPU, RnnOIE runs about 11× faster (149.25 sentences/sec; 149.25 / 13.51 ≈ 11.05).
Performance Analysis - Error analysis
A random sample of 100 recall errors: many involve long sentences (> 40 words, versus an average of 29.4 words per sentence).
| Pros | Cons |
|---|---|
| Supervised system | Lacks a comparison group for the CPU vs. GPU runtime comparison |
| Provides confidence scores for tuning the precision-recall tradeoff | The confidence score depends entirely on the probabilities of the B and I labels |
| Performs well compared to other systems (PR curve, AUC) | QAMR can answer "What was the filling made of?" with "mercury", but RnnOIE cannot produce this extraction (due to the restrictions on converting QAMR questions) |
Summary:
The paper formulates Open IE as a sequence tagging problem with BIO encoding, solved by a Bi-LSTM that also outputs a confidence score per extraction; the model is trained on data derived from QA-SRL, extended with QAMR, and evaluated on OIE2016.
Extend RnnOIE using all of QAMR