Supervised Open Information Extraction

Open Information Extraction (Open IE) :is systems
extract tuples of natural language expressions that

represent the basic propositions asserted by a sentence.

Usages :They have been used for

a wide variety of tasks, such as textual entailment , question answering

, and knowledge base populations

Open IE was used in Semi-supervised approach or rule based algorithm . In this paper they present new data and method for open IE to improve the performance they used supervised learning .
They extend QA-SRL techniques and apply it to the QAMR corpus.

Background :

Different open IE system and flavors:

Open IE development and lack of standard benchmark dataset make various Open IE systems tackling different facets of the same task.

Open IE Corpora:

They create and make available a new Open IE training corpus, All Words Open IE (AW-OIE), derived from QAMR

BIO Encoding:

Each tuple is encoded with respect to a single predicate, where argument labels indicate their position in the tuple

Text

Evaluation

Metrics

Precision-recall (PR) curve
Compute the area under the PR curve(AUC)
F-measure

Precision =

True Positives / (True Positives + False Positives)

Recall =

True Positives / (True Positives + False Negatives)

F1 =

2* precision* recall/

precision + recall

The sheriff standing against the wall spoke in a very soft voice

Matching function

The Sheriff;

spoke; &

in a soft voice

The sheriff standing against the wall;

spoke;

in a very soft voice

OIE2016	WEB, NYT, PENN
Penn Treebank gold syntactic trees	predicted trees

Evaluation result-1

Rank 1 1 1 1 1 3 2 1

Furthermore, on all of the test sets,

extending the training set significantly improves our model’s performance

Test set: Only | Include

verb predicates | nominalizations

Evaluation result-2

Performance Analysis - unseen predicates

The unseen part contains 145 unique predicate lemmas in 148 extractions,

24% out of the 590 unique predicate lemmas

&

7% out of the 1993 total extractions

RnnOIE-aw

Performance Analysis - Argument length and number

Performance Analysis - Runtime analysis

	ClausIE	PropS	Open IE4	RnnOIE
Xeon 2.3GHz CPU	4.07	4.59	15.38	13.51
Efficiency %	26.5%	29.8%	100%	87.8%

Data set: 3200 sentences from OIE2016

Using GPU : RnnOIE get 11.047 times faster (149.25 sentences/sec)

(sentences/sec)

Performance Analysis - Error analysis

a random sample of 100 recall errors:

> 40 words

(Avg. :29.4 words/sentence )

Conclusion

Open IE

bi-LSTM

sequence tagging problem

formulating

BIO encoding

confidence score

extend

OIE2016

QAMR

train

QA-SRL paradigm

Pros and Cons

What's unique and impressive

Supervise system
Provide confidence scores for tuning their PR tradeoff
Their system did perform well compare to others

What's need to be explain further

Runtime analysis: GPU V.S. CPU

Bullet Two
Bullet Three

Supervised Open Information Extraction - paper review/part

By jackiechen08

Supervised Open Information Extraction - paper review/part

186

jackiechen08