Supervised Open Information Extraction

Open Information Extraction (Open IE) :is systems
extract tuples of natural language expressions that

represent the basic propositions asserted by a sentence.

 

Usages :They have been used for

a wide variety of tasks, such as textual entailment , question answering

, and knowledge base populations

 

  • Open IE was used in Semi-supervised approach or rule based algorithm . In this paper they present new data and method for open IE to improve the performance they used supervised learning .
  • They extend QA-SRL techniques and apply it to the QAMR corpus.

Background :

 

Different open IE system and flavors:

Open IE development and lack of standard benchmark dataset make various Open IE systems tackling different facets of the same task.

Open IE Corpora:

They create and make available a new Open IE training corpus, All Words Open IE (AW-OIE), derived from QAMR

 

BIO Encoding:

Each tuple is encoded with respect to a single predicate, where argument labels indicate their position in the tuple

Text

Evaluation

Metrics

  1. Precision-recall (PR) curve
  2. Compute the area under the PR curve(AUC)
  3. F-measure

Precision =

True Positives / (True Positives + False Positives)

Recall =

True Positives / (True Positives + False Negatives)

F1 =

2* precision* recall/

precision + recall

The sheriff standing against the wall spoke in a very soft voice

Matching function

The Sheriff;

spoke;                          &

in a soft voice

The sheriff standing against the wall;

spoke;

in a very soft voice

OIE2016 WEB, NYT, PENN
Penn Treebank gold syntactic trees predicted trees

Evaluation result-1

Rank         1      1              1        1             1      3           2      1

Furthermore, on all of the test sets,

extending the training set significantly improves our model’s performance

Test set:           Only                           |           Include

                           verb predicates      |           nominalizations

Evaluation result-2

Performance Analysis - unseen predicates

The unseen part contains 145 unique predicate lemmas in 148 extractions,

 

24% out of the 590 unique predicate lemmas

&

7% out of the 1993 total extractions

RnnOIE-aw

Performance Analysis - Argument length and number

Performance Analysis - Runtime analysis

ClausIE PropS Open IE4 RnnOIE
Xeon 2.3GHz CPU 4.07 4.59 15.38 13.51
Efficiency  % 26.5% 29.8% 100% 87.8%

Data set: 3200 sentences from OIE2016

 Using GPU : RnnOIE get 11.047 times faster (149.25 sentences/sec)

(sentences/sec)

Performance Analysis - Error analysis

a random sample of  100 recall errors:

> 40 words

(Avg. :29.4 words/sentence )

Conclusion

Open IE

bi-LSTM

sequence tagging problem

formulating

BIO encoding

 confidence  score

extend 

OIE2016

QAMR

train

QA-SRL paradigm

Pros and Cons 

What's unique and impressive 

  • Supervise system
  • Provide confidence scores for tuning their PR tradeoff
  • Their system did perform well compare to others

What's need to be explain further

  • Runtime analysis: GPU V.S. CPU

 

 

 

 

 

  • Bullet Two
  • Bullet Three

Supervised Open Information Extraction - paper review/part

By jackiechen08

Supervised Open Information Extraction - paper review/part

  • 186