Supervised Open Information Extraction

Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, and Ido Dagan

 

 

Presented By

Dharitri Rathod

Lamia Alshahrani

Yuh Haur Chen

What is Open Information Extraction?

Open Information Extraction (OIE)

 

Usages

  • textual entailment
  • question answering
  • knowledge base population

 


Open IE systems extract tuples of natural language expressions that represent the basic propositions asserted by a sentence.

Usages: They have been used for a wide variety of tasks, such as textual entailment, question answering, and knowledge base population.

 

  • Earlier Open IE systems relied on semi-supervised approaches or rule-based algorithms. This paper presents new data and a new method for Open IE, using supervised learning to improve performance.
  • They extend the QA-SRL techniques and apply them to the QAMR corpus.

Background

2007

Idea of Open IE

2011

ReVerb

2018

2016

OpenIE4

OIE2016

QA-SRL

QAMR

Supervised OpenIE

 

Custom BIO Tagging

BIO Tagging

Each tuple is encoded with respect to a single predicate, where argument labels indicate their position in the tuple.

“Barack Obama, a former US President, was born in Hawaii”
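A minimal sketch of the encoding for the sentence above. The tuple (Barack Obama; was born in; Hawaii), the label names (A0, P, A1 with -B/-I suffixes) and the helper `bio_tags` are assumptions made for illustration, not taken from the slides.

```python
def bio_tags(tokens, spans):
    """Tag tokens with BIO labels for one predicate-centred tuple.

    spans maps a label (assumed names: "A0", "P", "A1") to a
    (start, end) token range, end exclusive. Tokens outside every
    span are tagged "O".
    """
    tags = ["O"] * len(tokens)
    for label, (start, end) in spans.items():
        for i in range(start, end):
            # First token of a span begins it (B), the rest are inside (I).
            tags[i] = f"{label}-B" if i == start else f"{label}-I"
    return tags


tokens = "Barack Obama , a former US President , was born in Hawaii".split()

# Hypothetical tuple: (Barack Obama; was born in; Hawaii)
spans = {"A0": (0, 2), "P": (8, 11), "A1": (11, 12)}

tags = bio_tags(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token:10s} {tag}")
```

A second tuple over the same sentence would get its own tag sequence, since each encoding is relative to a single predicate.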

QA-SRL               QAMR

"Mercury filling, particularly prevalent in the USA, was banned in the EU, partly because it causes antibiotic resistance. "

Who did what to whom, when, and where?

RNN OIE

"working love learning we on deep" 

"we love working on deep learning"  :)

LSTM

BI-LSTM

Confidence score
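One plausible way to turn per-token tagger probabilities into a single score per extraction, sketched under the assumption that the score is the product of the probabilities of the non-O (B and I) labels. The function name and the toy numbers are hypothetical.

```python
def extraction_confidence(tags, probs):
    """Multiply the probabilities of the labels that take part in
    the extraction (everything except "O").

    Assumption: tags[i] is the predicted label of token i and
    probs[i] is the probability the tagger gave to that label.
    """
    score = 1.0
    for tag, p in zip(tags, probs):
        if tag != "O":
            score *= p
    return score


# Toy example with made-up probabilities.
toy_tags = ["A0-B", "O", "P-B", "A1-B"]
toy_probs = [0.9, 0.5, 0.8, 0.5]
toy_score = extraction_confidence(toy_tags, toy_probs)
print(toy_score)
```

Thresholding such a score at different values is what produces the points of a precision-recall curve.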

Evaluation

Metrics

  1. Precision-recall (PR) curve
  2. Area under the PR curve (AUC)
  3. F-measure
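As a rough illustration of item 2, the area under a PR curve can be approximated with the trapezoidal rule over (recall, precision) points. The helper name `pr_auc` and the sample points are assumptions for this sketch; the evaluation scripts used in the paper may compute it differently.

```python
def pr_auc(points):
    """Trapezoidal area under a precision-recall curve.

    points: iterable of (recall, precision) pairs, in any order.
    """
    pts = sorted(points)  # sort by recall
    area = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        # Width along the recall axis times the mean precision.
        area += (r1 - r0) * (p0 + p1) / 2.0
    return area


# Made-up curve, one point per confidence threshold.
curve = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
auc = pr_auc(curve)
print(auc)
```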

Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

F1 = 2 * Precision * Recall / (Precision + Recall)
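The three formulas above, written out as a small function. The counts in the example are invented, and the zero-division guards are a design choice of this sketch.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts.

    Returns 0.0 for any value whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented counts: 8 correct extractions, 2 spurious, 8 missed.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(f"P={p:.2f} R={r:.2f} F1={f1:.3f}")
```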

The sheriff standing against the wall spoke in a very soft voice

Matching function

The sheriff; spoke; in a soft voice

&

The sheriff standing against the wall; spoke; in a very soft voice
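The two tuples above differ in wording but express the same proposition, so an exact string comparison would reject the pair. The sketch below shows one lenient alternative based on token overlap per slot. The overlap measure, the 0.25 threshold and the function names are assumptions for illustration, not the matching function defined in the paper.

```python
def slot_overlap(a, b):
    """Shared tokens divided by the size of the larger token set."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / max(len(ta), len(tb))


def tuples_match(t1, t2, threshold=0.25):
    """Two tuples match if every aligned slot overlaps enough.

    threshold is a hypothetical value chosen for this example.
    """
    if len(t1) != len(t2):
        return False
    return all(slot_overlap(x, y) >= threshold for x, y in zip(t1, t2))


short = ("The sheriff", "spoke", "in a soft voice")
full = ("The sheriff standing against the wall", "spoke", "in a very soft voice")

print(short == full)                            # exact match fails
print(tuples_match(short, full))                # lenient match
print(tuples_match(short, full, threshold=0.5)) # stricter threshold
```

How lenient the matcher is changes which extractions count as true positives, and therefore every metric computed from them.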

OIE2016: Penn Treebank gold syntactic trees
WEB, NYT, PENN: predicted trees

Evaluation result-1

Rank         1      1              1        1             1      3           2      1

"Furthermore, on all of the test sets,

extending the training set significantly improves our model’s performance"

Test set: Only verb predicates | Include nominalizations

Evaluation result-2

Performance Analysis - unseen predicates

The unseen part contains 145 unique predicate lemmas in 148 extractions: 24% of the 590 unique predicate lemmas and 7% of the 1993 total extractions.

RnnOIE-aw

Performance Analysis - Runtime analysis

                                  ClausIE   PropS   Open IE4   RnnOIE
Xeon 2.3GHz CPU (sentences/sec)   4.07      4.59    15.38      13.51
Efficiency (% of Open IE4)        26.5%     29.8%   100%       87.8%

Data set: 3200 sentences from OIE2016

Using GPU: RnnOIE gets 11.047 times faster (149.25 sentences/sec)
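The efficiency row and the GPU speedup follow from the throughput figures by simple division. A quick check, using only the numbers on this slide (variable names are illustrative):

```python
# CPU throughput in sentences/sec, as reported on the slide.
cpu_rate = {"ClausIE": 4.07, "PropS": 4.59, "Open IE4": 15.38, "RnnOIE": 13.51}

# Efficiency is each system's throughput relative to the fastest one.
fastest = max(cpu_rate.values())
efficiency = {name: round(100 * rate / fastest, 1) for name, rate in cpu_rate.items()}

# GPU throughput of RnnOIE divided by its CPU throughput.
gpu_rate = 149.25
gpu_speedup = round(gpu_rate / cpu_rate["RnnOIE"], 3)

print(efficiency)
print(gpu_speedup)
```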

Performance Analysis - Error analysis

A random sample of 100 recall errors:

> 40 words (avg.: 29.4 words/sentence)

Summary

  • Formulating Open IE as a sequence tagging problem
  • bi-LSTM with BIO encoding and a confidence score
  • Extend OIE2016 with QAMR, using the QA-SRL paradigm, to train

Pros and Cons 

What's unique and impressive?

Pros | Cons
Supervised system | CPU runtime is slower
Provides confidence scores for tuning the PR tradeoff | It can't answer some QA
Performs well compared to other systems |
