Supervised Open Information Extraction

Gabriel Stanovsky, Julian Michael, Luke Zettlemoyer, and Ido Dagan

 

 

Presented By:

Dharitri Rathod

Lamia Alshahrani

Yuh Haur Chen

What is Open Information Extraction?

A system that extracts tuples of natural language expressions representing the basic propositions asserted by a sentence

Open Information Extraction (OIE)

 

 

 

Usages

  • Textual entailment
  • Question answering
  • Knowledge base population

 

 

Background

  • (Banko, 2007): idea of Open IE
  • (Fader, 2011): ReVerb
  • OpenIE4
  • (He et al., 2015): QA-SRL
  • (Stanovsky, 2016): OIE2016
  • (Michael, 2018): QAMR
  • Supervised OpenIE: BIO tagging

Custom BIO Tagging

 

Each tuple is encoded with respect to a single predicate, where argument labels indicate their position in the tuple.

“Barack Obama, a former US President, was born in Hawaii”

Tuples:

(Barack Obama; was born in; Hawaii)

(a former US President; was born in; Hawaii)

Word label distribution:

Argument0 (Barack Obama), Argument0 (a former US President), Predicate (was born in), Argument1 (Hawaii)
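The encoding above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the label names (`A0-B`, `P-I`, `O`, ...), the `bio_encode` helper, and the whitespace tokenization are assumptions made for the example. Each tuple gets its own tag sequence, so the two extractions sharing the predicate "was born in" are encoded separately.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # token span; the end index is exclusive


def bio_encode(n_tokens: int, predicate: Span, arguments: Sequence[Span]) -> List[str]:
    """Tag every token as O, P-B/P-I, or A<i>-B/A<i>-I for ONE extraction."""
    tags = ["O"] * n_tokens
    labeled = [("P", predicate)] + [(f"A{i}", span) for i, span in enumerate(arguments)]
    for label, (start, end) in labeled:
        for pos in range(start, end):
            # The first token of a span gets B (begin), the rest get I (inside).
            tags[pos] = f"{label}-B" if pos == start else f"{label}-I"
    return tags


tokens = "Barack Obama , a former US President , was born in Hawaii".split()

# (Barack Obama; was born in; Hawaii)
first = bio_encode(len(tokens), predicate=(8, 11), arguments=[(0, 2), (11, 12)])

# (a former US President; was born in; Hawaii)
second = bio_encode(len(tokens), predicate=(8, 11), arguments=[(3, 7), (11, 12)])

for token, tag_1, tag_2 in zip(tokens, first, second):
    print(f"{token:10} {tag_1:5} {tag_2:5}")
```

The argument index in the label (`A0`, `A1`) is the argument's position in the tuple, which is what lets one tag sequence be decoded back into an ordered tuple.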

QA-SRL and QAMR

  • Who did what to whom, when, and where?

"Mercury filling, particularly prevalent in the USA, was banned in the EU, partly because it causes antibiotic resistance."

QA-SRL:

Predicate | Question | Answer
banned (v) | what was banned? | Mercury filling
banned (v) | where was something banned? | in the EU
prevalent (adj) | what was particularly prevalent in the USA? |

QAMR:

Predicate | Question | Answer
particularly prevalent | what was particularly prevalent in the USA? | Mercury filling

OPENIE4 and RNNOIE

Avoid questions which:

  1. Introduce new content words
  2. Have more than one wh-word
  3. Do not start with who, what, when, or where
  4. Ask "what did X do?" (delegating the predicate to the answer)
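A rough sketch of how the four filtering rules could be applied to a QAMR question. Everything here is an assumption made for illustration: the `keep_question` helper, the small `FUNCTION_WORDS` list standing in for a real stop list, and the simple regex tokenizer. It is not the authors' implementation.

```python
import re
from typing import List

WH_WORDS = {"who", "whom", "whose", "what", "when", "where", "which", "why", "how"}
ALLOWED_STARTS = {"who", "what", "when", "where"}
# Hypothetical function-word list; a real system would use a fuller stop list.
FUNCTION_WORDS = WH_WORDS | {
    "a", "an", "the", "is", "was", "were", "are", "be", "been", "did", "do", "does",
    "in", "on", "at", "of", "to", "by", "for", "with", "something", "someone",
}


def words(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keep_question(question: str, sentence: str) -> bool:
    q = words(question)
    if not q or q[0] not in ALLOWED_STARTS:          # rule 3: must start with who/what/when/where
        return False
    if sum(w in WH_WORDS for w in q) > 1:            # rule 2: at most one wh-word
        return False
    if q[:2] == ["what", "did"] and q[-1] == "do":   # rule 4: "what did X do?"
        return False
    sentence_words = set(words(sentence))
    new_content = [w for w in q if w not in FUNCTION_WORDS and w not in sentence_words]
    return not new_content                            # rule 1: no new content words


sentence = ("Mercury filling, particularly prevalent in the USA, was banned in the EU, "
            "partly because it causes antibiotic resistance.")

print(keep_question("What was banned?", sentence))           # kept
print(keep_question("What filling was made of?", sentence))  # dropped: "made" is a new content word
print(keep_question("What did the EU do?", sentence))        # dropped: rule 4
```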

RNNOIE(QAMR):

(mercury filling; was banned; in the EU)

(mercury filling; particularly prevalent; in the USA)

RNNOIE

A supervised Open IE model

"working love learning we on deep" 

"we love working on deep learning"  :)

Figure: LSTM and Bi-LSTM (example fragments: "Obama was", "in America")

Extraction confidence: the product of the probabilities of the B and I labels in an extraction E, which is useful for tuning the PR tradeoff
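A minimal sketch of that confidence score, assuming each token of an extraction comes with its predicted label and the probability the tagger assigned to it. The `extraction_confidence` name, the label strings, and the numbers are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def extraction_confidence(tagged: Sequence[Tuple[str, float]]) -> float:
    """Multiply the probabilities of all B and I labels; O labels are skipped."""
    probs = [prob for label, prob in tagged if label != "O"]
    return math.prod(probs)


# (label, probability) per token for one extraction; values are invented.
tagged = [("A0-B", 0.9), ("P-B", 0.8), ("P-I", 0.5), ("O", 0.99), ("A1-B", 1.0)]
confidence = extraction_confidence(tagged)
print(round(confidence, 2))  # 0.36

# Raising the threshold keeps fewer, more confident extractions
# (higher precision, lower recall); lowering it does the opposite.
threshold = 0.3
keep = confidence >= threshold
```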

Evaluation

Metrics

  1. Precision-recall (PR) curve
  2. Area under the PR curve (AUC)
  3. F-measure
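A sketch of how a PR curve and its AUC can be obtained from confidence-scored extractions. It assumes every extraction has already been judged correct or incorrect against the gold tuples, that each correct extraction covers a distinct gold tuple, and it uses the trapezoid rule for the area. The function names and the toy data are illustrative and not taken from the paper's evaluation script.

```python
from typing import List, Sequence, Tuple


def pr_curve(scored: Sequence[Tuple[float, bool]], n_gold: int) -> List[Tuple[float, float]]:
    """Return one (recall, precision) point per confidence cutoff."""
    points = []
    true_pos = 0
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    for kept, (_, is_correct) in enumerate(ranked, start=1):
        true_pos += is_correct  # True counts as 1, False as 0
        points.append((true_pos / n_gold, true_pos / kept))
    return points


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under (recall, precision) points sorted by recall."""
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


# (confidence, is_correct) for four extractions; there are four gold tuples.
scored = [(0.9, True), (0.8, False), (0.6, True), (0.3, True)]
points = pr_curve(scored, n_gold=4)
auc = area_under_curve(points)
print(points)
print(round(auc, 4))
```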

Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

F1 = 2 * Precision * Recall / (Precision + Recall)
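The three formulas as a small Python function. The counts in the example are invented, and returning 0.0 when a denominator is zero is a convention chosen for this sketch. Note that the F1 denominator is the sum (precision + recall) as a whole.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# 6 correct extractions, 2 spurious ones, 4 gold tuples missed.
precision, recall, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
print(precision, recall, round(f1, 4))
```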

Metrics

The sheriff standing against the wall spoke in a very soft voice

Matching function

(The Sheriff; spoke; in a soft voice) & (The sheriff standing against the wall; spoke; in a very soft voice)
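A sketch of a lenient matching function for the sheriff example. It assumes (this is not stated on the slide) that two tuples match when each pair of corresponding slots agrees on a head word. The head words are supplied by hand here; a real evaluation would take them from a syntactic parse. The `tuples_match` name and the `Slot` type are hypothetical.

```python
from typing import Sequence, Tuple

Slot = Tuple[str, str]  # (slot text, head word picked by hand for this sketch)


def tuples_match(predicted: Sequence[Slot], gold: Sequence[Slot]) -> bool:
    """Match when both tuples have the same number of slots and equal heads."""
    if len(predicted) != len(gold):
        return False
    return all(
        p_head.lower() == g_head.lower()
        for (_, p_head), (_, g_head) in zip(predicted, gold)
    )


predicted = [("The Sheriff", "Sheriff"), ("spoke", "spoke"), ("in a soft voice", "voice")]
gold = [
    ("The sheriff standing against the wall", "sheriff"),
    ("spoke", "spoke"),
    ("in a very soft voice", "voice"),
]
print(tuples_match(predicted, gold))  # the spans differ, but the heads agree

wrong = [("the wall", "wall"), ("spoke", "spoke"), ("in a soft voice", "voice")]
print(tuples_match(wrong, gold))
```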

OIE2016 | WEB, NYT, PENN
Penn Treebank gold syntactic trees | predicted trees

Evaluation result-1

Rank: 1, 1, 1, 1, 1, 3, 2, 1

Test sets: only verb predicates | including nominalizations

Evaluation result-2

"Furthermore, on all of the test sets, extending the training set significantly improves our model’s performance"

Performance Analysis - Runtime analysis

System | ClausIE | PropS | Open IE4 | RnnOIE
Xeon 2.3GHz CPU (sentences/sec) | 4.07 | 4.59 | 15.38 | 13.51
Efficiency (% of Open IE4) | 26.5% | 29.8% | 100% | 87.8%

Data set: 3200 sentences from OIE2016

Using a GPU, RnnOIE runs 11.047 times faster (149.25 sentences/sec)
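The efficiency row and the GPU speedup follow from the throughput figures by simple arithmetic. A short check, using only the numbers on this slide (the variable names are arbitrary):

```python
# CPU throughput in sentences/sec, as reported on the slide.
sentences_per_sec = {"ClausIE": 4.07, "PropS": 4.59, "Open IE4": 15.38, "RnnOIE": 13.51}

# Efficiency is each system's throughput relative to the fastest CPU system.
fastest = max(sentences_per_sec.values())
efficiency = {
    name: round(100 * rate / fastest, 1) for name, rate in sentences_per_sec.items()
}
print(efficiency)

# GPU speedup of RnnOIE over its own CPU throughput.
gpu_rate = 149.25
gpu_speedup = round(gpu_rate / sentences_per_sec["RnnOIE"], 3)
print(gpu_speedup)
```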

Performance Analysis - Error analysis

A random sample of 100 recall errors:

> 40 words (avg.: 29.4 words/sentence)

Pros and Cons 

What's unique and impressive?

Pros | Cons
Supervised system | Lacks a comparison group for CPU and GPU runtime performance
Provides confidence scores for tuning the PR tradeoff | Confidence depends on the probabilities of the B and I labels
Performs well compared to other systems (PR curve, AUC) | QAMR: what filling was made of? mercury; RNNOIE: ???? (due to the restriction)

Summary

  • Formulating Open IE as a sequence tagging problem (BIO encoding)
  • Bi-LSTM with a confidence score
  • Training data: QA-SRL (OIE2016), extended with QAMR

Extend RNNOIE Using all QAMR

References

  1. Luheng He, Mike Lewis, and Luke Zettlemoyer. 2015. Question-answer driven semantic role labeling: Using natural language to annotate natural language. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
  2. Julian Michael, Gabriel Stanovsky, Luheng He, Ido Dagan, and Luke Zettlemoyer. 2018. Crowdsourcing question-answer meaning representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics.
  3. Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers, pages 1127–1137.
  4. Rudolf Schneider, Tom Oberhauser, Tobias Klatt, Felix A. Gers, and Alexander Loser. 2017. Analyzing errors of open information extraction systems. CoRR abs/1707.07499.
  5. Michele Banko, Michael J. Cafarella, Stephen Soderland, Matthew Broadhead, and Oren Etzioni. 2007. Open information extraction from the web. In IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-12, 2007, pages 2670–2676.
  6. https://medium.com/@raghavaggarwal0089/bi-lstm-bc3d68da8bd0
  7. Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pages 1535–1545.
  8. https://gabrielstanovsky.github.io/assets/papers/naacl18long/poster.pdf

Supervised Open Information Extraction - paper review-Team 10

By jackiechen08
