Ultra-Fine Entity Typing

 

Eunsol Choi

Omer Levy

Yejin Choi

Luke Zettlemoyer*

 

Paul G. Allen School of Computer Science & Engineering, University of Washington

*Allen Institute for Artificial Intelligence, Seattle WA


Abstract

  • A new entity typing task.
  • New evaluation sets.
  • A new model that can predict open types.
    • Achieves state-of-the-art performance.
    • Sets a baseline for the new dataset.

Introduction

Example

  • Bill robbed John. He was arrested.

    • "Bill", "he" are both "criminal".
      • Due to "robbing" & "arresting".
    • "John" is a "victim".
      • Because he was "robbed".

New Task

  • Given a sentence with a target entity mention.
  • Predict free-form noun phrases (NPs) that describe the target entity.

Existing Datasets

  • Labels are heavily skewed toward coarse-grained types.
  • e.g. the OntoNotes dataset marks about half of its mentions as "other".

New Dataset

  • More diverse and fine-grained.

Task & Data

Task

  • Given a sentence and an entity mention \(e\).
  • Predict a set of natural-language phrases \(T\) that describe the type of \(e\).
  • The selection of \(T\) is context-sensitive.
    • e.g. "Bill Gates has donated billions to eradicate malaria."
    • "Bill Gates" should be typed as "philanthropist" but not "inventor".

Data

  • About 6K mentions via crowdsourcing.
  • Using a large type vocabulary.

Sentence Source

  • Gigaword
  • OntoNotes
  • Web Articles
    • Via links to Wikipedia

Automatic Mention Detection

  • Maximal NPs from a constituency parser.
  • Mentions from a coreference resolution system.
  • e.g. In 1817, in collaboration with David Hare, he set up the Hindu College.

Annotators

  • 5 workers from Mechanical Turk provide labels for each example.
  • The type vocabulary: about 10K frequent NPs from Wiktionary.
  • WordNet is used to expand the labels with synonyms and hypernyms (see the sketch below).
  • Only types selected by at least 3 of the 5 annotators are kept.
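
A minimal sketch of the WordNet expansion step, assuming NLTK with the WordNet corpus installed (`nltk.download("wordnet")`); the paper's exact expansion rules may differ:

```python
from nltk.corpus import wordnet as wn

def expand_type(word):
    """Expand a candidate type with WordNet synonyms and hypernyms."""
    expanded = set()
    for synset in wn.synsets(word, pos=wn.NOUN):
        # Synonyms: other lemmas in the same synset.
        expanded.update(l.name().replace("_", " ") for l in synset.lemmas())
        # Hypernyms: more general types, one level up.
        for hyper in synset.hypernyms():
            expanded.update(l.name().replace("_", " ") for l in hyper.lemmas())
    return expanded

print(expand_type("detective"))  # e.g. {'detective', 'investigator', ...}
```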

Data Analysis

  • Each type is classified into 3 disjoint bins:
    • 9 general types
      • e.g. person, location.
    • 121 fine-grained types
      • Such as film, athlete.
      • Mapped to labels from prior work.
    • 10,201 ultra-fine types
      • Encompassing every other label in the type space.
      • e.g. detective, lawsuit.

Data Analysis

  • 6K examples.
  • About 5 labels per example on average:
    • 0.9 general types.
    • 0.6 fine-grained types.
    • 3.9 ultra-fine types.
  • 2.3K unique types overall.
  • 429 types are needed to cover 80% of the labels.

Type Coverage

  • To cover 80% of the labels:
    • FIGER requires only 7 types.
    • OntoNotes requires only 4 types.
    • The new dataset requires 429 types (see the sketch below).
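
A small sketch of how such coverage numbers can be computed from a frequency table of gold labels (the counts below are made up for illustration):

```python
from collections import Counter

def types_to_cover(freq, fraction=0.8):
    """Greedy count of most frequent types needed to cover `fraction` of labels."""
    total = sum(freq.values())
    covered = 0
    for n, (_, count) in enumerate(freq.most_common(), start=1):
        covered += count
        if covered >= fraction * total:
            return n
    return len(freq)

freq = Counter({"person": 50, "location": 30, "detective": 2, "lawsuit": 1})
print(types_to_cover(freq))  # -> 2
```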

Mention Coverage

  • Existing datasets focus on named entity mentions.
    • OntoNotes also contains nominal expressions.
  • The new dataset:
    • 40% pronouns.
    • 38% nominal expressions.
    • 22% named entity mentions.

Distant Supervision

Distant Supervision

  • Training data for fine-grained NER systems is typically obtained by:
    • Linking entity mentions to a knowledge base (KB).
    • Drawing their types from the KB.
  • Limitations:
    • Recall can suffer due to KB incompleteness.
    • Precision can suffer when the selected types do not fit the context.

Recall Problem

  • Mine entity mentions that were linked to Wikipedia pages.
  • Extract types from their encyclopedic definitions.

Precision Problem

  • A new source of distant supervision.
  • Nominal head words are automatically extracted from raw text.
  • Using head words as a form of distant supervision provides fine-grained information about mentions.
    • e.g. "The 44th president of the United States" → head word "president".

Entity Linking

  • The first sentence of a Wikipedia page often states the entity's type via an "is a" relation.
  • Descriptions were extracted for 3.1M entities, containing 4.6K unique type labels.

Contextualized Supervision

  • Many nominal entity mentions include type information in their head words.
  • Nominal head words are extracted with a dependency parser from Gigaword and Wikilinks.
  • All words are lowercased and plurals are converted to singular (see the sketch below).
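
A minimal sketch of head-word extraction with spaCy; the paper ran a dependency parser over Gigaword and Wikilinks, so spaCy and its small English model are stand-in assumptions here (requires `python -m spacy download en_core_web_sm`):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def head_word_type(mention):
    """Return the normalized syntactic head of a nominal mention."""
    doc = nlp(mention)
    root = next(doc.sents).root  # head of the dependency tree
    # Lowercase and singularize via the lemma.
    return root.lemma_.lower()

print(head_word_type("The 44th president of the United States"))  # "president"
```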

Model

Model

  • The architecture is based on the neural AttentiveNER model:
    • Improving the representations.
    • Introducing a new multitask objective to handle multiple sources of supervision.

Context Representation

  • Given a sentence \(x_1, ..., x_n\)
  • Represent each token \(x_i\) using a pre-trained word embedding \(w_i\).
  • Concatenate an additional location embedding \(l_i\).
    • Whether \(x_i\) is before, inside, or after the mention.

Context Representation

  • Use \([w_i;l_i]\) as the input to a bidirectional LSTM.
    • Producing a contextualized representation \(h_i\) for each token.
  • Represent the context \(c\) as a weighted sum of the contextualized token representations, using MLP-based attention (sketched below):
    • \(a_i=\mathrm{softmax}_i(v_a\cdot \mathrm{relu}(W_ah_i))\), \(c=\sum_i a_ih_i\)
    • \(W_a\) and \(v_a\) are the parameters of the MLP.
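
A minimal PyTorch sketch of this attention (dimensions are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

class ContextAttention(nn.Module):
    """c = sum_i a_i * h_i, with a_i = softmax_i(v_a . relu(W_a h_i))."""
    def __init__(self, hidden_dim, attn_dim=100):
        super().__init__()
        self.W_a = nn.Linear(hidden_dim, attn_dim)
        self.v_a = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, h):
        # h: (batch, seq_len, hidden_dim) BiLSTM outputs.
        scores = self.v_a(torch.relu(self.W_a(h)))  # (batch, seq_len, 1)
        a = torch.softmax(scores, dim=1)            # attention weights a_i
        return (a * h).sum(dim=1)                   # context vector c
```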

Mention Representation

  • Represent the mention \(m\) as the concatenation of two items:
    • A character-based representation produced by a CNN.
    • A weighted sum of the pre-trained word embeddings in the mention span computed by attention.

Final Representation

  • Concatenation of context and mention representation.
  • The final representation: \(r=[c;m]\) (see the sketch below).
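
A sketch of the mention side and the final concatenation; the sizes and the max-pooling over the character CNN are assumptions, not the paper's exact choices:

```python
import torch
import torch.nn as nn

class MentionRep(nn.Module):
    """m = [char-CNN over span characters; attention-weighted word embeddings]."""
    def __init__(self, n_chars, char_dim=50, word_dim=300):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.cnn = nn.Conv1d(char_dim, char_dim, kernel_size=3, padding=1)
        self.word_attn = nn.Linear(word_dim, 1)

    def forward(self, chars, words):
        # chars: (batch, span_chars); words: (batch, span_len, word_dim)
        c = self.char_emb(chars).transpose(1, 2)         # (batch, char_dim, len)
        char_rep = torch.relu(self.cnn(c)).max(dim=2).values
        a = torch.softmax(self.word_attn(words), dim=1)  # word attention
        word_rep = (a * words).sum(dim=1)
        return torch.cat([char_rep, word_rep], dim=1)    # mention vector m

# Final representation: r = [c; m]
# r = torch.cat([context_c, mention_m], dim=1)
```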

Label Prediction

  • A type label embedding matrix \(W_t \in R^{n\times d}\):
    • \(n\) is the number of labels in the prediction space.
    • \(d\) is the dimension of \(r\).
  • This matrix is the concatenation of \(W_{general}\), \(W_{fine}\), and \(W_{ultra}\).
  • Each type's probability is the sigmoid of its inner product with \(r\): \(y=\sigma(W_tr)\).
    • Predict every type \(t\) for which \(y_t>0.5\).
    • Fall back to \(\arg\max_t y_t\) if no type crosses the threshold (see the sketch below).
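
A minimal sketch of this decision rule:

```python
import torch

def predict_types(W_t, r):
    """Predict all types with probability > 0.5; fall back to the argmax."""
    y = torch.sigmoid(W_t @ r)  # W_t: (n_labels, d); r: (d,)
    predicted = (y > 0.5).nonzero().flatten().tolist()
    return predicted if predicted else [int(y.argmax())]
```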

Multiple Sources

  • Each source of supervision provides only a partial set of type labels:
    • KBs provide general types.
    • Head words provide only ultra-fine types.

Multitask Objective

  • Divide the labels into three bins (general, fine, and ultra-fine).
  • Only update labels in a bin when the example has gold types in that bin (see the sketch below).
  • The training objective is to minimize \(J_{all}\), where \(t\) is the target vector at each granularity:
    • \(J_{all}=J_{general}\cdot 1_{general}(t)+J_{fine}\cdot 1_{fine}(t)+J_{ultra}\cdot 1_{ultra}(t)\)
    • \(1_{category}(t)\) is an indicator function
      • Checks whether \(t\) contains a type in the category.
    • \(J_{category}\) is the category-specific logistic regression objective:
      • \(J=-\sum_{i}\big[t_i\cdot log(y_i)+(1-t_i)\cdot log(1-y_i)\big]\)
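
A sketch of the masked multitask loss; the bin boundaries below assume the label matrix stacks the 9 general, 121 fine, and 10,201 ultra-fine types in order:

```python
import torch
import torch.nn.functional as F

BINS = {"general": (0, 9), "fine": (9, 130), "ultra": (130, 10331)}

def multitask_loss(y, t):
    """Sum BCE per bin, skipping bins with no gold types (the indicator)."""
    # y: (n_labels,) sigmoid outputs; t: (n_labels,) float 0/1 targets.
    loss = y.new_zeros(())
    for lo, hi in BINS.values():
        if t[lo:hi].sum() > 0:  # 1_category(t)
            loss = loss + F.binary_cross_entropy(y[lo:hi], t[lo:hi],
                                                 reduction="sum")
    return loss
```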

Evaluation

Experiment

  • AttentiveNER was reimplemented for reference.
  • Metrics (see the sketch below):
    • Macro-averaged precision, recall, and F1.
    • Mean reciprocal rank (MRR).
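
A sketch of these metrics over predicted/gold type sets (one common formulation; the paper's exact averaging may differ):

```python
def macro_prf1(preds, golds):
    """Macro-averaged precision, recall, F1 over examples (sets of types)."""
    p = sum(len(pr & go) / len(pr) for pr, go in zip(preds, golds) if pr) / len(preds)
    r = sum(len(pr & go) / len(go) for pr, go in zip(preds, golds) if go) / len(preds)
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def mrr(rankings, golds):
    """Mean reciprocal rank of each gold type in the model's ranking."""
    scores = []
    for ranking, gold in zip(rankings, golds):
        rank = {t: i + 1 for i, t in enumerate(ranking)}
        scores.extend(1 / rank[t] for t in gold if t in rank)
    return sum(scores) / len(scores)
```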

Results

Breakdown Results

Analysis

  • 50 examples were analyzed from the dev set.

Improving Existing Fine-Grained NER with Better Distant Supervision

Experiment

  • The widely-used OntoNotes dataset was chosen.
  • Augmenting the training data.
  • Compare performance to other published results and the reimplementation of AttentiveNER.
  • Measure
    • Macro- and micro-averaged F1 score and accuracy.

Augmenting the Training Data

  • Manually map labels between the two type vocabularies.
    • 77% directly correspond to suitable labels.
  • Expand labels according to their hypernyms.

Results


Ablation Results

Conclusion

Conclusion

  • These new forms of distant supervision boost performance on both the new and the existing datasets.
  • These results establish a first baseline and suggest that the data will support significant future work.

Ultra-Fine Entity Typing

By Penut Chen (PenutChen)