Pronominal Coreference Resolution for Proper Nouns Using the Hobbs Algorithm

Eugene Kostrov

What is coreference resolution?

Niall Ferguson is prolific, well-paid and a snappy dresser. Stephen Moss hates him.

  • Identify all entities in the text
  • Link all mentions to corresponding entities
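As a minimal illustration of the task output (not tied to any particular library; the dictionary layout below is just an assumption for clarity), the resolved entities for the sentence above can be pictured as clusters of mentions:

    # Toy picture of a coreference result for the example sentence: each
    # entity maps to the mentions that refer to it. A real system would
    # return token spans rather than surface strings.
    clusters = {
        "Niall Ferguson": ["Niall Ferguson", "him"],
        "Stephen Moss": ["Stephen Moss"],
    }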

 

Coreferences everywhere

The breath of man and horse mingled, steaming, in the cold morning air as his lord father had the man cut down from the wall and dragged before them. Robb and Jon sat tall and still on their horses, with Bran between them on his pony, trying to seem older than seven, trying to pretend that he'd seen all this before. Bran's father sat solemnly on his horse, long brown hair stirring in the wind. He had taken off Father's face, Bran thought, and donned the face of Lord Stark of Winterfell. There were questions asked and answers given there in the chill of morning, but afterward Bran could not recall much of what had been said. Finally his lord father gave a command, and two of his guardsmen dragged the ragged man to the ironwood stump in the center of the square. They forced his head down onto the hard black wood. Lord Eddard Stark dismounted and his ward Theon Greyjoy brought forth the sword. "Ice," that sword was called. It was as wide across as a man's hand, and taller even than Robb. His father peeled off his gloves and handed them to Jory Cassel, the captain of his household guard. He took hold of Ice with both hands and said, "In the name of Robert of the House Baratheon, the First of his Name, King of the Andals and the Rhoynar and the First Men, Lord of the Seven Kingdoms and Protector of the Realm, by the word of Eddard of the House Stark, Lord of Winterfell and Warden of the North, I do sentence you to die." He lifted the greatsword high above his head. Bran's bastard brother Jon Snow moved closer. "Keep the pony well in hand," he whispered. "And don't look away. Father will know if you do." Bran kept his pony well in hand, and did not look away. His father took off the man's head with a single sure stroke. Bran could not take his eyes off the blood. The head bounced off a thick root and rolled. It came up near Greyjoy's feet. Theon was a lean, dark youth of nineteen who found everything amusing. He laughed, put his boot on the head, and kicked it away. "Ass," Jon muttered, low enough so Greyjoy did not hear. He put a hand on Bran's shoulder, and Bran looked over at his bastard brother. "You did well," Jon told him solemnly. Bran rode with his brothers, well ahead of the main party, his pony struggling hard to keep up with their horses. "The deserter died bravely," Robb said. He was big and broad and growing every day, with his mother's coloring, the fair skin, red-brown hair, and blue eyes of the Tullys of Riverrun. "He had courage, at the least." "No," Jon Snow said quietly. "It was not courage. This one was dead of fear. You could see it in his eyes, Stark." Jon's eyes were a grey so dark they seemed almost black, but there was little they did not see. He was of an age with Robb, but they did not look alike. Jon was slender where Robb was muscular, dark where Robb was fair, graceful and quick where his half brother was strong and fast.

Why do we need it?

  • Information extraction
  • Full text understanding
    • Question answering
    • Summarization
  • Machine translation
  • Dialogue systems

Winograd Schema Challenge

Designed as an improvement on the Turing test

The trophy would not fit in the brown suitcase because it was too big. What was too big?

The trophy would not fit in the brown suitcase because it was too small. What was too small?

Machine Translation

Hobbs algorithm

1. Begin at the NP node immediately dominating the pronoun

2. Go up the tree to the first NP or S node encountered. Call this node X and the path used to reach it p.

3. Traverse all branches below node X to the left of path p in a left-to-right, breadth-first fashion. Propose as an antecedent any NP node that is encountered which has an NP or S node between it and X.

4. If node X is the highest S node in the sentence, traverse the surface parse trees of previous sentences in the text in order of recency, the most recent first; each tree is traversed in a left-to-right, breadth-first manner, and when an NP node is encountered, it is proposed as an antecedent. If X is not the highest S node in the sentence, continue to step 5.

5. From node X, go up the tree to the first NP or S node encountered. Call this new node X, and call the path traversed to reach it p.

6. If X is an NP node and if the path p to X did not pass through the Nominal node that X immediately dominates, propose X as the antecedent.

7. Traverse all branches below node X to the left of path p in a left-to-right, breadth-first manner. Propose any NP node encountered as the antecedent.

8. If X is an S node, traverse all the branches of node X to the right of path p in a left-to-right, breadth-first manner, but do not go below any NP or S node encountered. Propose any NP node encountered as the antecedent.

9. Go to step 4. (See the code sketch below.)
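To make the traversal concrete, here is a minimal Python sketch of steps 1-4 over nltk parse trees. It is a deliberate simplification: it only collects candidate NPs to the left of the upward path and then falls back to previous sentences, the finer NP/Nominal filters of steps 3 and 5-8 are omitted, and all function names are illustrative.

    from collections import deque
    from nltk import Tree

    def left_nps(node, stop_index=None):
        """Collect NP subtrees below node, breadth-first, left to right.
        If stop_index is given, only branches to its left are searched."""
        children = list(node) if stop_index is None else list(node)[:stop_index]
        queue = deque(children)
        found = []
        while queue:
            sub = queue.popleft()
            if isinstance(sub, Tree):        # skip leaf tokens (plain strings)
                if sub.label() == "NP":
                    found.append(sub)
                queue.extend(list(sub))
        return found

    def hobbs_candidates(trees, pronoun_position):
        """trees: parse trees of the text, current sentence last.
        pronoun_position: tree position (tuple) of the NP that
        dominates the pronoun in the last tree."""
        current = trees[-1]
        candidates = []

        # Steps 1-3 (and a simplified stand-in for 5-7): climb from the
        # pronoun's NP to each dominating NP or S node, searching the
        # branches to the left of the path at every stop.
        path = list(pronoun_position)
        while path:
            came_from = path.pop()            # branch we climbed out of
            node = current[tuple(path)]       # the ancestor node
            if node.label() in ("NP", "S"):
                candidates.extend(left_nps(node, stop_index=came_from))

        # Step 4: previous sentences, most recent first, each searched
        # left to right, breadth-first.
        for tree in reversed(trees[:-1]):
            candidates.extend(left_nps(tree))

        return candidates

For instance, if the pronoun's NP sits at tree position (1, 1) of the last sentence, hobbs_candidates(trees, (1, 1)) returns antecedent candidates roughly in Hobbs' order of preference.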


Dataset

Average tokens per text: ~71.5

Average sentences per text: ~2

Results

            Precision   Recall   F1
Hobbs       0.153       0.156    0.155
Hobbs CNF   0.087       0.088    0.087
AllenNLP    0.340       0.386    0.356

Tools

  • Berkeley Neural Parser (see the parsing sketch below)
  • spaCy / NLTK
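A rough sketch of how these tools can fit together, based on benepar's documented spaCy integration; the model name "benepar_en3" and the spaCy model used here are assumptions that may need adjusting to the installed versions:

    # Constituency parsing with the Berkeley Neural Parser (benepar) via a
    # spaCy pipeline, then conversion to nltk Trees for the Hobbs traversal.
    import benepar
    import spacy
    from nltk import Tree

    nlp = spacy.load("en_core_web_sm")
    nlp.add_pipe("benepar", config={"model": "benepar_en3"})

    doc = nlp("Niall Ferguson is prolific, well-paid and a snappy dresser. "
              "Stephen Moss hates him.")

    # benepar exposes one bracketed parse string per sentence.
    trees = [Tree.fromstring(sent._.parse_string) for sent in doc.sents]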

Pros / Cons

Pros:

  • No training data needed
  • If it fails, you can trace exactly why
  • Works well on short sentences (dialogue, news)

Cons:

  • Performs poorly where context matters (Winograd-style cases)
  • Gives no indication that it has failed, nor why it failed
  • Will never be perfect
  • The algorithm is roughly 50 years old

Improvements

  • Match the pronoun's gender to the antecedent's (see the sketch below)
  • Introduce animacy constraints
  • Make use of special labels like SBARQ, SQ, SINV, etc.
  • Use as a feature in more advanced systems
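As a rough illustration of the first two improvements, a gender (and, analogously, animacy) agreement filter can be applied on top of the Hobbs candidate list. The pronoun map and the tiny cue lexicons below are toy assumptions standing in for real gendered-name lists, gazetteers, or NER output:

    # Toy gender-agreement filter over Hobbs candidates.
    PRONOUN_GENDER = {
        "he": "masc", "him": "masc", "his": "masc",
        "she": "fem", "her": "fem", "hers": "fem",
        "it": "neut", "its": "neut",
    }

    MASC_CUES = {"mr.", "lord", "king", "sir", "prince"}
    FEM_CUES = {"mrs.", "ms.", "lady", "queen", "princess"}

    def guess_gender(np_tokens):
        """Very rough gender guess for an NP from its lowercased tokens."""
        tokens = {t.lower() for t in np_tokens}
        if tokens & MASC_CUES:
            return "masc"
        if tokens & FEM_CUES:
            return "fem"
        return "unknown"

    def filter_by_gender(pronoun, candidate_nps):
        """Drop candidates whose guessed gender clashes with the pronoun.
        candidate_nps: list of token lists, e.g. [["Lord", "Stark"], ...]."""
        target = PRONOUN_GENDER.get(pronoun.lower())
        if target is None:
            return candidate_nps
        return [np for np in candidate_nps
                if guess_gender(np) in (target, "unknown")]

For example, filter_by_gender("she", [["Lord", "Eddard", "Stark"], ["Lady", "Stark"]]) keeps only ["Lady", "Stark"].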

THE END
