A UNIFIED FRAMEWORK FOR STRUCTURED PREDICTION

Wei Lu

Research Group

Singapore University of Technology and Design

Outline

  • Introduction
     
  • Decoding, Learning
     
  • Semi-Markov, Latent CRF, Latent SSVM
     
  • Parsing with CRF, Hybrid Tree, and Predicting Overlapping Structures
     
  • Pipeline, Mean Field and Neural CRF

1. Introduction

Structured Prediction

Fruit flies like a banana

Part-of-Speech Tagging

Fruit/A  flies/N  like/V  a/D  banana/N

Noun-Phrase Chunking

[NP Fruit flies] like [NP a banana]

Constituency Parsing

(S (NP (A Fruit) (N flies)) (VP (V like) (NP (D a) (N banana))))

Dependency Parsing

Fruit flies like a banana

Semantic Parsing

Fruit flies like a banana  →  LIKE(F102, B87)

The same meaning representation as a tree: LIKE at the root, with F102 and B87 as its arguments.

Sentiment Analysis

Fruit flies like a banana

(positive)   (neutral)

Nested Chunking

Fruit flies like a banana

Nested chunks: NX, NX, NX, NX

This Tutorial

  • Shares a conceptually new way of thinking about building structured prediction models.

  • Presents a unified structured prediction framework that encompasses classic models, and is able to model structures that standard Graphical Models cannot.

  • Provides a way to rapidly prototype novel structured prediction models for new tasks.

A Unified Framework

[Figure: components of the framework: features \mathbf{f} (from GM, NN, word embeddings, ...), parameters \mathbf{w}^{(k)} updated to \mathbf{w}^{(k+1)}, L, U, and cost \Delta(y,y')]

Structured Prediction

One Assumption

Structures are constructed by following a collection of discrete actions.

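States, Actions

Fruit flies like a banana

S: shift    L: left-arc    R: right-arc

The structure is built one action at a time, passing through the states $S_1, S_2, \ldots, S_9$; the complete derivation for this sentence uses five shifts (S), three left-arcs (L) and two right-arcs (R).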
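The shift (S), left-arc (L) and right-arc (R) actions are those of the arc-standard transition system for dependency parsing. The Python sketch below replays an action sequence and records each state as a (stack, buffer) pair. The artificial ROOT symbol and the particular order SSLSLSSLRR are assumptions made for illustration (the resulting arcs are hypothetical); this order is simply one that uses five shifts, three left-arcs and two right-arcs.

```python
from typing import List, Tuple

Arc = Tuple[str, str]                            # (head, dependent)
State = Tuple[Tuple[str, ...], Tuple[str, ...]]  # (stack, buffer)


def arc_standard(words: List[str], actions: str) -> Tuple[List[Arc], List[State]]:
    """Replay arc-standard actions; return the arcs built and the states visited.

    S (shift):     move the next buffer word onto the stack.
    L (left-arc):  top of stack becomes head of the item below it, which is popped.
    R (right-arc): item below the top becomes head of the top, which is popped.
    """
    stack = ["ROOT"]                             # hypothetical artificial root
    buffer = list(words)
    arcs: List[Arc] = []
    states: List[State] = []
    for a in actions:
        if a == "S":
            stack.append(buffer.pop(0))
        elif a == "L":
            if len(stack) < 3:                   # never make ROOT a dependent
                raise ValueError("left-arc needs two words on the stack")
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif a == "R":
            if len(stack) < 2:
                raise ValueError("right-arc needs an item below the top")
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown action {a!r}")
        states.append((tuple(stack), tuple(buffer)))
    return arcs, states


arcs, states = arc_standard("Fruit flies like a banana".split(), "SSLSLSSLRR")
for stack, buffer in states:
    print(list(stack), list(buffer))
print(arcs)
```

Each printed line is one state; with this hypothetical order, "like" ends up attached to ROOT, with "flies" and "banana" as its dependents.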

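States, Actions, Paths

Fruit flies like a banana

For part-of-speech tagging, a state is the sequence of tags assigned so far and each action appends one tag, so every complete tagging is a path. Two example paths:

A → A N → A N V → A N V D → A N V D N
A → A V → A V D → A V D A → A V D A A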
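A small Python sketch of this view of tagging, assuming a five-tag set N, V, D, A, P: a path is determined by its tag sequence, the states along it are the growing tag prefixes, and an n-word sentence has |tags|^n complete paths.

```python
from itertools import product
from typing import List, Tuple

TAGS = ["N", "V", "D", "A", "P"]           # assumed tag set


def path_states(tags: List[str]) -> List[str]:
    """States visited along a tagging path: each action appends one tag."""
    return [" ".join(tags[: i + 1]) for i in range(len(tags))]


def all_paths(n: int, tags: List[str] = TAGS) -> List[Tuple[str, ...]]:
    """Every complete path for an n-word sentence, one per tag sequence."""
    return list(product(tags, repeat=n))


print(path_states(["A", "N", "V", "D", "N"]))
print(path_states(["A", "V", "D", "A", "A"]))
print(len(all_paths(5)))   # 3125 = 5 ** 5
```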

Score of a Path

Fruit flies like a banana

Two paths in the graph:
A → A N → A N V → A N V D → A N V D N
D → D N → D N V → D N V D → D N V D N

S_\mathbf{w}(p)=\sum_{e\in p}s_\mathbf{w}(e)

The score of a path $p$ is the sum of the scores of its edges $e$; each edge score $s_\mathbf{w}(e)$ depends on the parameters $\mathbf{w}$.

Score of a Path

Edge scores on the first path $p_1$: 7.2, 3.5, 0.9, 2.0, 3.2, -10.2

S_\mathbf{w}(p_1)=7.2+3.5+0.9+2.0+3.2-10.2=6.6

Edge scores on the second path $p_2$: -0.2, -4.2, -1.7, -3.0, 2.2, 12.8

S_\mathbf{w}(p_2)=-0.2-4.2-1.7-3.0+2.2+12.8=5.9
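The path score is plain addition, so the arithmetic is easy to check in Python using only the edge scores listed on the slide. Note that the six terms listed for the second path add up to 5.9.

```python
from typing import List


def path_score(edge_scores: List[float]) -> float:
    """S_w(p) = sum over edges e in p of s_w(e)."""
    return sum(edge_scores)


p1 = [7.2, 3.5, 0.9, 2.0, 3.2, -10.2]
p2 = [-0.2, -4.2, -1.7, -3.0, 2.2, 12.8]
print(round(path_score(p1), 1))   # 6.6
print(round(path_score(p2), 1))   # 5.9
```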

Search

Strategies for finding a high-scoring path: exhaustive search, beam search, heuristic search.
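Two of the three strategies can be sketched in Python over prefix states (heuristic search is not shown). The edge-score function and the toy weights are hypothetical, chosen so that a beam of size 1 commits to the locally better first tag A and misses the best path D N, while exhaustive search, or a wide enough beam, finds it.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical edge score s_w(e) for appending `tag` to the state `prefix`.
EdgeScore = Callable[[Tuple[str, ...], str], float]


def exhaustive_search(n: int, tags: Sequence[str], score: EdgeScore) -> Tuple[List[str], float]:
    """Score every complete path (|tags| ** n of them) and keep the best."""
    def total(seq: Tuple[str, ...]) -> float:
        return sum(score(seq[:i], seq[i]) for i in range(n))
    best = max(product(tags, repeat=n), key=total)
    return list(best), total(best)


def beam_search(n: int, tags: Sequence[str], score: EdgeScore, beam_size: int) -> Tuple[List[str], float]:
    """Extend every kept state by every tag, then keep the beam_size best states."""
    beam: List[Tuple[Tuple[str, ...], float]] = [((), 0.0)]
    for _ in range(n):
        candidates = [(prefix + (t,), s + score(prefix, t))
                      for prefix, s in beam for t in tags]
        candidates.sort(key=lambda c: c[1], reverse=True)   # stable on ties
        beam = candidates[:beam_size]
    best, best_score = beam[0]
    return list(best), best_score


# Toy weights (made up) on (previous tag, tag) pairs; "<s>" marks the start.
WEIGHTS = {("<s>", "A"): 2.0, ("<s>", "D"): 1.0, ("D", "N"): 5.0}
TOY_TAGS = ["A", "D", "N"]


def toy_score(prefix: Tuple[str, ...], tag: str) -> float:
    prev = prefix[-1] if prefix else "<s>"
    return WEIGHTS.get((prev, tag), 0.0)


print(exhaustive_search(2, TOY_TAGS, toy_score))   # (['D', 'N'], 6.0)
print(beam_search(2, TOY_TAGS, toy_score, 1))      # (['A', 'A'], 2.0)
print(beam_search(2, TOY_TAGS, toy_score, 3))      # (['D', 'N'], 6.0)
```

Exhaustive search is exact but exponential in sentence length; the beam trades exactness for a cost that grows only linearly with the beam size.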

Score of an Edge

Fruit flies like a banana

s_\mathbf{w}(e)=\mathbf{w}\cdot\mathbf{f}(e)=\mathbf{w}\cdot\mathbf{f}(x,[s,a])

An edge $e$ corresponds to taking action $a$ in state $s$ on input $x$; its score is the dot product of the parameters $\mathbf{w}$ with the edge's feature vector $\mathbf{f}(x,[s,a])$.
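In practice $\mathbf{f}(x,[s,a])$ is usually a sparse indicator vector, so the dot product only needs the features that fire. In this Python sketch the two feature templates (current word with the new tag, and whole state with the new tag) and the weights are made up for illustration, chosen so that the D N V → D N V D edge scores 2.0.

```python
from typing import Dict, Tuple

State = Tuple[str, ...]


def features(x: Tuple[str, ...], state: State, action: str) -> Dict[str, float]:
    """Hypothetical sparse f(x, [s, a]): only the non-zero entries are stored."""
    i = len(state)                          # index of the word being tagged
    return {
        f"word={x[i]}|tag={action}": 1.0,
        f"state={' '.join(state)}|tag={action}": 1.0,
    }


def edge_score(w: Dict[str, float], x: Tuple[str, ...], state: State, action: str) -> float:
    """s_w(e) = w . f(x, [s, a]); features missing from w have weight 0."""
    return sum(w.get(name, 0.0) * value
               for name, value in features(x, state, action).items())


x = tuple("Fruit flies like a banana".split())
w = {"word=a|tag=D": 1.5, "state=D N V|tag=D": 0.5}     # made-up weights
print(edge_score(w, x, ("D", "N", "V"), "D"))              # 2.0
print(edge_score(w, x, ("A", "N", "V"), "D"))              # 1.5
```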

Score of an Edge

Example: the edge from state D N V to state D N V D (action D).

\mathbf{w} = \left[ \begin{array}{c} 1.2 \\ \vdots \\ -3.1 \end{array} \right], \quad \mathbf{f}\big(x,[\text{D N V},\ \text{D}]\big) = \left[ \begin{array}{c} 1 \\ \vdots \\ 0 \end{array} \right], \quad s_\mathbf{w}(e)=\mathbf{w}\cdot\mathbf{f}(x,[s,a]) = 2.0

Because $\mathbf{f}(x,[s,a])$ here depends on the whole state $s$, edges that take the same action D after V from different states receive different scores: A N V → A N V D scores -3.0 and P N V → P N V D scores 1.7.

Score of an Edge

If the features of an edge look only at the part of the state that D N V, A N V and P N V share, all three edges fire the same feature. With the same parameters

\mathbf{w} = \left[ \begin{array}{c} 1.2 \\ \vdots \\ -3.1 \end{array} \right], \quad \mathbf{f}\big(x,[\text{D N V},\ \text{D}]\big) = \left[ \begin{array}{c} 0 \\ \vdots \\ 1 \end{array} \right], \quad s_\mathbf{w}(e)=\mathbf{w}\cdot\mathbf{f}(x,[s,a]) = 1.8

and the edges A N V → A N V D and P N V → P N V D also score 1.8.
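The contrast between the two cases comes down to which part of the state a feature looks at. In this Python sketch the feature names, the weights and the choice of the last k = 2 tags are all assumptions, picked so that the numbers match the slides: a whole-state feature separates the three edges, while a feature on the last two tags gives all three the same score.

```python
from typing import Dict, Tuple

State = Tuple[str, ...]

# Made-up weights reproducing the slide's numbers for the action D.
W: Dict[str, float] = {
    "state=D N V|tag=D": 2.0,
    "state=A N V|tag=D": -3.0,
    "state=P N V|tag=D": 1.7,
    "recent=N V|tag=D": 1.8,
}


def full_state_feature(state: State, action: str) -> str:
    """Looks at the whole state, so D N V, A N V and P N V differ."""
    return f"state={' '.join(state)}|tag={action}"


def recent_tags_feature(state: State, action: str, k: int = 2) -> str:
    """Looks only at the last k tags of the state (k = 2 is an assumption)."""
    return f"recent={' '.join(state[-k:])}|tag={action}"


def score(feature: str) -> float:
    return W.get(feature, 0.0)


STATES = [("D", "N", "V"), ("A", "N", "V"), ("P", "N", "V")]
print([score(full_state_feature(s, "D")) for s in STATES])    # [2.0, -3.0, 1.7]
print([score(recent_tags_feature(s, "D")) for s in STATES])   # [1.8, 1.8, 1.8]
```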

Search Graph

Fruit flies like a banana

s_\mathbf{w}(e)=\mathbf{w}\cdot\mathbf{f}(e)=\mathbf{w}\cdot\mathbf{f}(x,[s,a])

With such features, several edges on the two paths now share a score: 2.0 and -3.0 both become 1.8, 3.2 and 2.2 both become 4.7, and -10.2 and 12.8 both become -1.6.

The edges leaving D N V and A N V now score identically, and the same holds for their successors, so these states are equivalent for search and can be merged. Step by step, D N V D N merges into A N V D N, then D N V D into A N V D, then D N V into A N V, shrinking the search graph.

Huang, L., Sagae, K. (2010). Dynamic programming for linear-time incremental parsing.
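Merging equivalent states is what keeps the graph small. This Python sketch only counts nodes, assuming (hypothetically) five tags and that each edge's features see at most the last k tags of its state plus the action; it is not the cited dynamic-programming parser, just the counting argument behind state merging.

```python
from itertools import product
from typing import Sequence


def count_prefix_states(n: int, tags: Sequence[str]) -> int:
    """States when every distinct tag prefix is its own node (no merging)."""
    return sum(len(tags) ** length for length in range(1, n + 1))


def count_merged_states(n: int, tags: Sequence[str], k: int) -> int:
    """States after merging prefixes with the same length and the same last k tags.

    If features see only the last k tags of the state, such prefixes have
    identically scored futures, so search may treat them as one node.
    """
    merged = {(length, prefix[-k:])
              for length in range(1, n + 1)
              for prefix in product(tags, repeat=length)}
    return len(merged)


TAGS = ["N", "V", "D", "A", "P"]
print(count_prefix_states(5, TAGS))        # 3905
print(count_merged_states(5, TAGS, k=1))   # 25
print(count_merged_states(5, TAGS, k=2))   # 105
```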

[Figure: search graph for Fruit flies like a banana, built step by step, with tag nodes (A, D, N, V) and edge scores]

A Compact Search Graph

Fruit flies like a banana

Each word has one node per tag (N, V, D, A, P), and each node is linked to every node of the next word.

A representation that contains exponentially many directed paths: 5 × 5 = 25 tag nodes, yet 5^5 = 3,125 paths, one per tag sequence.
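The graph stays small (n words × |T| tags) while the number of paths grows as |T|^n. A left-to-right dynamic program counts the paths without enumerating them; a Python sketch, assuming the fully connected layout described above:

```python
from typing import List


def count_paths(n: int, num_tags: int) -> int:
    """Count complete paths through the compact graph by dynamic programming.

    paths[t] holds the number of partial paths that end in tag t at the
    current word; every tag is linked to every tag of the next word.
    """
    paths: List[int] = [1] * num_tags          # first word: one path per tag
    for _ in range(1, n):
        total = sum(paths)                     # all paths reaching the previous word
        paths = [total] * num_tags             # each continues to every next tag
    return sum(paths)


print(count_paths(5, 5))      # 3125, i.e. 5 ** 5
print(count_paths(20, 5))     # 95367431640625, i.e. 5 ** 20
```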
 

Decoding

Fruit flies like a banana

Given the input $x$ and the parameters $\mathbf{w}$, find the output $y$, i.e. the best path through the compact search graph.

The Search Problem

Fruit flies like a banana

\arg\max_y\mathbf{w}\cdot\mathbf{f}(x,y)

\max_y\mathbf{w}\cdot\mathbf{f}(x,y) = \max_y\sum_j \mathbf{w}\cdot\mathbf{f}(x,[y^{j},y^{j+1}])

Because the features decompose over adjacent tag pairs $[y^{j},y^{j+1}]$, the score of a tagging $y$ is the sum of the scores of the edges on its path in the compact graph.
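The decomposition holds because the global feature vector is the sum of the local ones, $\mathbf{f}(x,y)=\sum_j \mathbf{f}(x,[y^{j},y^{j+1}])$, and the dot product distributes over that sum. A Python check with hypothetical transition and emission features; padding $y$ with a start symbol "<s>" is also an assumption.

```python
from collections import Counter
from typing import Dict, Sequence


def local_features(x: Sequence[str], ys: Sequence[str], j: int) -> Counter:
    """Hypothetical f(x, [y^j, y^{j+1}]) on the padded sequence ys = <s> + y."""
    return Counter({f"trans={ys[j]}>{ys[j + 1]}": 1,
                    f"emit={x[j]}>{ys[j + 1]}": 1})


def global_features(x: Sequence[str], y: Sequence[str]) -> Counter:
    """f(x, y): the sum of the local feature vectors along the path."""
    ys = ["<s>"] + list(y)
    total: Counter = Counter()
    for j in range(len(y)):
        total += local_features(x, ys, j)
    return total


def dot(w: Dict[str, float], f: Counter) -> float:
    return sum(w.get(name, 0.0) * count for name, count in f.items())


x = "Fruit flies like a banana".split()
y = ["N", "N", "V", "D", "N"]
w = {"trans=N>N": 0.5, "trans=D>N": 1.0, "emit=flies>N": 2.0}   # made-up weights
ys = ["<s>"] + y
lhs = dot(w, global_features(x, y))
rhs = sum(dot(w, local_features(x, ys, j)) for j in range(len(y)))
print(lhs, rhs)   # 3.5 3.5
```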

Viterbi

Fruit flies like a banana

\max_y\mathbf{w}\cdot\mathbf{f}(x,y) = \max_y\sum_j \mathbf{w}\cdot\mathbf{f}(x,[y^{j},y^{j+1}]) = 9.7

\arg\max_y\mathbf{w}\cdot\mathbf{f}(x,y)

Working left to right, Viterbi stores at every node the best score of any partial path reaching it, computed from the stored scores one word back plus the scores of the incoming edges. In this example the best complete path scores 9.7, and following the best incoming edge back from the end recovers the argmax path.
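A compact Python sketch of Viterbi with back-pointers, checked against exhaustive search. The score(j, prev, tag) interface and the start symbol "<s>" are assumptions, and the example uses made-up random edge scores rather than the slide's values.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# score(j, prev, tag): score of the edge into `tag` at word j from `prev`
# ("<s>" when j == 0); a hypothetical first-order interface.
Score = Callable[[int, str, str], float]


def viterbi(n: int, tags: Sequence[str], score: Score) -> Tuple[List[str], float]:
    """Best tag sequence and its score in O(n * |tags|^2) time."""
    best: List[Dict[str, float]] = [{t: score(0, "<s>", t) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for j in range(1, n):
        best.append({})
        back.append({})
        for t in tags:
            # best predecessor of tag t at word j
            prev = max(tags, key=lambda p: best[j - 1][p] + score(j, p, t))
            best[j][t] = best[j - 1][prev] + score(j, prev, t)
            back[j][t] = prev
    last = max(tags, key=lambda t: best[n - 1][t])
    path = [last]
    for j in range(n - 1, 0, -1):              # follow back-pointers
        path.append(back[j][path[-1]])
    path.reverse()
    return path, best[n - 1][last]


def brute_force(n: int, tags: Sequence[str], score: Score) -> Tuple[List[str], float]:
    """Exhaustive argmax over all |tags|^n sequences, for checking."""
    def total(seq: Tuple[str, ...]) -> float:
        return sum(score(j, seq[j - 1] if j else "<s>", seq[j]) for j in range(n))
    best = max(product(tags, repeat=n), key=total)
    return list(best), total(best)


def random_scores(n: int, tags: Sequence[str], seed: int = 0) -> Score:
    """Made-up edge scores, only for testing."""
    rng = random.Random(seed)
    table = {(j, p, t): rng.uniform(-5.0, 5.0)
             for j in range(n) for p in ["<s>", *tags] for t in tags}
    return lambda j, p, t: table[(j, p, t)]


TAGS = ["N", "V", "D", "A", "P"]
s = random_scores(5, TAGS)
print(viterbi(5, TAGS, s))
print(brute_force(5, TAGS, s))
```

Viterbi touches each of the n × |T|² edges once, whereas brute force scores all |T|^n paths; both return the same path.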

MAP Inference

\arg\max_y\mathbf{w}\cdot\mathbf{f}(x,y)

\max_y\mathbf{w}\cdot\mathbf{f}(x,y) = \max_y\sum_j \mathbf{w}\cdot\mathbf{f}(x,[y^{j},y^{j+1}]) = 9.7

Given $\mathbf{w}$, computing this max (and the argmax path) is inference; under a probabilistic model such as a CRF it returns the most probable structure, hence the name MAP (maximum a posteriori) inference.
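Assuming a log-linear (CRF-style) model $p(y \mid x) = \exp(\mathbf{w}\cdot\mathbf{f}(x,y)) / Z(x)$, the normalizer $Z(x)$ does not depend on $y$ and exp is increasing, so the MAP structure is exactly the path that maximizes $\mathbf{w}\cdot\mathbf{f}(x,y)$. A tiny Python check over three illustrative candidate scores:

```python
import math
from typing import List


def posterior(scores: List[float]) -> List[float]:
    """p(y | x) = exp(w.f(x, y)) / Z for an explicit list of candidate scores."""
    m = max(scores)                            # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


scores = [6.6, 5.9, 9.7]                       # w.f(x, y) for three candidate paths
probs = posterior(scores)
print(max(range(len(scores)), key=scores.__getitem__))   # 2
print(max(range(len(probs)), key=probs.__getitem__))     # 2
```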

So Far

Given the parameters $\mathbf{w}$, how to search for the optimal path (and its score).

Next...

How to learn the parameters $\mathbf{w}$?