3. Semi-Markov CRF, Latent CRF, Parsing with CRF, Hybrid Tree and Predicting Overlapping Structures

Noun-Phrase Chunking

[Figure: "Fruit flies like a banana" with two NP chunks marked, shown next to a tagging lattice with the tags B, I and O available at each of the five words.]

Chunking with CRF

\min_{\mathbf{w}} \sum_i \Big( -\mathbf{w}\cdot\mathbf{f}({x}_i,{y}_i) + \log{\sum_{y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,{y})\big)}} \Big)

Feature on adjacent tags: \mathbf{f}(x,[y^{j},y^{j+1}])
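The objective for one training pair can be sketched in a few lines. The block below is a minimal illustration, not the tutorial's implementation: the feature templates ("emit", "trans"), the weight dictionary `w` and all function names are assumptions made for the example. It computes the log-partition term with the forward algorithm and checks it against brute-force enumeration.

```python
import math
from itertools import product

TAGS = ["B", "I", "O"]

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def edge_score(w, x, j, prev_tag, tag):
    """w . f(x, [y^{j-1}, y^j]) with toy emission and transition features (assumed)."""
    s = w.get(("emit", x[j], tag), 0.0)
    if prev_tag is not None:
        s += w.get(("trans", prev_tag, tag), 0.0)
    return s

def path_score(w, x, y):
    """w . f(x, y): sum of edge scores along one tag sequence."""
    return sum(edge_score(w, x, j, y[j - 1] if j > 0 else None, y[j])
               for j in range(len(x)))

def log_partition(w, x):
    """log sum_y exp(w . f(x, y)), computed with the forward algorithm."""
    alpha = {t: edge_score(w, x, 0, None, t) for t in TAGS}
    for j in range(1, len(x)):
        alpha = {t: logsumexp([alpha[p] + edge_score(w, x, j, p, t) for p in TAGS])
                 for t in TAGS}
    return logsumexp(list(alpha.values()))

def nll(w, x, y):
    """One summand of the objective: -w.f(x_i, y_i) + log sum_y exp(w.f(x_i, y))."""
    return -path_score(w, x, y) + log_partition(w, x)

def brute_force_log_partition(w, x):
    """Same quantity by enumerating all 3^n tag sequences (for checking only)."""
    return logsumexp([path_score(w, x, list(y)) for y in product(TAGS, repeat=len(x))])

x = "Fruit flies like a banana".split()
y = ["B", "I", "O", "B", "I"]  # two NP chunks: "Fruit flies" and "a banana"
w = {("emit", "like", "O"): 2.0, ("trans", "B", "I"): 1.0, ("trans", "O", "I"): -3.0}
print(nll(w, x, y))
```

The forward pass costs time linear in the sentence length, while the enumeration grows as 3^n, which is why the lattice matters.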

[Figure: B/I/O lattice over "Fruit flies like a banana". Animation steps mark the two terms of the objective: L (labeled) for \mathbf{w}\cdot\mathbf{f}({x}_i,{y}_i) and U (unlabeled) for \log{\sum_{y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,{y})\big)}}.]

Semi-Markov CRF

\min_{\mathbf{w}} \sum_i \Big( -\mathbf{w}\cdot\mathbf{f}({x}_i,{y}_i) + \log{\sum_{y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,{y})\big)}} \Big)

Feature on adjacent segments: \mathbf{f}_1(x,[y^{0-1}=\mathbf{N},y^{2}=\mathbf{O}])

Sarawagi, S., and Cohen, W. W. (2005). Semi-Markov conditional random fields for information extraction. In NIPS.
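A minimal sketch of the semi-Markov log-partition term follows, under stated assumptions: segments are half-open word ranges with a label from {N, O}, the maximum segment length `MAX_LEN` and the toy feature templates ("seg", "trans") are invented for the example and are not taken from the cited paper. The dynamic program sums over all labeled segmentations instead of all tag sequences.

```python
import math

LABELS = ["N", "O"]
MAX_LEN = 3  # assumed cap on segment length

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def segment_score(w, x, start, end, label, prev_label):
    """w . f(x, [segment x[start:end] = label, previous segment label]); toy features."""
    s = w.get(("seg", " ".join(x[start:end]), label), 0.0)
    if prev_label is not None:
        s += w.get(("trans", prev_label, label), 0.0)
    return s

def log_partition(w, x):
    """log sum over all labeled segmentations of exp(score)."""
    n = len(x)
    # alpha[e][l]: log-sum over segmentations of x[:e] whose last segment has label l
    alpha = [dict() for _ in range(n + 1)]
    for end in range(1, n + 1):
        for label in LABELS:
            terms = []
            for start in range(max(0, end - MAX_LEN), end):
                if start == 0:
                    terms.append(segment_score(w, x, 0, end, label, None))
                else:
                    terms.extend(alpha[start][p] + segment_score(w, x, start, end, label, p)
                                 for p in LABELS)
            alpha[end][label] = logsumexp(terms)
    return logsumexp(list(alpha[n].values()))

def segmentation_score(w, x, segments):
    """w . f(x, y) for one segmentation given as (start, end, label) triples."""
    total, prev = 0.0, None
    for start, end, label in segments:
        total += segment_score(w, x, start, end, label, prev)
        prev = label
    return total

def all_segmentations(n, start=0):
    """Enumerate every labeled segmentation of n words (for checking only)."""
    if start == n:
        yield []
        return
    for end in range(start + 1, min(n, start + MAX_LEN) + 1):
        for label in LABELS:
            for rest in all_segmentations(n, end):
                yield [(start, end, label)] + rest

x = "Fruit flies like a banana".split()
gold = [(0, 2, "N"), (2, 3, "O"), (3, 5, "N")]
w = {("seg", "Fruit flies", "N"): 1.5, ("trans", "N", "O"): 0.5}
print(-segmentation_score(w, x, gold) + log_partition(w, x))
```

Because a feature sees a whole segment at once, it can use properties such as the segment's full text, which a per-word B/I/O feature cannot.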

[Figure: segment lattice over "Fruit flies like a banana". Animation steps mark the two terms of the objective: L (labeled) for \mathbf{w}\cdot\mathbf{f}({x}_i,{y}_i) and U (unlabeled) for \log{\sum_{y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,{y})\big)}}.]

Weak Semi-Markov CRF

\min_{\mathbf{w}} \sum_i \Big( -\mathbf{w}\cdot\mathbf{f}({x}_i,{y}_i) + \log{\sum_{y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,{y})\big)}} \Big)

Feature on a segment: \mathbf{f}_1(x,[y^{0-1}=\mathbf{N}])
Feature on adjacent words across a segment boundary: \mathbf{f}_2(x,[y^{1}=\mathbf{N},y^2=\mathbf{O}])

Muis, A. O., & Lu, W. (2016). Weak Semi-Markov CRFs for NP Chunking in Informal Text. In NAACL-HLT.

[Figure: weak semi-Markov lattice over "Fruit flies like a banana". Animation steps mark the two terms of the objective: L (labeled) for \mathbf{w}\cdot\mathbf{f}({x}_i,{y}_i) and U (unlabeled) for \log{\sum_{y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,{y})\big)}}.]

Latent-Variable CRF

[Figure: tag lattice over "Fruit flies like a banana" with the tags D, A, V, N and P available at each word. In later steps the tags are split into latent subtypes: N into N1 and N2, then A into A1 and A2 and V into V1 and V2; D and P stay unsplit.]

\max_{\mathbf{w}}\log p(y|x)
=\max_{\mathbf{w}}\log \sum_h p(y,h|x)
=\max_{\mathbf{w}}\log\Big(\sum_{h}\exp\big(\mathbf{w}\cdot\mathbf{f}({x},h,{y})\big)/\sum_{h',y'}\exp\big(\mathbf{w}\cdot\mathbf{f}({x},h',y')\big)\Big)

Equivalently, over the training set:

\min_{\mathbf{w}} \sum_i \Big( -\log\sum_h\exp\big(\mathbf{w}\cdot\mathbf{f}({x}_i,h,{y}_i)\big) + \log{\sum_{h',y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,h',{y})\big)}} \Big)

L (labeled) marks the first term, which sums over the latent assignments h consistent with {y}_i; U (unlabeled) marks the second term, which sums over all h' and y.
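The two log-sum-exp terms can be computed with the same forward pass run on two different lattices. The sketch below assumes that each observed tag is refined into the latent subtypes shown in the figure (N into N1/N2, and so on), that the features depend on the output y only through h, and that the feature templates and names are invented for the example.

```python
import math

# Assumed refinement of each observed tag into latent subtypes.
REFINEMENTS = {"D": ["D"], "P": ["P"], "A": ["A1", "A2"],
               "V": ["V1", "V2"], "N": ["N1", "N2"]}
ALL_LATENT = [h for subtypes in REFINEMENTS.values() for h in subtypes]

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def edge_score(w, x, j, prev_h, h):
    """w . f restricted to one lattice edge; toy emission and transition features."""
    s = w.get(("emit", x[j], h), 0.0)
    if prev_h is not None:
        s += w.get(("trans", prev_h, h), 0.0)
    return s

def lattice_log_sum(w, x, allowed):
    """log sum over latent sequences h with h_j drawn from allowed[j]."""
    alpha = {h: edge_score(w, x, 0, None, h) for h in allowed[0]}
    for j in range(1, len(x)):
        alpha = {h: logsumexp([alpha[p] + edge_score(w, x, j, p, h) for p in alpha])
                 for h in allowed[j]}
    return logsumexp(list(alpha.values()))

def latent_nll(w, x, y):
    # Labeled term: only latent tags that refine the gold tag at each position.
    labeled = lattice_log_sum(w, x, [REFINEMENTS[t] for t in y])
    # Unlabeled term: every latent tag at every position.
    unlabeled = lattice_log_sum(w, x, [ALL_LATENT] * len(x))
    return -labeled + unlabeled

x = "Fruit flies like a banana".split()
y = ["A", "N", "V", "D", "N"]
w = {("emit", "banana", "N1"): 1.0, ("trans", "A2", "N1"): 0.5}
print(latent_nll(w, x, y))
```

With all weights at zero the labeled lattice holds 2*2*2*1*2 = 16 paths and the unlabeled lattice 8^5, so the loss is log(8^5 / 16) = log 2048.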

Latent-Variable SSVM

Latent-variable CRF objective:

\min_{\mathbf{w}} \sum_i \Big( -\log\sum_h\exp\big(\mathbf{w}\cdot\mathbf{f}({x}_i,h,{y}_i)\big) + \log{\sum_{h',y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,h',{y})\big)}} \Big)

Latent-variable SSVM objective:

\min_{\mathbf{w}} \sum_i \Big( -\max_h\big(\mathbf{w}\cdot\mathbf{f}({x}_i,h,{y}_i)\big) + \max_{h',y}{\big(\Delta(y_i,y,h')+\mathbf{w}\cdot\mathbf{f}({x}_i,h',{y})\big)} \Big)

Yu, C. N. J., & Joachims, T. (2009). Learning structural SVMs with latent variables. In ICML.
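A minimal sketch of the latent SSVM loss for one example, under assumptions that are not from the cited paper: latent tags refine the observed tags as in the lattice figure, the output y is read off h', and \Delta is taken to be the Hamming distance between y_i and that output. Both \max terms are computed with Viterbi; the second one is loss-augmented.

```python
REFINEMENTS = {"D": ["D"], "P": ["P"], "A": ["A1", "A2"],
               "V": ["V1", "V2"], "N": ["N1", "N2"]}
ALL_LATENT = [h for subtypes in REFINEMENTS.values() for h in subtypes]
BASE = {h: tag for tag, subtypes in REFINEMENTS.items() for h in subtypes}

def edge_score(w, x, j, prev_h, h):
    """w . f restricted to one lattice edge; toy emission and transition features."""
    s = w.get(("emit", x[j], h), 0.0)
    if prev_h is not None:
        s += w.get(("trans", prev_h, h), 0.0)
    return s

def viterbi_max(w, x, allowed, cost=None):
    """max over latent sequences with h_j in allowed[j] of score plus per-position cost."""
    def c(j, h):
        return cost(j, h) if cost is not None else 0.0
    best = {h: edge_score(w, x, 0, None, h) + c(0, h) for h in allowed[0]}
    for j in range(1, len(x)):
        best = {h: max(best[p] + edge_score(w, x, j, p, h) for p in best) + c(j, h)
                for h in allowed[j]}
    return max(best.values())

def latent_ssvm_loss(w, x, y):
    # Labeled term: best latent refinement of the gold tags.
    labeled = viterbi_max(w, x, [REFINEMENTS[t] for t in y])
    # Unlabeled term: loss-augmented search over all latent sequences,
    # with Delta assumed to be the Hamming distance to the gold tags.
    def hamming(j, h):
        return 0.0 if BASE[h] == y[j] else 1.0
    unlabeled = viterbi_max(w, x, [ALL_LATENT] * len(x), cost=hamming)
    return -labeled + unlabeled

x = "Fruit flies like a banana".split()
y = ["A", "N", "V", "D", "N"]
# Weights that strongly prefer one gold-consistent latent tag per word.
w_good = {("emit", word, REFINEMENTS[tag][0]): 10.0 for word, tag in zip(x, y)}
print(latent_ssvm_loss({}, x, y), latent_ssvm_loss(w_good, x, y))
```

The loss is never negative, because the loss-augmented search also ranges over every latent sequence that is consistent with the gold tags.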

[Figure: latent tag lattice over "Fruit flies like a banana" (tags A1, A2, D, V1, V2, N1, N2 and P at each word) together with the output tag sequence A N V D N. L (labeled) marks the term \max_h\big(\mathbf{w}\cdot\mathbf{f}({x}_i,h,{y}_i)\big); U (unlabeled) marks the loss-augmented term \max_{h',y}{\big(\Delta(y_i,y,h')+\mathbf{w}\cdot\mathbf{f}({x}_i,h',{y})\big)}, with \Delta(y_i,y,h') highlighted.]

Learning

[Diagram: one update from \mathbf{w}^{(k)} to \mathbf{w}^{(k+1)}, computed from the labeled (L) and unlabeled (U) structures, optionally with a cost \Delta(y,y').]

With \max: Structured Perceptron, SSVM, Latent SSVM, ...

With \log\sum\exp: Linear/Semi/Latent/Softmax-margin CRF, ...
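The two families differ only in how the unlabeled side is aggregated when moving from \mathbf{w}^{(k)} to \mathbf{w}^{(k+1)}. The sketch below makes that concrete on an explicitly enumerated candidate set; the feature dictionaries, the step size `lr` and the function names are assumptions for illustration, and a real model would use dynamic programming instead of enumeration.

```python
import math

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def score(w, feats):
    """w . f for a sparse feature dictionary."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def augmented_scores(w, candidates, costs):
    """Scores on the unlabeled side, optionally with a cost Delta per candidate."""
    costs = costs if costs is not None else [0.0] * len(candidates)
    return [score(w, f) + c for f, c in zip(candidates, costs)]

def max_step(w, gold_feats, candidates, costs=None, lr=1.0):
    """Perceptron-style update (an SSVM subgradient step when costs are given)."""
    scores = augmented_scores(w, candidates, costs)
    pred = candidates[scores.index(max(scores))]
    new_w = dict(w)
    for k, v in gold_feats.items():
        new_w[k] = new_w.get(k, 0.0) + lr * v
    for k, v in pred.items():
        new_w[k] = new_w.get(k, 0.0) - lr * v
    return new_w

def logsumexp_step(w, gold_feats, candidates, costs=None, lr=1.0):
    """CRF gradient step (softmax-margin when costs are given)."""
    scores = augmented_scores(w, candidates, costs)
    log_z = logsumexp(scores)
    new_w = dict(w)
    for k, v in gold_feats.items():
        new_w[k] = new_w.get(k, 0.0) + lr * v
    for feats, s in zip(candidates, scores):
        p = math.exp(s - log_z)  # probability of this candidate
        for k, v in feats.items():
            new_w[k] = new_w.get(k, 0.0) - lr * p * v
    return new_w

def crf_nll(w, gold_feats, candidates):
    return -score(w, gold_feats) + logsumexp([score(w, f) for f in candidates])

candidates = [{"wrong": 1.0}, {"gold": 1.0}]
gold_feats = {"gold": 1.0}
print(max_step({}, gold_feats, candidates))
print(logsumexp_step({}, gold_feats, candidates))
```

The \max update moves weight away from a single competing structure, while the \log\sum\exp update spreads the correction over all candidates in proportion to their probability.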

So Far

We focused on predicting structures in the form of linear chains.

Next

Structured prediction problems beyond linear structures.

Parsing with CRF, Hybrid Tree, and Predicting Overlapping Structures

Constituency Parsing

[Figure: constituency tree for "Fruit flies like a banana" with the word tags A N V D N, two NP nodes, one VP node and the root S.]

Example rules:

VP -> V NP
NP -> D N
NP -> A N
NP -> V N

Parsing with CRF

\min_{\mathbf{w}} \sum_i \Big( -\mathbf{w}\cdot\mathbf{f}({x}_i,{y}_i) + \log{\sum_{y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,{y})\big)}} \Big)

Finkel, J. R., Kleeman, A., & Manning, C. D. (2008). Efficient, Feature-based, Conditional Random Field Parsing. In ACL.
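For trees, the sum over y runs over parse trees and can be computed with the inside algorithm. The sketch below is a toy version and not the parser from the cited paper: it uses the binary rules listed on the slides plus an assumed rule S -> NP VP, lets every word take any of the tags A, N, V, D, P, and scores a tree as the sum of one weight per rule and per tagged word.

```python
import math

POS = ["A", "N", "V", "D", "P"]
# Rules from the slides, plus S -> NP VP as an assumption.
BINARY_RULES = [("S", "NP", "VP"), ("VP", "V", "NP"),
                ("NP", "D", "N"), ("NP", "A", "N"), ("NP", "V", "N")]

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(w, x, root="S"):
    """log sum over all trees rooted in `root` of exp(w . f(x, tree))."""
    n = len(x)
    chart = {}  # (i, j, symbol) -> log-sum of scores of subtrees over x[i:j]
    for i, word in enumerate(x):
        for tag in POS:
            chart[(i, i + 1, tag)] = w.get(("lex", tag, word), 0.0)
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            terms = {}
            for parent, left, right in BINARY_RULES:
                for k in range(i + 1, j):  # split point
                    if (i, k, left) in chart and (k, j, right) in chart:
                        terms.setdefault(parent, []).append(
                            w.get(("rule", parent, left, right), 0.0)
                            + chart[(i, k, left)] + chart[(k, j, right)])
            for parent, values in terms.items():
                chart[(i, j, parent)] = logsumexp(values)
    return chart.get((0, n, root), float("-inf"))

def tree_score(w, tree_feats):
    """w . f(x, y) for a tree given as its list of rule and lexical features."""
    return sum(w.get(f, 0.0) for f in tree_feats)

x = "Fruit flies like a banana".split()
gold = [("rule", "S", "NP", "VP"), ("rule", "NP", "A", "N"),
        ("rule", "VP", "V", "NP"), ("rule", "NP", "D", "N"),
        ("lex", "A", "Fruit"), ("lex", "N", "flies"), ("lex", "V", "like"),
        ("lex", "D", "a"), ("lex", "N", "banana")]
w = {("rule", "NP", "A", "N"): 1.0}
print(-tree_score(w, gold) + log_partition(w, x))
```

Under this toy grammar the sentence has nine trees (three ways to build each of the two NPs), so the zero-weight log-partition is log 9.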

[Figure: parsing structures over "Fruit flies like a banana", shown in animation steps. The labeled structure (L), used for \mathbf{w}\cdot\mathbf{f}({x}_i,{y}_i), is a hyperpath; the unlabeled structure (U), used for \log{\sum_{y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,{y})\big)}}, is a hypergraph.]

Semantic Parsing

[Figure: semantic parsing illustration over "Fruit flies like a banana".]

\min_{\mathbf{w}} \sum_i \Big( -\log\sum_h\exp\big(\mathbf{w}\cdot\mathbf{f}({x}_i,h,{y}_i)\big) + \log{\sum_{h',y}\exp{\big(\mathbf{w}\cdot\mathbf{f}({x}_i,h',{y})\big)}} \Big)

Lu, W. (2014). Semantic Parsing with Relaxed Hybrid Trees. In EMNLP.

Hybrid Tree

[Figure: hybrid tree over "Fruit flies like a banana", built up over several animation steps.]

Hypergraphs

Q1. How is this formalism related to Factor Graphs or Graphical Models?

A. Our hypergraphs can conveniently capture context-specific independence (CSI).

Q2. Other than CSI, can we do things with hypergraphs that standard Graphical Models cannot do?

A. Yes. See the next example.

Predicting Overlapping Structures

Nested Chunking

[Figure: "Fruit flies like a banana" annotated with four NX chunks.]

What can we do with conventional Graphical Models?

Approach 1: Pipeline

[Figure: two rows of B/I/O tags over the sentence.]

Approach 2: Joint

[Figure: two rows of B/I/O tags over the sentence.]

Approach 3: Tree

[Figure: tree over the sentence with root S and internal nodes labeled NX or O.]

Overlapping Structures

[Figure: "Fruit flies like a banana" annotated with four NX chunks.]

What can we do with our hypergraphs?

Hyperpath

Separable Hyperpath

A hyperpath that visits each node in the hypergraph at most once.

Separable Hypergraphs

A hypergraph whose hyperpaths are all separable.

All the examples that we have seen so far are separable hypergraphs.

What about non-separable hypergraphs?

Non-separable Hyperpath

Lu, W., & Roth, D. (2015). Joint Mention Extraction and Classification with Mention Hypergraphs. In EMNLP.

[Figure: hypergraph over "Fruit flies like a banana", built up over several animation steps: first a non-separable hypergraph, then the labeled hypergraph (L), and finally the unlabeled hypergraph (U).]

Inference

MAP: \max over the structures in the hypergraph, given \mathbf{w}

Marginal: \log\sum\exp over the structures in the hypergraph, given \mathbf{w}
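Both kinds of inference can share one bottom-up pass that differs only in the aggregation function. The sketch below assumes an acyclic hypergraph stored as a dictionary from each node to its incoming hyperedges, where a hyperedge is a pair (score, list of child nodes) and the score stands for \mathbf{w}\cdot\mathbf{f} on that hyperedge; this representation and the names are illustrative assumptions.

```python
import math

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def inside(hypergraph, order, combine):
    """Bottom-up pass over an acyclic hypergraph.

    hypergraph: node -> list of (score, [child nodes]) hyperedges
    order:      nodes listed so that children come before their parents
    combine:    max for MAP inference, logsumexp for marginal inference
    """
    value = {}
    for node in order:
        value[node] = combine([s + sum(value[c] for c in children)
                               for s, children in hypergraph[node]])
    return value

# Toy hypergraph: the root can be built from both leaves or from "a" alone.
hypergraph = {
    "a": [(0.0, [])],
    "b": [(1.0, [])],
    "root": [(0.5, ["a", "b"]), (2.0, ["a"])],
}
order = ["a", "b", "root"]

map_score = inside(hypergraph, order, max)["root"]
log_z = inside(hypergraph, order, logsumexp)["root"]
print(map_score, log_z)
```

Swapping `max` for `logsumexp` is the only change between the two modes, which matches the \max versus \log\sum\exp split on the learning slides.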

Learning

[Diagram: one update from \mathbf{w}^{(k)} to \mathbf{w}^{(k+1)}, computed from the labeled (L) and unlabeled (U) structures, optionally with a cost \Delta(y,y').]

With \log\sum\exp: Linear/Semi/Latent/Parsing/Softmax-Margin CRF, Hybrid Tree, Mention Hypergraphs, ...

With \max: Structured Perceptron, SSVM, Latent SSVM, ...

So Far

Semi-Markov CRF, Latent CRF, Parsing with CRF, Hybrid Tree and Predicting Overlapping Structures.

Next...

Pipeline, Joint Models and Neural CRF