Conditional Random Field

A probabilistic graphical model that models local structure among the output variables, conditioned on features of the input variables.

Note - A probabilistic graphical model treats its inputs as random variables and expresses its output as the probability of an event happening, with a graph encoding the dependencies among those variables.

What CRF does - 

It models the conditional probability distribution P(Y|X) directly.

A CRF is a variant of Markov Networks.

Like a Markov network, it is described by an undirected induced graph.

Note - a Markov network is a set of random variables having the Markov property, described by an undirected graph.

In CRFs, we try to learn P(Y|X) directly from data containing (X, Y) pairs.

This is in contrast to the Bayesian model, in which we empirically estimate P(X|Y) and the prior P(Y), then invert them with Bayes' rule, P(Y|X) = P(X|Y) * P(Y) / P(X), at prediction time.

Why CRFs - 

  1. They do not get skewed by correlated features. We model P(Y | x1, x2, x3, ...), treating x1, x2, x3, ... as fixed observations, so correlations among them never have to be modelled.
  2. They do not assume strong independence among features, in contrast with the naive Bayes model.

CRF Representation

 Similar to the Gibbs distribution, but normalized differently -

                    Phi = {phi_1(D_1), phi_2(D_2), ..., phi_k(D_k)}

                    ~P(X, Y) = product_over{i=1..k}( phi_i(D_i) )

                    Z(X) = sum_over{Y}( ~P(X, Y) )

                    P(Y|X) = ~P(X, Y) / Z(X)

 where,

    P(Y|X) : the family of conditional distributions of Y, one for any given X

    ~P : the unnormalized measure

    Phi : the set of factors, each phi_i with scope D_i
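
To make this normalization concrete, here is a minimal sketch with toy lookup-table factors (the variables, factor values, and function names are illustrative assumptions, not from any library):

```python
import itertools

# Toy CRF: one observed variable X and two labels Y1, Y2, all binary.
# Each factor is a plain lookup table over its scope; values are arbitrary.
phi_x_y1 = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 2.0}   # scope D_1 = {X, Y1}
phi_y1_y2 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}  # scope D_2 = {Y1, Y2}

def p_tilde(x, y1, y2):
    """~P(X, Y): the product of all factors on a full assignment."""
    return phi_x_y1[(x, y1)] * phi_y1_y2[(y1, y2)]

def p_y_given_x(x):
    """P(Y|X): normalize ~P by summing over Y only, not over X.

    Z depends on x -- this is the 'normalized differently' part that
    distinguishes a CRF from an ordinary Gibbs distribution."""
    assignments = list(itertools.product([0, 1], repeat=2))
    z_x = sum(p_tilde(x, y1, y2) for y1, y2 in assignments)
    return {(y1, y2): p_tilde(x, y1, y2) / z_x for y1, y2 in assignments}

print(p_y_given_x(1))  # conditional distribution over (Y1, Y2) given X = 1
```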

Gibbs Distribution - 

 

     It has general factors: phi_i(D_i)

where,

     D_i is the scope of that factor.

It can represent any probability distribution.

Logistic Model - a very simple CRF

 

We assume that each X_i and Y take only the binary values 0 and 1.

                          phi_i(X_i, Y) = exp{ w_i * 1{X_i = 1, Y = 1} }

where 1{.} is the indicator function, so the factor fires only when both X_i and Y are 1. This gives:

    phi_i(X_i, Y = 1) = exp{ w_i * X_i }

    phi_i(X_i, Y = 0) = 1

    ~P(X, Y = 1) = exp{ sum_over{i}( w_i * X_i ) }

    ~P(X, Y = 0) = 1

Normalizing over Y recovers the sigmoid function:

    P(Y=1|X) = exp{ sum_over{i}( w_i * X_i ) } / ( exp{ sum_over{i}( w_i * X_i ) } + 1 )
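
A quick numerical check of this equivalence (the weights and feature values below are made up for illustration):

```python
import math

w = [0.8, -1.2, 0.5]  # illustrative weights w_i
x = [1, 0, 1]         # illustrative binary features X_i

# ~P(X, Y=1) = exp(sum_i w_i * x_i) and ~P(X, Y=0) = 1, so normalizing
# over Y in {0, 1} yields exactly the sigmoid of the linear score.
score = sum(wi * xi for wi, xi in zip(w, x))
p_y1 = math.exp(score) / (math.exp(score) + 1.0)

sigmoid = 1.0 / (1.0 + math.exp(-score))
assert abs(p_y1 - sigmoid) < 1e-12  # the CRF normalization is the sigmoid
print(p_y1)
```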

Adapting CRF for Image Segmentation

We assume,

a random field Y defined over a set of variables {Y1, Y2, ..., Yn}, where the domain of each variable is a set of labels L = {L1, L2, ..., Lk},

another random field X defined over the variables {X1, X2, ..., Xn}.

Y denotes the possible pixel labels and X denotes the color vectors of the pixels of the input image.

A conditional random field (X, Y) is characterized by a Gibbs distribution:

The factor Phi of a clique c is exp{ -phi_c(D_c) }. (Why exponentiate? It guarantees every factor is strictly positive, so the product is a valid Gibbs distribution, and it turns products of factors into sums of potentials in the exponent.)

~P(X, Y) = product of the factors of all cliques in Cg = exp{ -sum_over{c in Cg}( phi_c(D_c) ) }

P(Y|X) = (1/Z(X)) * exp{ -sum_over{c in Cg}( phi_c(D_c) ) }

 

where,

g = (V, E) is a graph on Y, and each clique c in the set of cliques Cg of g induces a potential phi_c

Z(X) = sum_over{Y} ( ~P(X, Y) )

The sum inside the exponent, E(Y|X) = sum_over{c in Cg}( phi_c(D_c) ), is known as the Gibbs energy. Since P(Y|X) is proportional to exp{ -E(Y|X) }, lower energy means higher probability; it is this Gibbs energy that we later modify for our use case.

In a fully connected pairwise CRF model, g is the complete graph on Y and Cg is the set of all unary and pairwise cliques.

The corresponding Gibbs energy is 

E(x) = sum_over{i}( psi_u(x_i) ) + sum_over{i<j}( psi_p(x_i, x_j) )

where psi_u(x_i) is the unary potential of pixel i and psi_p(x_i, x_j) is the pairwise potential of the pixel pair (i, j).
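
As a toy sketch of this energy (the array shapes, random values, and the brute-force search are illustrative assumptions; real implementations never enumerate labelings):

```python
import itertools
import numpy as np

# Toy fully connected pairwise CRF over 4 "pixels" with 3 labels.
n, k = 4, 3
rng = np.random.default_rng(0)
unary = rng.random((n, k))           # psi_u(x_i): one cost per pixel per label
pairwise = rng.random((n, n, k, k))  # psi_p(x_i, x_j): cost per pixel pair per label pair

def gibbs_energy(labels):
    """E(x) = sum_i psi_u(x_i) + sum_{i<j} psi_p(x_i, x_j).

    Lower energy means higher probability, since P(Y|X) is proportional
    to exp(-E(x))."""
    e = sum(unary[i, labels[i]] for i in range(n))
    e += sum(pairwise[i, j, labels[i], labels[j]]
             for i, j in itertools.combinations(range(n), 2))
    return e

# Brute-force MAP for this tiny example: the labeling minimizing E(x).
best = min(itertools.product(range(k), repeat=n), key=gibbs_energy)
print(best, gibbs_energy(best))
```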

It is up to us to select the unary and pairwise potential models; which models work best depends on the application.

An example implementation of the pairwise potential for multiclass segmentation comes from the paper:

https://arxiv.org/pdf/1210.5644.pdf

There, the pairwise potential model is

    Potts potential * ( w1 * (appearance kernel) + w2 * (smoothness kernel) )

The Potts potential, mu(x_i, x_j) = 1{x_i != x_j}, introduces a penalty for nearby similar pixels that are assigned different labels.

 

The appearance kernel is inspired by the observation that nearby pixels with similar color are likely to be in the same class.

 

The smoothness kernel penalizes small isolated regions.
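
A naive sketch of this pairwise potential for a single pixel pair (the kernel widths and weights below are placeholder values I chose for illustration; the real implementation evaluates the kernels over all pairs efficiently with a permutohedral lattice):

```python
import numpy as np

# Sketch of the pairwise potential from arXiv:1210.5644, computed naively
# for two pixels. Parameter values are assumptions, not from the paper.
theta_alpha, theta_beta, theta_gamma = 60.0, 20.0, 3.0  # kernel widths (assumed)
w1, w2 = 10.0, 3.0                                      # kernel weights (assumed)

def potts(label_i, label_j):
    """mu(x_i, x_j) = 1 if the labels differ, else 0 (no penalty for agreement)."""
    return 1.0 if label_i != label_j else 0.0

def pairwise_potential(p_i, p_j, rgb_i, rgb_j, label_i, label_j):
    """psi_p = mu(x_i, x_j) * (w1 * appearance kernel + w2 * smoothness kernel)."""
    d_pos = np.sum((p_i - p_j) ** 2)  # squared distance between pixel positions
    d_col = np.sum((rgb_i - rgb_j) ** 2)  # squared distance between colors
    appearance = np.exp(-d_pos / (2 * theta_alpha**2) - d_col / (2 * theta_beta**2))
    smoothness = np.exp(-d_pos / (2 * theta_gamma**2))
    return potts(label_i, label_j) * (w1 * appearance + w2 * smoothness)

# Nearby, similar-color pixels with different labels pay the largest penalty.
print(pairwise_potential(np.array([10., 10.]), np.array([11., 10.]),
                         np.array([200., 30., 30.]), np.array([198., 32., 31.]),
                         label_i=0, label_j=1))
```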

Things I still haven't understood: how does the Potts potential actually work in practice (e.g. the use of the permutohedral lattice)?

Both the bilateral (appearance) and Gaussian (smoothness) energies use the Potts potential in this implementation: https://github.com/mbickel/DenseInferenceWrapper/blob/master/denseinference/lib/libDenseCRF/densecrf.cpp

Thank You

By Vijay Krishnavanshi