A probabilistic graphical model that captures local structure and feature information from the input variables.
Note - a probabilistic graphical model treats its inputs as random variables and gives its output as the probability of an event.
What CRF does -
It models the conditional probability distribution P(Y|X).
A CRF is a variant of a Markov network.
Its induced graph is similar to that of a Markov network.
Note: a Markov network is a set of random variables having the Markov property, described by an undirected graph.
In CRFs, we try to estimate P(Y|X) directly from data containing (X, Y) pairs.
This is in contrast to the Bayesian (generative) approach, in which we empirically estimate P(X|Y) and then use it to obtain P(Y|X) only at prediction time.
Why CRFs -
Because P(Y|X) is modelled directly, we never need to model the distribution of the input X itself, which is hard when the features are rich and highly correlated (as pixel values are).
CRF Representation
Similar to the Gibbs distribution, but normalized differently -
Phi = { phi_1(D_1), phi_2(D_2), ......, phi_K(D_K) }
~P(X, Y) = product_over{i=1..K}( phi_i(D_i) )
Z(X) = sum_over{Y}( ~P(X, Y) )
P(Y|X) = ~P(X, Y) / Z(X)
where,
P(Y|X) : the family of conditional distributions of Y for any given X
~P : the unnormalised measure
Phi : the factors with their scopes D_i
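A tiny sketch of how these pieces fit together, using made-up binary variables and factor values (nothing here comes from a real model):

```python
import numpy as np

# Toy CRF: one binary output Y, two binary inputs X1, X2.
# Each factor phi_i depends on (X_i, Y); the values below are invented
# purely to illustrate the normalization, not learned from anything.
phi = [
    {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.5},  # phi_1(X1, Y)
    {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.5},  # phi_2(X2, Y)
]

def unnormalized(x, y):
    """~P(X, Y) = product over i of phi_i(X_i, Y)."""
    return np.prod([phi[i][(x[i], y)] for i in range(len(x))])

def p_y_given_x(x):
    """P(Y|X) = ~P(X, Y) / Z(X), with Z(X) = sum over Y of ~P(X, Y)."""
    scores = {y: unnormalized(x, y) for y in (0, 1)}
    z_x = sum(scores.values())          # Z(X) sums only over Y, not over X
    return {y: s / z_x for y, s in scores.items()}

print(p_y_given_x((1, 0)))   # e.g. {0: 0.2857..., 1: 0.7142...}
print(p_y_given_x((1, 1)))
```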
Gibbs Distribution -
It has general factors: phi_i(D_i)
where,
D_i is the scope of that factor.
It can represent any probability distribution.
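For contrast with the conditional normalization above, here is a toy joint Gibbs distribution where a single global Z sums over every assignment; the factor values are invented:

```python
import itertools

# Two binary variables A, B with one pairwise factor phi(A, B); made-up values.
phi = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}

# Global partition function: sum over every joint assignment, not over Y alone.
Z = sum(phi[a, b] for a, b in itertools.product((0, 1), repeat=2))
P = {ab: v / Z for ab, v in phi.items()}
print(P)   # {(0, 0): 0.375, (0, 1): 0.125, (1, 0): 0.125, (1, 1): 0.375}
```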
The logistic model is a very simple CRF.
We assume that the X_i and Y take only the binary values 0 and 1, with factors
phi_i(X_i, Y) = exp{ w_i * 1{X_i = 1, Y = 1} }
where 1{.} is the indicator function, so that
phi_i(X_i, Y = 1) = exp{ w_i * X_i }
phi_i(X_i, Y = 0) = 1
~P(X, Y = 1) = exp{ sum_over{i}( w_i * X_i ) }
~P(X, Y = 0) = 1
P(Y = 1|X) = exp{ sum_over{i}( w_i * X_i ) } / ( exp{ sum_over{i}( w_i * X_i ) } + 1 )
which is exactly the sigmoid (logistic) function of sum_over{i}( w_i * X_i ).
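A quick numerical check (with weights I made up) that normalizing these factors gives back the usual sigmoid:

```python
import numpy as np

# Made-up weights and a binary feature vector, purely for illustration.
w = np.array([0.8, -1.2, 2.0])
x = np.array([1, 0, 1])

# Factor view: ~P(X, Y=1) = exp(sum_i w_i * X_i), ~P(X, Y=0) = 1
p_tilde_y1 = np.exp(np.dot(w, x))
p_tilde_y0 = 1.0
p_crf = p_tilde_y1 / (p_tilde_y1 + p_tilde_y0)

# Direct sigmoid of the linear score
p_sigmoid = 1.0 / (1.0 + np.exp(-np.dot(w, x)))

print(p_crf, p_sigmoid)   # identical up to floating point
```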
Adapting CRF for Image Segmentation
We assume,
a random field Y defined over a set of variables {Y_1, Y_2, ....., Y_n}, where the domain of each variable is a set of labels L = {L_1, L_2, ....., L_k},
another random field X defined over the variables {X_1, X_2, ....., X_n}.
Y denotes the possible pixel labels and X denotes the color vectors of the pixels of the input image.
A conditional random field (X, Y) is characterized by a Gibbs distribution:
the factor of a clique c is Phi_c(D_c) = exp{ -phi_c(D_c) }
(we exponentiate the negated potential so that every factor is strictly positive and so that products of factors turn into sums of potentials)
~P(X, Y) = product of the clique factors over Cg = exp{ -sum_over{c belonging to Cg}( phi_c(D_c) ) }
P(Y|X) = (1/Z(X)) * exp{ -sum_over{c belonging to Cg}( phi_c(D_c) ) }
where,
g = (V, E) is a graph on Y, and each clique c in the set of cliques Cg of g induces a potential phi_c
Z(X) = sum_over{Y}( ~P(X, Y) )
The sum of the clique potentials, sum_over{c belonging to Cg}( phi_c(D_c) ), is what is known as the Gibbs energy; the term inside the exponential is its negation. It is this Gibbs energy that we later modify for our use case.
In a fully connected pairwise CRF model, g is the complete graph on Y and Cg is the set of all unary and pairwise cliques.
The corresponding Gibbs energy is
E(x) = sum_over{i}( psi_u(x_i) ) + sum_over{i < j}( psi_p(x_i, x_j) )
It is up to us to select the unary potential model and the pairwise potential model; which models work best depends on the application.
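A sketch of evaluating this energy for one labelling; the unary costs and the constant Potts pairwise cost below are placeholders I invented, since the choice of potentials is left open:

```python
import itertools
import numpy as np

def gibbs_energy(labels, unary, pairwise):
    """E(x) = sum_i psi_u(x_i) + sum_{i<j} psi_p(x_i, x_j).

    labels   : length-n list of label indices
    unary    : (n, k) array, unary[i, l] = cost of giving pixel i label l
    pairwise : function (i, j, l_i, l_j) -> pairwise cost
    """
    n = len(labels)
    e = sum(unary[i, labels[i]] for i in range(n))
    e += sum(pairwise(i, j, labels[i], labels[j])
             for i, j in itertools.combinations(range(n), 2))
    return e

# Tiny example: 4 "pixels", 2 labels, placeholder costs.
unary = np.array([[0.2, 1.0],
                  [0.1, 0.9],
                  [0.8, 0.3],
                  [0.7, 0.4]])

def potts_pairwise(i, j, li, lj, weight=0.5):
    # Simplest pairwise choice: a constant Potts penalty for disagreeing labels.
    return weight if li != lj else 0.0

print(gibbs_energy([0, 0, 1, 1], unary, potts_pairwise))   # 3.0
```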
An example implementation of the pairwise potential function for multiclass segmentation follows the form used in this paper:
https://arxiv.org/pdf/1210.5644.pdf
The pairwise potential model is
Potts potential * ( w1 * (appearance kernel) + w2 * (smoothness kernel) )
The Potts model introduces a penalty when nearby similar pixels are assigned different labels.
The appearance kernel is inspired by the observation that nearby pixels with similar colors are likely to belong to the same class.
The smoothness kernel penalizes small isolated regions.
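A sketch of that pairwise model as described in the linked paper (Gaussian appearance and smoothness kernels gated by a Potts compatibility); the weights w1, w2 and the theta bandwidths below are placeholder values I chose, not the paper's tuned parameters:

```python
import numpy as np

def pairwise_potential(p_i, p_j, c_i, c_j, x_i, x_j,
                       w1=5.0, w2=3.0,
                       theta_alpha=60.0, theta_beta=10.0, theta_gamma=3.0):
    """Potts(x_i, x_j) * (w1 * appearance kernel + w2 * smoothness kernel).

    p_* : pixel positions, c_* : color vectors, x_* : labels.
    The weights and theta bandwidths are placeholders, not tuned values.
    """
    # Potts compatibility: penalize only when the two labels disagree.
    if x_i == x_j:
        return 0.0
    d_pos = np.sum((p_i - p_j) ** 2)
    d_col = np.sum((c_i - c_j) ** 2)
    # Appearance kernel: nearby pixels with similar color -> large penalty for disagreeing.
    appearance = np.exp(-d_pos / (2 * theta_alpha ** 2)
                        - d_col / (2 * theta_beta ** 2))
    # Smoothness kernel: depends on position only, discourages small isolated regions.
    smoothness = np.exp(-d_pos / (2 * theta_gamma ** 2))
    return w1 * appearance + w2 * smoothness

# Nearby pixels with similar color but different labels get a large penalty.
print(pairwise_potential(np.array([10., 10.]), np.array([11., 10.]),
                         np.array([120., 80., 40.]), np.array([122., 79., 41.]),
                         x_i=0, x_j=1))
```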
Things I still haven't understood: how does the Potts potential work in practice (for example, the use of the permutohedral lattice in its efficient computation)?
Both the bilateral and the Gaussian energies use the Potts potential in this implementation: https://github.com/mbickel/DenseInferenceWrapper/blob/master/denseinference/lib/libDenseCRF/densecrf.cpp
Thank You