An Integrated Implementation of
Probabilistic Graphical Models
2017
Renato Cordeiro Ferreira
Advisor: Alan Mitchell Durham
IME-USP
Probabilistic
Graphical Models
for sequence labeling
Definitions
Model:
A simplified declarative representation that encodes the relevant elements of an experiment or real situation
Probabilistic Model:
A model that is mathematically described using random variables, i.e. functions that map events \( s \) of a given sample space \( \Omega \) to a discrete or continuous space
Example
Model:
\(N\) is the number of faces in the die (ignores color, material, etc.)
Algorithm:
Generate numbers in the interval \( [1, N] \) (simulate rolling the die)
Probabilistic Model:
\( \Omega = \{1, \ldots, N\} \), \( P(s) = 1/N \)
\( \mathrm{Even}: \Omega \longrightarrow \{0,1\} \) (outcome is even)
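A minimal C++ sketch of this algorithm, using the standard <random> facilities (the variable names are illustrative):

```cpp
#include <iostream>
#include <random>

int main() {
  const int N = 6;  // number of faces in the die

  // Probabilistic model: Omega = {1, ..., N}, P(s) = 1/N
  std::mt19937 rng(std::random_device{}());
  std::uniform_int_distribution<int> die(1, N);

  int s = die(rng);          // simulate rolling the die
  bool even = (s % 2 == 0);  // random variable Even: Omega -> {0, 1}

  std::cout << "outcome: " << s << ", even: " << even << '\n';
}
```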
Definitions
Multidimensional Probabilistic Model:
Probabilistic model that describes a complex problem using a set of random variables \( \mathcal{X} = \{ Y_1, \ldots, Y_m, X_1, \ldots, X_n \} \)
Joint distribution:
Probability distribution \( P(\mathcal{X}) = P(Y_1, \ldots, Y_m, X_1, \ldots, X_n) \) which can be queried to reason over the model:
- MAP assignment: \( \mathrm{MAP}(\mathbf{Y} \mid \mathbf{X} = \mathbf{x}) = \mathrm{arg\,max}_{\mathbf{y}} P(\mathbf{y}, \mathbf{x}) \)
- Posterior probability distribution: \( P(\mathbf{Y} \mid \mathbf{X} = \mathbf{x}) \)
Example
Sickness diagnosis problem:
Set of patients \( P \)
- Sick? \( S: P \rightarrow \{0, 1\} \)
- Fever? \( T: P \rightarrow \{0, 1\} \)
- Hypertension? \( B: P \rightarrow \{0, 1\} \)
Queries:
Given that a patient has fever and hypertension:
- Is the patient sick or not? \( \mathrm{MAP}(S \mid B = 1, T = 1) \)
- How likely is it that the patient is sick? \( P(S = 1 \mid B = 1, T = 1) \)
(In the model, \( S \) is the output variable; \( T \) and \( B \) are the input variables.)
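Spelling out how these queries reduce to the joint distribution (a worked restatement of the definitions above, with \( S \) as output and \( T, B \) as inputs):

\[
P(S = 1 \mid T = 1, B = 1)
  = \frac{P(S = 1, T = 1, B = 1)}
         {\sum_{s \in \{0,1\}} P(S = s, T = 1, B = 1)}
\qquad
\mathrm{MAP}(S \mid T = 1, B = 1)
  = \operatorname*{arg\,max}_{s \in \{0,1\}} P(S = s, T = 1, B = 1)
\]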
Definitions
Probabilistic Graphical Model (PGM):
A probabilistic model that uses a graph to compactly describe the dependencies between random variables and show a factorization of their joint distribution
Bayesian Network (BN):
A PGM whose graph is a directed acyclic graph (DAG)
Markov Network (MN):
A PGM whose graph has only undirected edges
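In symbols (standard factorization results, stated here to make "factorization" precise; \( \mathrm{Pa}(X_i) \) denotes the parents of node \( X_i \) and \( Z \) is the normalization constant):

\[
\text{BN:} \quad P(\mathcal{X}) = \prod_{i} P(X_i \mid \mathrm{Pa}(X_i))
\qquad
\text{MN:} \quad P(\mathcal{X}) = \frac{1}{Z} \prod_{i} \Phi(C_i),
\quad Z = \sum_{\mathcal{X}} \prod_{i} \Phi(C_i)
\]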
Example
Bayesian Network:
A BN for the sickness diagnosis example
(arrows indicate which variables directly influence others)
Generative
It expresses a factorization in terms of
Conditional Probability Distributions (CPDs):
\( P(S, T, B) = P(S) P(T | S) P(B | S) \)
Example
Markov Network:
A MN for the sickness diagnosis example
(edges indicate a direct dependency between variables)
This expresses a factorization in terms of Factor Functions \( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \), where \( C_i \) is a clique of the graph: \( P(S, T, B) \propto \Phi(S)\, \Phi(T, S)\, \Phi(B, S)\, \Phi(T)\, \Phi(B) \)
Generative
Example
Conditional Random Field:
An MN that ignores the cliques involving only input variables (to avoid their complexity)
This expresses a factorization in terms of Factor Functions \( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \), where \( C_i \) is a clique of the graph: \( P(S \mid T, B) \propto \Phi(S)\, \Phi(T, S)\, \Phi(B, S) \)
Discriminative
Definitions
Generative model:
A probabilistic model that factorizes the joint distribution \( P(\mathbf{Y}, \mathbf{X}) \) over the output variables \( \mathbf{Y} \) and the input variables \( \mathbf{X} \)
Discriminative model:
A probabilistic model that factorizes the conditional distribution \( P(\mathbf{Y} \mid \mathbf{X}) \) of the output variables \( \mathbf{Y} \) given the input variables \( \mathbf{X} \)
Generative and discriminative equivalents:
Every generative PGM has a discriminative equivalent that is structurally identical but represents a different distribution
Generative:
\( P(S, T, B) = P(S)\, P(T \mid S)\, P(B \mid S) \)
Discriminative:
\( P(S \mid T, B) \propto \Phi(S)\, \Phi(T, S)\, \Phi(B, S) \)
Generative vs Discriminative

Generative | Discriminative |
---|---|
Models \(P(\mathbf{X})\), the dependencies between input variables | Ignores \(P(\mathbf{X})\), the dependencies between input variables |
Can generate sequences to simulate the process described | Requires less knowledge about the domain described |
Increases overfitting with less data | Generalizes better with more data |
Definitions
Structured prediction:
The outcome of the queries over the model represents the structure of a complex object (such as a text or an image), as opposed to a single value
Sequence labeling:
Classify the elements of a sequence into a set of categories
Sequence alignment:
Find a consensus between annotations of a sequence
Example
Dishonest Casino with a game of dice:
- Fair: \( P(s) = 1/6 \)
- Loaded: \( P(1) = 1/2 \), \( P(s) = 1/10 \) for \( s \neq 1 \)
One player asks a judge for a refund
Sequence labeling:
Given a sequence of outcomes observed by the player during the game:
6 1 4 4 1 2 3 4 6 6 6 2 2 4 6
- When was each die used? (MAP assignment)
- What is the most likely die used on each turn of the game? (Posterior probability)
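The MAP query over an HMM is classically answered with the Viterbi dynamic programming algorithm. Below is a minimal, self-contained C++ sketch for the two-state casino, using the parameters from the ToPS specification shown later in this presentation (an illustration, not ToPS code; a real implementation would work in log-space to avoid underflow):

```cpp
#include <array>
#include <iostream>
#include <vector>

// Viterbi decoding for the two-state dishonest casino HMM.
// States: 0 = Fair, 1 = Loaded.
int main() {
  const std::array<double, 2> init = {0.5, 0.5};
  const double trans[2][2] = {{0.9, 0.1},   // Fair   -> {Fair, Loaded}
                              {0.1, 0.9}};  // Loaded -> {Fair, Loaded}
  auto emit = [](int state, int symbol) {
    if (state == 0) return 1.0 / 6.0;  // Fair: uniform
    return symbol == 1 ? 0.5 : 0.1;    // Loaded: P(1) = 1/2
  };

  const std::vector<int> x = {6,1,4,4,1,2,3,4,6,6,6,2,2,4,6};
  const std::size_t T = x.size();

  // gamma[t][s]: probability of the best path ending in state s at time t.
  std::vector<std::array<double, 2>> gamma(T);
  std::vector<std::array<int, 2>> back(T);

  for (int s = 0; s < 2; ++s)
    gamma[0][s] = init[s] * emit(s, x[0]);

  for (std::size_t t = 1; t < T; ++t)
    for (int s = 0; s < 2; ++s) {
      int best = 0;
      double bestp = gamma[t-1][0] * trans[0][s];
      double p1 = gamma[t-1][1] * trans[1][s];
      if (p1 > bestp) { bestp = p1; best = 1; }
      gamma[t][s] = bestp * emit(s, x[t]);
      back[t][s] = best;
    }

  // Backtrack the MAP assignment.
  std::vector<int> y(T);
  y[T-1] = gamma[T-1][1] > gamma[T-1][0] ? 1 : 0;
  for (std::size_t t = T - 1; t > 0; --t)
    y[t-1] = back[t][y[t]];

  for (std::size_t t = 0; t < T; ++t)
    std::cout << (y[t] == 0 ? "F" : "L");  // Fair / Loaded per turn
  std::cout << '\n';
}
```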
Example
Dishonest Casino with a game of dice:
- Fair: \( P(s) = 1/6 \)
- Loaded: \( P(1) = 1/2 \), \( P(s) = 1/10 \) for \( s \neq 1 \)
Two players ask a judge for a refund
Sequence alignment:
Given two different sequences of outcomes observed by the players:
6 1 4 4 1 2 3 4 6 6 6 2 2 4 6
6 1 2 2 4 1 4 3 4 6 6 6 1 2 4 6
- Which outputs observed by the players correspond to the same turns of the game?
Definitions

PGMs for sequence labeling and alignment:
Subcategories of Bayesian and Markov Networks:

Model | Acronym |
---|---|
Hidden Markov Model | HMM |
Hidden Semi-Markov Model | HSMM |
Pair Hidden Markov Model | PHMM |
Pair Hidden Semi-Markov Model | PHSMM |
Stochastic Context-Free Grammar | SCFG |
Context-Sensitive Hidden Markov Model | CsHMM |
Linear Chain Conditional Random Field | LCCRF |
Semi-Markov Conditional Random Field | Semi-CRF |
[Diagram: Hierarchy of Mathematical Assumptions — each child model adds an assumption to its parent]

- Probabilistic Graphical Model
  - Bayesian Network (+ DAG) — generative branch
    - Naive Bayes (+ 1 out. var.)
    - Dynamic Bayesian Network (+ show process through time)
      - Hidden Markov Model (+ discrete vars.) — sequence labeling
        - Hidden Semi-Markov Model (+ decouple duration)
        - Pair Hidden Markov Model (+ multiple sequences) — sequence alignment
          - Pair Hidden Semi-Markov Model (+ decouple duration)
        - Context-Sensitive Hidden Markov Model (+ memory between states, non-local dependencies)
        - Stochastic Context-Free Grammar (+ multiple subsequences, non-local dependencies)
  - Markov Network (+ undirected edges) — discriminative branch
    - Logistic Regression (+ 1 out. var.)
    - Conditional Random Field (+ ignore deps. between in. vars.)
      - Linear-Chain CRF (+ linear out. vars.)
        - Semi-Markov CRF (+ decouple duration)
ToPS
framework
Probabilistic Models
Model | Acronym |
---|---|
Discrete and Independent Distribution | IID |
Inhomogeneous Markov Chain | IMC |
Maximal Dependence Decomposition | MDD |
Multiple Sequential Model | MSM |
Periodic Inhomogeneous Markov Chain | PIMC |
Similarity Based Sequence Weighting | SBSW |
Variable Length Markov Chain | VLMC |
Hidden Markov Model | HMM |
Hidden Semi-Markov Model | HSMM |
(In the table above, IID through VLMC are distributions, while HMM and HSMM are PGMs.)

Extensions

Model | Acronym |
---|---|
Pair Hidden Markov Model | PHMM |
Pair Hidden Semi-Markov Model | PHSMM |
Context-Sensitive Hidden Markov Model | CsHMM |
Linear Chain Conditional Random Field | LCCRF |
Semi-Markov Conditional Random Field | Semi-CRF |
Each model had minor differences that affected the framework implementation, resulting in technical debt.
This encouraged a refactoring of the framework
(Vitor Onuchic, Rafael Mathias, Ígor Bonadio)
ToPS' Components

[Diagram: characteristics of ToPS' components — Lang, Model, Exception, and App]
tops::model

[Diagram: ToPS' published hierarchy of probabilistic models — a multifaceted abstraction]
Features

- Evaluate: calculate the probability \( P(\mathbf{x}) \) of a sequence given a model
- Generate: draw random sequences from a model
- Train: estimate parameters of the model from a dataset
- Serialize: save parameters of a model for later reuse
- Calculate: find the posterior probabilities of an input sequence (PGMs only)
- Label: find the MAP assignment for an input sequence (PGMs only)
New Architecture

Architecture: SECRETARY pattern¹
Probabilistic Models (built as a COMPOSITE) play the role of the boss; each feature is implemented by a secretary:

- Trainer: estimates parameters of a model from a dataset
- Evaluator: calculates the probability \( P(\mathbf{x}) \) of a sequence given a model
- Generator: draws random sequences from the model
- Serializer: saves parameters of the model for later reuse
- Calculator: finds the posterior probabilities of an input sequence
- Labeler: finds the MAP assignment for an input sequence
[1] R. C. Ferreira, Í. Bonadio, and A. M. Durham, “Secretary pattern: decreasing coupling while keeping reusability”,
Proceedings of the 11th Latin-American Conference on Pattern Languages of Programming. The Hillside Group, p. 14, 2016.
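A minimal sketch of the pattern's core idea, under simplified, assumed signatures (Sequence, probabilityOf, and UniformModel are illustrative names, not the real ToPS API): the model acts as the boss and exposes a FACTORY METHOD that creates a secretary, which holds the per-query state and delegates the work back to the boss.

```cpp
#include <iostream>
#include <memory>
#include <vector>

using Sequence = std::vector<unsigned int>;

class ProbabilisticModel;  // the "boss"

// Secretary: holds the per-query state and delegates the actual
// work back to the boss, keeping clients decoupled from the model.
class Evaluator {
 public:
  Evaluator(std::shared_ptr<const ProbabilisticModel> model, Sequence x)
      : model_(std::move(model)), x_(std::move(x)) {}
  double evaluate() const;  // delegates to the boss
 private:
  std::shared_ptr<const ProbabilisticModel> model_;
  Sequence x_;
};

class ProbabilisticModel
    : public std::enable_shared_from_this<ProbabilisticModel> {
 public:
  virtual ~ProbabilisticModel() = default;
  // FACTORY METHOD: the boss hires its secretary.
  std::unique_ptr<Evaluator> evaluator(Sequence x) const {
    return std::make_unique<Evaluator>(shared_from_this(), std::move(x));
  }
  virtual double probabilityOf(const Sequence& x) const = 0;
};

double Evaluator::evaluate() const { return model_->probabilityOf(x_); }

// A toy model to exercise the pattern.
class UniformModel : public ProbabilisticModel {
 public:
  explicit UniformModel(unsigned alphabet) : alphabet_(alphabet) {}
  double probabilityOf(const Sequence& x) const override {
    double p = 1.0;
    for (std::size_t i = 0; i < x.size(); ++i) p /= alphabet_;
    return p;
  }
 private:
  unsigned alphabet_;
};

int main() {
  auto model = std::make_shared<UniformModel>(6);
  auto evaluator = model->evaluator({0, 1, 2});  // secretary for one query
  std::cout << evaluator->evaluate() << '\n';    // (1/6)^3
}
```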
[Diagram: ToPS' published hierarchy of probabilistic models, highlighting classes with an unfactored interface, an unfactored implementation, or both]
Model Hierarchy: Interfaces
tops::model::ProbabilisticModel
Root of the hierarchy; implements four secretaries: Trainer, Evaluator, Generator, and Serializer
tops::model::DecodableModel
Node of the hierarchy; descends directly from ProbabilisticModel and implements all the parent's secretaries plus Calculator and Labeler
tops::model::ProbabilisticModelDecorator
Node of the hierarchy; descends directly from ProbabilisticModel and adds functionality around the implementation of the parent's secretaries
Architecture: CRTP
The curiously recurring template pattern (CRTP) is an idiom in C++ in which a class X derives from a class template instantiation using itself as template argument. [...] Some use cases for this pattern are static polymorphism and other metaprogramming techniques [...].
— Wikipedia, Curiously Recurring Template Pattern
tops::model::ProbabilisticModelCRTP
tops::model::DecodableModelCRTP
tops::model::ProbabilisticModelDecoratorCRTP
Implement FACTORY METHODs for the secretaries, define the virtual methods that the secretaries delegate to, and host code reused between subclasses
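To make this concrete, here is a minimal, self-contained sketch of how a CRTP base can host shared code and a static FACTORY METHOD (names and signatures are my own simplifications; the real tops::model classes are more elaborate):

```cpp
#include <iostream>
#include <memory>
#include <vector>

using Sequence = std::vector<unsigned int>;

// Abstract interface seen by clients.
class ProbabilisticModel {
 public:
  virtual ~ProbabilisticModel() = default;
  virtual double probabilityOf(const Sequence& x) const = 0;
};

// CRTP layer: hosts code reused between subclasses and implements
// factory methods knowing the concrete Derived type at compile time.
template <typename Derived>
class ProbabilisticModelCRTP : public ProbabilisticModel {
 public:
  template <typename... Args>
  static std::shared_ptr<Derived> make(Args&&... args) {
    return std::make_shared<Derived>(std::forward<Args>(args)...);
  }
};

// A concrete model plugs itself in as the template argument.
class IIDModel : public ProbabilisticModelCRTP<IIDModel> {
 public:
  explicit IIDModel(std::vector<double> probs) : probs_(std::move(probs)) {}

  double probabilityOf(const Sequence& x) const override {
    double p = 1.0;
    for (auto symbol : x) p *= probs_.at(symbol);
    return p;
  }

 private:
  std::vector<double> probs_;
};

int main() {
  auto model = IIDModel::make(std::vector<double>{0.5, 0.5});
  std::cout << model->probabilityOf({0, 1, 1}) << '\n';  // 0.125
}
```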
[Diagram: ToPS' refactored hierarchy of probabilistic models — distributions, decodable models, and decorators each follow the Interface + CRTP structure]
tops::lang
Specification language
Definition
Represents trained parameters for a given model
Training
Represents training parameters for a given model
To allow better experimentation, ToPS implemented a specification language to define and train probabilistic models
HMM for the Dishonest Casino
# Dishonest Casino definition
model_name = "HiddenMarkovModel"
observation_symbols = ("1", "2", "3", "4", "5", "6")
state_names = ("Fair", "Loaded" )
initial_probabilities = ("Fair": 0.5; "Loaded": 0.5)
transitions = (
"Loaded" | "Fair" : 0.1;
"Fair" | "Fair" : 0.9;
"Fair" | "Loaded" : 0.1;
"Loaded" | "Loaded" : 0.9;
)
emission_probabilities = (
"1" | "Fair" : 0.166666666666;
"2" | "Fair" : 0.166666666666;
"3" | "Fair" : 0.166666666666;
"4" | "Fair" : 0.166666666666;
"5" | "Fair" : 0.166666666666;
"6" | "Fair" : 0.166666666666;
"1" | "Loaded" : 0.5;
"2" | "Loaded" : 0.1;
"3" | "Loaded" : 0.1;
"4" | "Loaded" : 0.1;
"5" | "Loaded" : 0.1;
"6" | "Loaded" : 0.1;
)
How to represent CRFs?
Factor functions can be implemented as any procedure in a full-featured programming language
Factor functions:
\( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \)
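For an LCCRF specifically, the factor functions are conventionally combined as an exponential of weighted feature functions — the standard parameterization, where the weights \( \lambda_k \) correspond to the feature_parameters and the \( f_k(\mathbf{x}, y_{i-1}, y_i, i) \) to the feature functions (with signature fun(x, yp, yc, i)) in the specifications below:

\[
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(\mathbf{x}, y_{i-1}, y_i, i) \right),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(\mathbf{x}, y'_{i-1}, y'_i, i) \right)
\]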
LCCRF for the Dishonest Casino
// -*- mode: c++ -*-
// vim: ft=chaiscript:
// Dishonest Casino definition
model_type = "LCCRF"
observations = [ "1", "2", "3", "4", "5", "6" ]
labels = [ "Fair", "Loaded" ]
feature_function_libraries = [
lib("features.tops")
]
feature_parameters = [
"label_loaded": 1.0,
"label_fair": 2.0
]
LCCRF for the Dishonest Casino
// Feature function library
observations = [ "1", "2", "3", "4", "5", "6" ]
labels = [ "Loaded", "Fair" ]
// Feature functions built directly
feature("label_loaded", fun(x, yp, yc, i) {
if (yc == label("Loaded")) {
return 1;
} else {
return 0;
}
})
feature("label_fair", fun(x, yp, yc, i) {
if (yc == label("Fair")) {
return 1;
} else {
return 0;
}
})
// Feature function prototypes
use("prototypes.tops")
// Feature functions built from factory
feature("Fair -> Fair" , transition("Fair" , "Fair" ))
feature("Fair -> Loaded" , transition("Fair" , "Loaded"))
feature("Loaded -> Fair" , transition("Loaded" , "Fair" ))
feature("Loaded -> Loaded", transition("Loaded" , "Loaded"))
Integrated
implementation
proposal
Is it possible? I believe so!
 | HMM | PHMM | HSMM | CsHMM | LCCRF | Semi-CRF |
---|---|---|---|---|---|---|
Emissions | IID | IID | Any Distrib. | IID | Any Func. | Any Func. |
Durations | Geometric | Geometric | Any Distrib. | IID | Geometric | Any Distrib. |
Transitions | IID | IID | IID | IID | IID | IID |
Number of sequences | 1 | Many | 1 | 1 | 1 | 1 |
Recognition power | Regular | Regular | Regular | Context-Sensitive | Regular | Regular |
Training | EM / ML | EM / ML | ML | EM / ML | GD / L-BFGS | GD / L-BFGS |
There is a common set of characteristics that describes all PGMs already implemented in ToPS
The integrated implementation requires a single abstraction that supports the most generic value of each characteristic across all models
Algorithms for querying have to be generalized
Algorithms for training limit the capabilities of the abstraction
Integration: Challenges
Model | Main difference from the HMM implementation | Main challenge for the integration |
---|---|---|
HSMMs | Non-geometric duration of states | Allow different extensions according to the training |
PHMMs | Multiple sequences | Accept multiple sequences in the secretaries |
CsHMMs | Context-sensitive recognition power | Allow less powerful grammars with the same algorithms |
CRFs | Non-probabilistic parameterization | Support any real-valued function in states |

Taking the HMM implementation as a base, each model imposes a different challenge for the integration
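One way to picture the integration suggested by the table and the challenges above — a hypothetical sketch under my own naming, not the actual ToPS design: make each state of the integrated PGM a composition of pluggable emission, duration, and transition components, so that each concrete model becomes a configuration of the same abstraction.

```cpp
#include <memory>
#include <vector>

using Symbol = unsigned int;
using Sequence = std::vector<Symbol>;

// Hypothetical building blocks for an integrated PGM state.
// Each interface generalizes one row of the comparison table.
struct EmissionFunction {   // IID, any distribution, or any real-valued function
  virtual ~EmissionFunction() = default;
  virtual double score(const std::vector<Sequence>& xs, std::size_t pos,
                       std::size_t len) const = 0;
};

struct DurationFunction {   // geometric for (P)HMMs, arbitrary for semi-Markov models
  virtual ~DurationFunction() = default;
  virtual double score(std::size_t len) const = 0;
};

struct TransitionFunction { // IID in every model implemented so far
  virtual ~TransitionFunction() = default;
  virtual double score(std::size_t next_state) const = 0;
};

// A state of the integrated model is the composition of the three.
struct State {
  std::unique_ptr<EmissionFunction> emission;
  std::unique_ptr<DurationFunction> duration;
  std::unique_ptr<TransitionFunction> transition;
};

// An HMM would use IID emissions with geometric durations over one
// sequence; a PHMM the same over many sequences; a Semi-CRF arbitrary
// feature-based emissions with arbitrary durations; and so on.
```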
Integration: Work Plan
Step | Task | Applications |
---|---|---|
1 | Refactor PHMM to the new version of ToPS | Use the implementation to test new techniques to improve PHMMs' alignments and publish a paper |
2 | Join the HMM, HSMM and PHMM in an integrated PGM class | |
3 | Refactor CsHMM to the new version of ToPS | Create a model based on CsHMMs to describe pri-miRNAs |
4 | Join LCCRF and Semi-CRF in the integrated PGM class | |
5 | Support the creation of PGMs with the specification language | Make the new version work with all existing ToPS applications |
Timetable
Step | Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar | Apr |
---|---|---|---|---|---|---|---|---|---|
1 | X | | | | | | | | |
2 | | X | X | X | | | | | |
3 | | | | X | X | X | X | X | X |
4 | | | | | | | X | X | X |
5 | | | | | | | X | X | X |
The gene prediction problem

Problem: Automatically finding which regions of a DNA sequence are genes
Solution: Gene predictors, which use PGMs (HSMMs) to solve this domain-specific sequence labeling problem

Gene predictors required changes in their source code in order to improve their inner probabilistic models
Creating a gene predictor

[Diagram: a gene predictor (Perl system), which contains knowledge about gene prediction, uses ToPS (C++ framework) to build a model that makes gene prediction; ToPS implements probabilistic models for sequence labeling and can be used for any kind of application domain]