An Integrated Implementation of
Probabilistic Graphical Models

2017

Renato Cordeiro Ferreira

Advisor: Alan Mitchel Durham

IME-USP

Probabilistic
Graphical Models

for sequence labeling

Definitions

Model:
A simplified declarative representation that encodes the relevant elements of an experiment or real situation

Probabilistic Model:
A model that is mathematically described using random variables, i.e. functions that map events \( s \) of a given sample space \( \Omega \) to a discrete or continuous space

Example

Model:
\(N\) is the number of faces in the die (ignores color, material, etc.)

Algorithm:
Generate numbers in the interval \( [1, N] \) (simulate rolling the die)

Probabilistic Model:
\( \Omega = \{1, \ldots, N\} \), \( P(s) = 1/N \)
\( Even: \Omega \longrightarrow \{0,1\} \) (outcome is even)
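
The die example can be written down in a few lines. This is only a sketch: the names `roll`, `even` and `p_even` are illustrative choices, not part of the slides.

```python
import random
from fractions import Fraction

N = 6  # number of faces: the only element of the die the model keeps

# Probabilistic model: sample space {1, ..., N} with P(s) = 1/N
omega = range(1, N + 1)
P = {s: Fraction(1, N) for s in omega}

def even(s):
    """Random variable Even: maps an outcome to 1 if it is even, else 0."""
    return 1 if s % 2 == 0 else 0

def roll(rng=random):
    """Algorithm: generate a number in [1, N] (simulates rolling the die)."""
    return rng.randint(1, N)

# P(Even = 1) adds up the probability of every outcome mapped to 1
p_even = sum(P[s] for s in omega if even(s) == 1)
```

The random variable is just a function over outcomes; its distribution follows from the probabilities of the outcomes it groups together.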

Definitions

Multidimensional Probabilistic Model:
Probabilistic model that describes a complex problem using a set of random variables \( \mathcal{X} = \{ Y_1, \ldots, Y_m, X_1, \ldots, X_n \} \)

Joint distribution:
Probability distribution \( P(\mathcal{X}) = P(Y_1, \ldots, Y_m, X_1, \ldots, X_n) \) which can be queried to reason over the model:

- MAP assignment: \( \mathrm{MAP}(\mathbf{Y} \mid \mathbf{X} = \mathbf{x}) = \mathrm{arg\,max}_{\mathbf{y}} P(\mathbf{y}, \mathbf{x}) \)

- Posterior probability distribution: \( P(\mathbf{Y} \mid \mathbf{X} = \mathbf{x}) \)

Example

Sickness diagnosis problem:
Set of patients \( P \)

- Sick? \( S: P \rightarrow \{0, 1\} \)
- Fever? \( T: P \rightarrow \{0, 1\} \)
- Hypertension? \( B: P \rightarrow \{0, 1\} \)

Queries:
Given a patient has fever and hypertension

- Is he sick or not? \( \mathrm{MAP}(S \mid B = 1, T = 1) \)

- How likely is he sick? \( P(S = 1 \mid B = 1, T = 1) \)

(In both queries, \( S \) is the output variable; \( B \) and \( T \) are the input variables.)
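
Both queries can be answered by enumeration once a joint distribution is available. The sketch below assumes a hypothetical joint table over \( (S, T, B) \); the numbers are invented for illustration and do not come from the slides.

```python
from fractions import Fraction as F

# Hypothetical joint distribution P(S, T, B); keys are (s, t, b).
joint = {
    (1, 1, 1): F(48, 1000), (1, 1, 0): F(32, 1000),
    (1, 0, 1): F(12, 1000), (1, 0, 0): F(8, 1000),
    (0, 1, 1): F(18, 1000), (0, 1, 0): F(72, 1000),
    (0, 0, 1): F(162, 1000), (0, 0, 0): F(648, 1000),
}

def posterior(t, b):
    """Return {s: P(S = s | T = t, B = b)} by normalizing the joint."""
    evidence = sum(joint[(s, t, b)] for s in (0, 1))
    return {s: joint[(s, t, b)] / evidence for s in (0, 1)}

def map_assignment(t, b):
    """Return the value s that maximizes P(S = s, T = t, B = b)."""
    return max((0, 1), key=lambda s: joint[(s, t, b)])

post = posterior(t=1, b=1)       # how likely is the patient sick?
best = map_assignment(t=1, b=1)  # is the patient sick or not?
```

With this table the MAP assignment is \( S = 1 \) and the posterior is \( P(S = 1 \mid B = 1, T = 1) = 8/11 \). The MAP query returns a single value, while the posterior keeps the uncertainty.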

Definitions

Probabilistic Graphical Model (PGM):
A probabilistic model that uses a graph to compactly describe the dependencies between random variables and show a factorization of their joint distribution

Bayesian Network (BN):
A PGM whose graph is a directed acyclic graph (DAG)

Markov Network (MN):
A PGM whose graph has only undirected edges

Example

Bayesian Network:
A BN for the sickness diagnosis example
(arrows indicate states that influence others)

Generative

It expresses a factorization in terms of
Conditional Probability Distributions (CPDs):
\( P(S, T, B) = P(S) P(T | S) P(B | S) \)
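
A minimal sketch of this factorization, assuming hypothetical CPD values (the slides give none). It also counts parameters to show why the factorized form is more compact than a full joint table.

```python
from itertools import product

# Hypothetical CPDs; the numbers are illustrative only.
p_s = {1: 0.1, 0: 0.9}                    # P(S)
p_t_given_s = {1: {1: 0.8, 0: 0.2},       # P(T | S = 1)
               0: {1: 0.1, 0: 0.9}}       # P(T | S = 0)
p_b_given_s = {1: {1: 0.6, 0: 0.4},       # P(B | S = 1)
               0: {1: 0.2, 0: 0.8}}       # P(B | S = 0)

def joint(s, t, b):
    """P(S, T, B) = P(S) P(T | S) P(B | S)."""
    return p_s[s] * p_t_given_s[s][t] * p_b_given_s[s][b]

# The product of normalized CPDs is already a normalized joint.
total = sum(joint(s, t, b) for s, t, b in product((0, 1), repeat=3))

n_cpd_parameters = 1 + 2 + 2      # free parameters of the three CPDs
n_table_parameters = 2 ** 3 - 1   # free parameters of an unfactored table
```

The gap between 5 and 7 parameters is small here, but it grows exponentially with the number of variables.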

Example

Markov Network:
A MN for the sickness diagnosis example
(edges indicate that there is a connection)

This expresses a factorization in terms of Factor Functions \( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \) where \( C_i \) is a clique of the graph: \( P(S, T, B) \propto \Phi(S) \Phi(T, S) \Phi(B, S) \Phi(T) \Phi(B) \)

Generative

Example

Conditional Random Field:
A MN that ignores the cliques related only to input variables (to avoid their complexity)

This expresses a factorization in terms of Factor Functions \( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \) where \( C_i \) is a clique of the graph: \( P(S | T, B) \propto \Phi(S) \Phi(T, S) \Phi(B, S) \)
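
The difference between the two normalizations can be checked numerically. The sketch below uses hypothetical factor tables (any positive numbers would do) and compares the Markov Network, normalized once over all variables, with the CRF, normalized over the output for each input.

```python
from itertools import product

VALUES = (0, 1)

# Hypothetical factor tables; any positive numbers are valid.
phi_s = {0: 3.0, 1: 1.0}
phi_t = {0: 2.0, 1: 1.0}
phi_b = {0: 1.5, 1: 0.5}
phi_ts = {(0, 0): 4.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 5.0}  # keys: (t, s)
phi_bs = {(0, 0): 3.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 2.0}  # keys: (b, s)

def crf_score(s, t, b):
    """Phi(S) Phi(T, S) Phi(B, S): the cliques that involve the output S."""
    return phi_s[s] * phi_ts[(t, s)] * phi_bs[(b, s)]

def mn_score(s, t, b):
    """Markov Network score: the CRF factors times Phi(T) Phi(B)."""
    return crf_score(s, t, b) * phi_t[t] * phi_b[b]

# Markov Network: one global constant Z over every (s, t, b).
Z = sum(mn_score(s, t, b) for s, t, b in product(VALUES, repeat=3))

def mn_joint(s, t, b):
    return mn_score(s, t, b) / Z

def crf_conditional(s, t, b):
    """CRF: normalize over the output only, once per input (t, b)."""
    z_x = sum(crf_score(v, t, b) for v in VALUES)
    return crf_score(s, t, b) / z_x

# Conditioning the Markov Network on (T, B) cancels Phi(T) Phi(B).
mn_conditional = mn_joint(1, 1, 1) / sum(mn_joint(v, 1, 1) for v in VALUES)
```

Here `mn_conditional` equals `crf_conditional(1, 1, 1)`: the factors over input variables cancel when conditioning, which is why the CRF can leave them out.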

Discriminative

Definitions

Generative model:
A probabilistic model that factorizes \( P(\mathbf{Y}, \mathbf{X}) \) between the output variables \( \mathbf{Y} \) and the input variables \( \mathbf{X} \)

Discriminative model:
A probabilistic model that factorizes \( P(\mathbf{Y} \mid \mathbf{X}) \) between the output variables \( \mathbf{Y} \) and the input variables \( \mathbf{X} \)

Generative and discriminative equivalents:
Every generative PGM has a discriminative equivalent that is structurally identical but represents a different distribution

Generative

\( P(S, T, B) = \)
\( P(S) P(T | S) P(B | S) \)

Discriminative

\( P(S | T, B) \propto \)
\( \Phi(S) \Phi(T, S) \Phi(B, S) \)

Generative vs Discriminative

- Increases overfitting with less data
- Generalizes better with more data
- Models \(P(\mathbf{X})\), the dependencies between input variables
- Requires less knowledge about the domain described
- Ignores \(P(\mathbf{X})\), the dependencies between input variables
- Can generate sequences to simulate the process described

Definitions

Structured prediction:
The outcome of the queries over the model represents the structure of a complex object (such as a text or an image), as opposed to a single value

Sequence labeling:
Classify the elements of a sequence into a set of categories

Sequence alignment:
Find a consensus between annotations of a sequence

Example

Dishonest Casino with a game of dice:
- Fair: \( P(s) = 1/6 \)
- Loaded: \( P(1) = 1/2 \)
One player asks a judge for a refund

Sequence labeling:
Given a sequence of outcomes observed by the player during the game:
6 1 4 4 1 2 3 4 6 6 6 2 2 4 6
- When was each die used? (MAP assignment)
- What is the most likely die used on each turn of the game? (Posterior probability)
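
The second question can be sketched with the forward-backward algorithm. The transition probabilities (0.9 to keep the die, 0.1 to switch), the uniform initial distribution and the 0.1 probability for the other faces of the loaded die are taken from the HMM definition shown later in these slides; the function names are illustrative.

```python
STATES = ("Fair", "Loaded")
INIT = {"Fair": 0.5, "Loaded": 0.5}
TRANS = {"Fair": {"Fair": 0.9, "Loaded": 0.1},      # TRANS[previous][next]
         "Loaded": {"Fair": 0.1, "Loaded": 0.9}}

def emit(state, symbol):
    """Emission probability of a face given the die in use."""
    if state == "Fair":
        return 1 / 6
    return 0.5 if symbol == 1 else 0.1

def posteriors(obs):
    """Forward-backward: list of {state: P(state at turn i | obs)}."""
    fwd = [{s: INIT[s] * emit(s, obs[0]) for s in STATES}]
    for x in obs[1:]:
        prev = fwd[-1]
        fwd.append({s: emit(s, x) * sum(prev[r] * TRANS[r][s] for r in STATES)
                    for s in STATES})
    bwd = [{s: 1.0 for s in STATES}]
    for x in reversed(obs[1:]):
        nxt = bwd[0]
        bwd.insert(0, {s: sum(TRANS[s][r] * emit(r, x) * nxt[r] for r in STATES)
                       for s in STATES})
    evidence = sum(fwd[-1].values())  # P(obs)
    return [{s: fwd[i][s] * bwd[i][s] / evidence for s in STATES}
            for i in range(len(obs))]

rolls = [6, 1, 4, 4, 1, 2, 3, 4, 6, 6, 6, 2, 2, 4, 6]
post = posteriors(rolls)
```

Each entry of `post` is a distribution over the two dice for one turn; taking the most probable state per turn gives the posterior decoding, which may differ from the MAP assignment of the whole sequence.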

Example

Dishonest Casino with a game of dice:
- Fair: \( P(s) = 1/6 \)
- Loaded: \( P(1) = 1/2 \)
Two players ask a judge for a refund

Sequence alignment:
Given two different sequences of outcomes observed by the players:
6 1 4 4 1 2 3 4 6 6 6 2 2 4 6
6 1 2 2 4 1 4 3 4 6 6 6 1 2 4 6
- Which outcomes observed by the players correspond to the same turns of the game?
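
To make the alignment question concrete, the sketch below aligns the two sequences with a plain longest-common-subsequence dynamic program. This is a deliberately simpler, non-probabilistic stand-in for the pair models discussed in these slides; the function `align` and the gap symbol are assumptions made for illustration.

```python
def align(a, b, gap="-"):
    """Global alignment maximizing the number of matching positions (LCS)."""
    n, m = len(a), len(b)
    # best[i][j] = maximum number of matches between a[i:] and b[j:]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                best[i][j] = 1 + best[i + 1][j + 1]
            else:
                best[i][j] = max(best[i + 1][j], best[i][j + 1])
    # Traceback from the start, emitting a gap when one sequence is skipped
    row_a, row_b, i, j = [], [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            row_a.append(a[i]); row_b.append(b[j]); i += 1; j += 1
        elif best[i + 1][j] >= best[i][j + 1]:
            row_a.append(a[i]); row_b.append(gap); i += 1
        else:
            row_a.append(gap); row_b.append(b[j]); j += 1
    # Flush whatever is left of either sequence against gaps
    row_a += a[i:] + [gap] * (m - j)
    row_b += [gap] * (n - i) + b[j:]
    return row_a, row_b

player_1 = [6, 1, 4, 4, 1, 2, 3, 4, 6, 6, 6, 2, 2, 4, 6]
player_2 = [6, 1, 2, 2, 4, 1, 4, 3, 4, 6, 6, 6, 1, 2, 4, 6]
aligned_1, aligned_2 = align(player_1, player_2)
```

Columns where both rows hold the same face are turns the two players plausibly observed together; a gap marks a turn that only one of them recorded. A pair HMM answers the same question while also weighing how likely each die was in use.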

Definitions

PGMs for sequence labeling and alignment (subcategories of Bayesian and Markov Networks):

| Model | Acronym |
|---|---|
| Hidden Markov Model | HMM |
| Hidden Semi-Markov Model | HSMM |
| Pair Hidden Markov Model | PHMM |
| Pair Hidden Semi-Markov Model | PHSMM |
| Stochastic Context-Free Grammar | SCFG |
| Context-Sensitive Hidden Markov Model | CsHMM |
| Linear Chain Conditional Random Field | LCCRF |
| Semi-Markov Conditional Random Field | Semi-CRF |

[Figure: Hierarchy of Mathematical Assumptions. Nodes: Probabilistic Graphical Model, Bayesian Network (BN), Markov Network (MN), Dynamic Bayesian Network, Conditional Random Field, Naive Bayes, Logistic Regression, Hidden Markov Model, Linear-Chain CRF, Hidden Semi-Markov Model, Pair Hidden Markov Model, Pair Hidden Semi-Markov Model, Semi-Markov CRF, Context-Sensitive Hidden Markov Model, Stochastic Context-Free Grammar. Edge labels (the assumption added at each step): + DAG, + Undirected edges, + Show process through time, + Ignore deps. between in. vars., + Discrete vars., + Linear out. vars., + 1 out. var., + Decouple duration, + Multiple sequences, + Memory between states, + Multiple subsequences. Overlays: Generative vs Discriminative, Sequence labeling vs Sequence alignment, Non-local dependencies.]


ToPS
framework

Probabilistic Models

Published models (Distributions and PGMs):

| Model | Acronym |
|---|---|
| Discrete and Independent Distribution | IID |
| Inhomogeneous Markov Chain | IMC |
| Maximal Dependence Decomposition | MDD |
| Multiple Sequential Model | MSM |
| Periodic Inhomogeneous Markov Chain | PIMC |
| Similarity Based Sequence Weighting | SBSW |
| Variable Length Markov Chain | VLMC |
| Hidden Markov Model | HMM |
| Hidden Semi-Markov Model | HSMM |

Extensions:

| Model | Acronym |
|---|---|
| Pair Hidden Markov Model | PHMM |
| Pair Hidden Semi-Markov Model | PHSMM |
| Context-Sensitive Hidden Markov Model | CsHMM |
| Linear Chain Conditional Random Field | LCCRF |
| Semi-Markov Conditional Random Field | Semi-CRF |

Each model had minor differences that affected the framework implementation, resulting in technical debt.
This encouraged a refactoring of the framework

Vitor Onuchic

Rafael Mathias

Ígor Bonadio

Characteristics

  • ToPS implements all its probabilistic models in a single object-oriented hierarchy to improve code reuse
  • ToPS has a specification language to declare probabilistic models in a mathematical way, instead of instantiating them with code
  • ToPS provides command-line applications to execute tasks with the models

ToPS' Components

  • Lang
  • Model
  • Exception
  • App

Model

tops::model

[Figure: ToPS' published hierarchy of probabilistic models, annotated as a multifaceted abstraction]

Features

  • Evaluate: calculate the probability \(P(\mathbf{x})\) of a sequence given a model
  • Generate: draw random sequences from a model
  • Train: estimate parameters of the model from a dataset
  • Serialize: save parameters of a model for later reuse
  • Calculate (PGMs only): find the posterior probabilities of an input sequence
  • Label (PGMs only): find the MAP assignment for an input sequence

New Architecture

Probabilistic Models (COMPOSITE) play the role of Boss; each feature is handled by a Secretary:

  • Trainer: estimates parameters of a model from a dataset
  • Evaluator: calculates the probability \( P(\mathbf{x}) \) of a sequence given a model
  • Generator: draws random sequences from the model
  • Serializer: saves parameters of the model for later reuse
  • Calculator: finds the posterior probabilities of an input sequence
  • Labeler: finds the MAP assignment for an input sequence

Architecture: SECRETARY pattern¹

Boss

  • Has multiple behaviors
  • Has multiple secretaries
  • Is used indirectly by clients
  • Holds data shared among behaviors
  • Keeps all the code that implements algorithms

Secretary

  • Represents only one behavior
  • Represents only one boss
  • Interacts directly with clients
  • Holds data used only by the behavior they represent
  • Keeps no meaningful logic, forwarding calls to its boss

[1] R. C. Ferreira, Í. Bonadio, and A. M. Durham, “Secretary pattern: decreasing coupling while keeping reusability”,
      Proceedings of the 11th Latin-American Conference on Pattern Languages of Programming. The Hillside Group, p. 14, 2016.
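
The division of responsibilities between boss and secretaries can be sketched as follows. ToPS itself is written in C++; this Python version only illustrates the roles, and the class and method names (`DiscreteIID`, `Evaluator`, `Generator`, `_evaluate`, `_generate`) are hypothetical, not ToPS's API.

```python
import random

class Evaluator:
    """Secretary: represents one behavior (evaluate) of exactly one boss."""
    def __init__(self, boss):
        self._boss = boss

    def evaluate(self, sequence):
        # No meaningful logic here: the call is forwarded to the boss.
        return self._boss._evaluate(sequence)

class Generator:
    """Secretary for the generate behavior; holds data only it needs."""
    def __init__(self, boss, seed):
        self._boss = boss
        self._rng = random.Random(seed)  # state used only by this behavior

    def generate(self, length):
        return self._boss._generate(length, self._rng)

class DiscreteIID:
    """Boss: owns the shared parameters and all the algorithm code."""
    def __init__(self, probabilities):
        self._probabilities = probabilities  # data shared among behaviors

    # Factory methods: clients ask the boss for a secretary...
    def evaluator(self):
        return Evaluator(self)

    def generator(self, seed=0):
        return Generator(self, seed)

    # ...while the algorithms stay in the boss.
    def _evaluate(self, sequence):
        result = 1.0
        for symbol in sequence:
            result *= self._probabilities[symbol]
        return result

    def _generate(self, length, rng):
        symbols = list(self._probabilities)
        weights = [self._probabilities[s] for s in symbols]
        return rng.choices(symbols, weights=weights, k=length)

die = DiscreteIID({face: 1 / 6 for face in range(1, 7)})
probability = die.evaluator().evaluate([6, 1, 4])
sample = die.generator(seed=42).generate(5)
```

Clients only talk to secretaries, so each behavior exposes a small interface, while the boss keeps the algorithms in one place and can still reuse code among them.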

[Figure: ToPS' published hierarchy of probabilistic models, annotated with its unfactored interface and unfactored implementation]

Model Hierarchy: Interfaces

tops::model::ProbabilisticModel

Root of the hierarchy, implements 4 secretaries: trainer, evaluator, generator and serializer

tops::model::DecodableModel

Node of the hierarchy, descends directly from ProbabilisticModel and implements all parent's secretaries plus calculator and labeler

tops::model::ProbabilisticModelDecorator

Node of the hierarchy, descends directly from ProbabilisticModel and adds functionalities around the implementation of parent's secretaries

Architecture: CRTP

The curiously recurring template pattern (CRTP) is an idiom in C++ in which a class X derives from a class template instantiation using itself as template argument. [...] Some use cases for this pattern are static polymorphism and other metaprogramming techniques [...].

Wikipedia, Curiously Recurring Template Pattern

tops::model::ProbabilisticModelCRTP
tops::model::DecodableModelCRTP
tops::model::ProbabilisticModelDecoratorCRTP

Implement FACTORY METHODs for secretaries, define the virtual methods that secretaries delegate to, and host code reused between subclasses

[Figure: ToPS' refactored hierarchy of probabilistic models, grouped into distributions, decodable models and decorators, each built from an interface plus a CRTP class]

Lang

tops::lang

Specification language

Definition

Represents trained parameters for a given model

Training

Represents training parameters for a given model

To allow better experimentation, ToPS implemented a specification language to define and train probabilistic models

HMM for the Dishonest Casino

# Dishonest Casino definition

model_name = "HiddenMarkovModel"

observation_symbols = ("1", "2", "3", "4", "5", "6")

state_names = ("Fair", "Loaded" )

initial_probabilities = ("Fair": 0.5; "Loaded": 0.5)

transitions = (
  "Loaded" | "Fair"   : 0.1;
  "Fair"   | "Fair"   : 0.9;
  "Fair"   | "Loaded" : 0.1;
  "Loaded" | "Loaded" : 0.9;
)

emission_probabilities = (
  "1" | "Fair" : 0.166666666666;
  "2" | "Fair" : 0.166666666666; 
  "3" | "Fair" : 0.166666666666;
  "4" | "Fair" : 0.166666666666;
  "5" | "Fair" : 0.166666666666;
  "6" | "Fair" : 0.166666666666;
  "1" | "Loaded" : 0.5;
  "2" | "Loaded" : 0.1;
  "3" | "Loaded" : 0.1;
  "4" | "Loaded" : 0.1;
  "5" | "Loaded" : 0.1;
  "6" | "Loaded" : 0.1;
)
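
The labeling query over this model can be sketched with the Viterbi algorithm. The code below copies the parameters from the definition above into Python dictionaries; it is an illustration of the algorithm, not ToPS's implementation, and the function name `viterbi` is an assumption.

```python
import math

STATES = ("Fair", "Loaded")
INITIAL = {"Fair": 0.5, "Loaded": 0.5}
# TRANSITIONS[previous][current], read from `"current" | "previous" : p`
TRANSITIONS = {"Fair": {"Fair": 0.9, "Loaded": 0.1},
               "Loaded": {"Fair": 0.1, "Loaded": 0.9}}
EMISSIONS = {"Fair": {s: 1 / 6 for s in "123456"},
             "Loaded": {"1": 0.5, "2": 0.1, "3": 0.1,
                        "4": 0.1, "5": 0.1, "6": 0.1}}

def viterbi(observations):
    """Return the most probable state path (MAP assignment)."""
    # Log-probabilities avoid underflow on long sequences.
    scores = {s: math.log(INITIAL[s]) + math.log(EMISSIONS[s][observations[0]])
              for s in STATES}
    backpointers = []
    for symbol in observations[1:]:
        step, pointers = {}, {}
        for cur in STATES:
            prev = max(STATES,
                       key=lambda p: scores[p] + math.log(TRANSITIONS[p][cur]))
            pointers[cur] = prev
            step[cur] = (scores[prev] + math.log(TRANSITIONS[prev][cur])
                         + math.log(EMISSIONS[cur][symbol]))
        scores = step
        backpointers.append(pointers)
    # Follow the back pointers from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

path = viterbi("614412346662246")
```

For the sequence used in the sequence labeling example, this sketch labels every turn as Fair: two 1s in fifteen rolls do not outweigh the cost of switching to the loaded die.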

How to represent CRFs?

Factor functions can be implemented as any kind of procedure of a full-featured programming language

Factor functions:

\( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \)

LCCRF for the Dishonest Casino

// -*- mode: c++ -*-
// vim: ft=chaiscript:

// Dishonest Casino definition

model_type = "LCCRF"

observations = [ "1", "2", "3", "4", "5", "6" ]

labels = [ "Fair", "Loaded" ]

feature_function_libraries = [
  lib("features.tops")
]

feature_parameters = [
  "label_loaded": 1.0,
  "label_fair": 2.0
]

LCCRF for the Dishonest Casino

// Feature function library
observations = [ "1", "2", "3", "4", "5", "6" ]

labels = [ "Loaded", "Fair" ]

// Feature functions built directly
feature("label_loaded", fun(x, yp, yc, i) {
  if (yc == label("Loaded")) {
    return 1;
  } else {
    return 0;
  }
})

feature("label_fair", fun(x, yp, yc, i) {
  if (yc == label("Fair")) {
    return 1;
  } else {
    return 0;
  }
})

// Feature function prototypes
use("prototypes.tops")

// Feature functions built from factory
feature("Fair -> Fair"    , transition("Fair"   , "Fair"  ))
feature("Fair -> Loaded"  , transition("Fair"   , "Loaded"))
feature("Loaded -> Fair"  , transition("Loaded" , "Fair"  ))
feature("Loaded -> Loaded", transition("Loaded" , "Loaded"))
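
How such feature functions turn into a distribution can be sketched in Python. The signature `(x, yp, yc, i)` and the label weights 1.0 and 2.0 follow the definitions above; the behavior of `transition` (whose real definition lives in `prototypes.tops`, not shown in the slides) and the transition weights are assumptions.

```python
import math
from itertools import product

LABELS = ("Fair", "Loaded")

def label_loaded(x, yp, yc, i):
    return 1 if yc == "Loaded" else 0

def label_fair(x, yp, yc, i):
    return 1 if yc == "Fair" else 0

def transition(source, target):
    """Factory: feature that fires on the transition source -> target."""
    def feature(x, yp, yc, i):
        return 1 if (yp, yc) == (source, target) else 0
    return feature

# (feature function, weight); the transition weights are hypothetical.
FEATURES = [
    (label_loaded, 1.0),
    (label_fair, 2.0),
    (transition("Fair", "Fair"), 0.5),
    (transition("Loaded", "Loaded"), 0.5),
]

def score(x, y):
    """Unnormalized score exp(sum_i sum_k w_k * f_k(x, y[i-1], y[i], i))."""
    total = 0.0
    for i, yc in enumerate(y):
        yp = y[i - 1] if i > 0 else None
        total += sum(w * f(x, yp, yc, i) for f, w in FEATURES)
    return math.exp(total)

def conditional(x, y):
    """P(y | x) by brute-force normalization over all labelings of x."""
    z = sum(score(x, list(c)) for c in product(LABELS, repeat=len(x)))
    return score(x, y) / z

x = ["6", "1", "4"]
p = conditional(x, ["Fair", "Fair", "Fair"])
```

The brute-force normalization is exponential in the sequence length and only serves to show what the model defines; practical implementations compute the same constant with a forward pass.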

Integrated
implementation

proposal

Is it possible? I believe so!

| Characteristic | HMM | PHMM | HSMM | CsHMM | LCCRF | Semi-CRF |
|---|---|---|---|---|---|---|
| Emissions | IID | IID | Any Distrib. | IID | Any Func. | Any Func. |
| Durations | Geometric | Geometric | Any Distrib. | IID | Geometric | Any Distrib. |
| Transitions | IID | IID | IID | IID | IID | IID |
| Number of sequences | 1 | Many | 1 | 1 | 1 | 1 |
| Recognition power | Regular | Regular | Regular | Context-Sensitive | Regular | Regular |
| Training | EM / ML | EM / ML | ML | EM / ML | GD / L-BFGS | GD / L-BFGS |

There is a common set of characteristics that describe all PGMs that have already been implemented on ToPS

The integrated implementation requires the single abstraction to support the most generic characteristic of all models
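
One way to read the "Durations" row: in an HMM, the time spent in a state is governed only by its self-transition probability, which forces a geometric distribution. The sketch below checks this for the 0.9 self-transition of the Dishonest Casino HMM; the function name `duration_pmf` is illustrative.

```python
def duration_pmf(self_transition, d):
    """P(D = d): stay d - 1 times with probability p, then leave once."""
    return self_transition ** (d - 1) * (1 - self_transition)

p_stay = 0.9  # self-transition of the Dishonest Casino HMM
pmf = [duration_pmf(p_stay, d) for d in range(1, 2001)]

# The mean of a geometric duration is 1 / (1 - p), i.e. 10 turns here.
mean_duration = sum(d * q for d, q in enumerate(pmf, start=1))
```

A model that accepts any duration distribution, such as the HSMM, covers this geometric case as a special instance, which is why the most generic characteristic in each row is the one the integrated abstraction has to support.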


Algorithms for querying have to be generalized

Algorithms for training limit the capabilities of the abstraction

Integration: Challenges

| Model | Main difference from HMM's implementation | Main challenge to make the integration |
|---|---|---|
| HSMMs | Non-geometric duration for states | Allow different extensions according to the training |
| PHMMs | Multiple sequences | Accept multiple sequences in secretaries |
| CsHMMs | Context-sensitive recognition power | Allow inferior grammars with the same algorithms |
| CRFs | Non-probabilistic parameterization | Support any real-valued function in states |

Taking the HMM implementation as a base, each model poses a different challenge for the integration

Integration: Work Plan

| Step | Task | Applications |
|---|---|---|
| 1 | Refactor PHMM to the new version of ToPS | Use the implementation to test new techniques to improve PHMMs' alignments and publish a paper |
| 2 | Join the HMM, HSMM and PHMM in an integrated PGM class | |
| 3 | Refactor CsHMM to the new version of ToPS | Create a model based on CsHMMs to describe pri-miRNAs |
| 4 | Join LCCRF and Semi-CRF in the integrated PGM class | |
| 5 | Support the creation of PGMs with the specification language | Make the new version work with all existing ToPS applications |

Timetable

| Step | Months marked between Aug and Apr |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 3 | 6 |
| 4 | 3 |
| 5 | 3 |


The gene prediction problem

Gene predictors required changes in their source code in order to improve their inner probabilistic models

Problem: Automatically finding which regions of a DNA sequence are genes

Solution: Gene predictors, which use PGMs (HSMMs) to solve this domain-specific sequence labeling problem

Creating a gene predictor

  • Gene predictor (Perl system): uses ToPS to build a model that makes gene prediction; contains knowledge about gene prediction
  • ToPS (C++ framework): implements probabilistic models for sequence labeling; can be used for any kind of application domain


[M.Sc. Qualifying Exam] An Integrated Implementation of Probabilistic Graphical Models


Presentation for my Master's qualifying exam at IME-USP
