An Integrated Implementation of
Probabilistic Graphical Models
2017
Renato Cordeiro Ferreira
Advisor: Alan Mitchell Durham
IME-USP
Probabilistic
Graphical Models
for sequence labeling
Definitions
Model:
A simplified declarative representation that encodes the relevant elements of an experiment or real situation
Probabilistic Model:
A model that is mathematically described using random variables, i.e. functions that map events \( s \) of a given sample space \( \Omega \) to a discrete or continuous space
Example
Model:
\(N\) is the number of faces in the die (ignores color, material, etc.)
Algorithm:
Generate numbers in the interval \( [1, N] \) (simulate rolling the die)
Probabilistic Model:
\( \Omega = \{1, \ldots, N\} \), \( P(s) = 1/N \)
\( \mathrm{Even}: \Omega \longrightarrow \{0,1\} \) (outcome is even)
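A minimal C++ sketch of this algorithm, using the standard <random> facilities (the variable names are illustrative):

```cpp
#include <iostream>
#include <random>

int main() {
  const int N = 6;  // number of faces in the die

  // Probabilistic model: Omega = {1, ..., N}, P(s) = 1/N
  std::mt19937 rng(std::random_device{}());
  std::uniform_int_distribution<int> die(1, N);

  int s = die(rng);          // simulate rolling the die
  bool even = (s % 2 == 0);  // random variable Even: Omega -> {0, 1}

  std::cout << "outcome: " << s << ", even: " << even << '\n';
}
```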
Definitions
Multidimensional Probabilistic Model:
Probabilistic model that describes a complex problem using a set of random variables \( \mathcal{X} = \{ Y_1, \ldots, Y_m, X_1, \ldots, X_n \} \)
Joint distribution:
Probability distribution \( P(\mathcal{X}) = P(Y_1, \ldots, Y_m, X_1, \ldots, X_n) \) which can be queried to reason over the model:
- MAP assignment: \( \mathrm{MAP}(\mathbf{Y} \mid \mathbf{X} = \mathbf{x}) = \mathrm{arg\,max}_{\mathbf{y}} P(\mathbf{y}, \mathbf{x}) \)
- Posterior probability distribution: \( P(\mathbf{Y} \mid \mathbf{X} = \mathbf{x}) \)
Example
Sickness diagnosis problem:
Set of patients \( P \)
- Sick? \( S: P \rightarrow \{0, 1\} \)
- Fever? \( T: P \rightarrow \{0, 1\} \)
- Hypertension? \( B: P \rightarrow \{0, 1\} \)
Queries:
Given that a patient has fever and hypertension:
- Is the patient sick or not? \( \mathrm{MAP}(S \mid B = 1, T = 1) \)
- How likely is it that the patient is sick? \( P(S = 1 \mid B = 1, T = 1) \)
(In the model, \( S \) is the output variable; \( T \) and \( B \) are the input variables.)
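Spelling out how these queries reduce to the joint distribution (a worked restatement of the definitions above, with \( S \) as output and \( T, B \) as inputs):

\[
P(S = 1 \mid T = 1, B = 1)
  = \frac{P(S = 1, T = 1, B = 1)}
         {\sum_{s \in \{0,1\}} P(S = s, T = 1, B = 1)}
\qquad
\mathrm{MAP}(S \mid T = 1, B = 1)
  = \operatorname*{arg\,max}_{s \in \{0,1\}} P(S = s, T = 1, B = 1)
\]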
Definitions
Probabilistic Graphical Model (PGM):
A probabilistic model that uses a graph to compactly describe the dependencies between random variables and show a factorization of their joint distribution
Bayesian Network (BN):
A PGM whose graph is a directed acyclic graph (DAG)
Markov Network (MN):
A PGM whose graph has only undirected edges
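In symbols (standard factorization results, stated here to make "factorization" precise; \( \mathrm{Pa}(X_i) \) denotes the parents of node \( X_i \) and \( Z \) is the normalization constant):

\[
\text{BN:} \quad P(\mathcal{X}) = \prod_{i} P(X_i \mid \mathrm{Pa}(X_i))
\qquad
\text{MN:} \quad P(\mathcal{X}) = \frac{1}{Z} \prod_{i} \Phi(C_i),
\quad Z = \sum_{\mathcal{X}} \prod_{i} \Phi(C_i)
\]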
Example
Bayesian Network:
A BN for the sickness diagnosis example
(arrows indicate which variables directly influence others)
Generative
It expresses a factorization in terms of
Conditional Probability Distributions (CPDs):
\( P(S, T, B) = P(S) P(T | S) P(B | S) \)
Example
Markov Network:
A MN for the sickness diagnosis example
(edges indicate a direct dependency between variables)
This expresses a factorization in terms of Factor Functions \( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \), where \( C_i \) is a clique of the graph: \( P(S, T, B) \propto \Phi(S)\, \Phi(T, S)\, \Phi(B, S)\, \Phi(T)\, \Phi(B) \)
Generative
Example
Conditional Random Field:
An MN that ignores the cliques involving only input variables (to avoid their complexity)
This expresses a factorization in terms of Factor Functions \( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \), where \( C_i \) is a clique of the graph: \( P(S \mid T, B) \propto \Phi(S)\, \Phi(T, S)\, \Phi(B, S) \)
Discriminative
Definitions
Generative model:
A probabilistic model that factorizes the joint distribution \( P(\mathbf{Y}, \mathbf{X}) \) over the output variables \( \mathbf{Y} \) and the input variables \( \mathbf{X} \)
Discriminative model:
A probabilistic model that factorizes the conditional distribution \( P(\mathbf{Y} \mid \mathbf{X}) \) of the output variables \( \mathbf{Y} \) given the input variables \( \mathbf{X} \)
Generative and discriminative equivalents:
Every generative PGM has a discriminative equivalent that is structurally identical but represents a different distribution
Generative:
\( P(S, T, B) = P(S)\, P(T \mid S)\, P(B \mid S) \)
Discriminative:
\( P(S \mid T, B) \propto \Phi(S)\, \Phi(T, S)\, \Phi(B, S) \)
Generative vs Discriminative

Generative | Discriminative |
---|---|
Models \(P(\mathbf{X})\), the dependencies between input variables | Ignores \(P(\mathbf{X})\), the dependencies between input variables |
Can generate sequences to simulate the process described | Requires less knowledge about the domain described |
Increases overfitting with less data | Generalizes better with more data |
Definitions
Structured prediction:
The outcome of the queries over the model represents the structure of a complex object (such as a text or an image), as opposed to a single value
Sequence labeling:
Classify the elements of a sequence into a set of categories
Sequence alignment:
Find a consensus between annotations of a sequence
Example
Dishonest Casino with a game of dice:
- Fair: \( P(s) = 1/6 \)
- Loaded: \( P(1) = 1/2 \), \( P(s) = 1/10 \) for \( s \neq 1 \)
One player asks a judge for a refund
Sequence labeling:
Given a sequence of outcomes observed by the player during the game:
6 1 4 4 1 2 3 4 6 6 6 2 2 4 6
- When was each die used? (MAP assignment)
- What is the most likely die used on each turn of the game? (Posterior probability)
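The MAP query over an HMM is classically answered with the Viterbi dynamic programming algorithm. Below is a minimal, self-contained C++ sketch for the two-state casino, using the parameters from the ToPS specification shown later in this presentation (an illustration, not ToPS code; a real implementation would work in log-space to avoid underflow):

```cpp
#include <array>
#include <iostream>
#include <vector>

// Viterbi decoding for the two-state dishonest casino HMM.
// States: 0 = Fair, 1 = Loaded.
int main() {
  const std::array<double, 2> init = {0.5, 0.5};
  const double trans[2][2] = {{0.9, 0.1},   // Fair   -> {Fair, Loaded}
                              {0.1, 0.9}};  // Loaded -> {Fair, Loaded}
  auto emit = [](int state, int symbol) {
    if (state == 0) return 1.0 / 6.0;  // Fair: uniform
    return symbol == 1 ? 0.5 : 0.1;    // Loaded: P(1) = 1/2
  };

  const std::vector<int> x = {6,1,4,4,1,2,3,4,6,6,6,2,2,4,6};
  const std::size_t T = x.size();

  // gamma[t][s]: probability of the best path ending in state s at time t.
  std::vector<std::array<double, 2>> gamma(T);
  std::vector<std::array<int, 2>> back(T);

  for (int s = 0; s < 2; ++s)
    gamma[0][s] = init[s] * emit(s, x[0]);

  for (std::size_t t = 1; t < T; ++t)
    for (int s = 0; s < 2; ++s) {
      int best = 0;
      double bestp = gamma[t-1][0] * trans[0][s];
      double p1 = gamma[t-1][1] * trans[1][s];
      if (p1 > bestp) { bestp = p1; best = 1; }
      gamma[t][s] = bestp * emit(s, x[t]);
      back[t][s] = best;
    }

  // Backtrack the MAP assignment.
  std::vector<int> y(T);
  y[T-1] = gamma[T-1][1] > gamma[T-1][0] ? 1 : 0;
  for (std::size_t t = T - 1; t > 0; --t)
    y[t-1] = back[t][y[t]];

  for (std::size_t t = 0; t < T; ++t)
    std::cout << (y[t] == 0 ? "F" : "L");  // Fair / Loaded per turn
  std::cout << '\n';
}
```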
Example
Dishonest Casino with a game of dice:
- Fair: \( P(s) = 1/6 \)
- Loaded: \( P(1) = 1/2 \), \( P(s) = 1/10 \) for \( s \neq 1 \)
Two players ask a judge for a refund
Sequence alignment:
Given two different sequences of outcomes observed by the players:
6 1 4 4 1 2 3 4 6 6 6 2 2 4 6
6 1 2 2 4 1 4 3 4 6 6 6 1 2 4 6
- Which outputs observed by the players correspond to the same turns of the game?
Definitions

PGMs for sequence labeling and alignment:
Subcategories of Bayesian and Markov Networks:

Model | Acronym |
---|---|
Hidden Markov Model | HMM |
Hidden Semi-Markov Model | HSMM |
Pair Hidden Markov Model | PHMM |
Pair Hidden Semi-Markov Model | PHSMM |
Stochastic Context-Free Grammar | SCFG |
Context-Sensitive Hidden Markov Model | CsHMM |
Linear Chain Conditional Random Field | LCCRF |
Semi-Markov Conditional Random Field | Semi-CRF |
[Diagram: Hierarchy of Mathematical Assumptions — each child model adds an assumption to its parent]

- Probabilistic Graphical Model
  - Bayesian Network (+ DAG) — generative branch
    - Naive Bayes (+ 1 out. var.)
    - Dynamic Bayesian Network (+ show process through time)
      - Hidden Markov Model (+ discrete vars.) — sequence labeling
        - Hidden Semi-Markov Model (+ decouple duration)
        - Pair Hidden Markov Model (+ multiple sequences) — sequence alignment
          - Pair Hidden Semi-Markov Model (+ decouple duration)
        - Context-Sensitive Hidden Markov Model (+ memory between states, non-local dependencies)
        - Stochastic Context-Free Grammar (+ multiple subsequences, non-local dependencies)
  - Markov Network (+ undirected edges) — discriminative branch
    - Logistic Regression (+ 1 out. var.)
    - Conditional Random Field (+ ignore deps. between in. vars.)
      - Linear-Chain CRF (+ linear out. vars.)
        - Semi-Markov CRF (+ decouple duration)
ToPS
framework
Probabilistic Models
Model | Acronym |
---|---|
Discrete and Independent Distribution | IID |
Inhomogeneous Markov Chain | IMC |
Maximal Dependence Decomposition | MDD |
Multiple Sequential Model | MSM |
Periodic Inhomogeneous Markov Chain | PIMC |
Similarity Based Sequence Weighting | SBSW |
Variable Length Markov Chain | VLMC |
Hidden Markov Model | HMM |
Hidden Semi-Markov Model | HSMM |
(In the table above, IID through VLMC are distributions, while HMM and HSMM are PGMs.)

Extensions

Model | Acronym |
---|---|
Pair Hidden Markov Model | PHMM |
Pair Hidden Semi-Markov Model | PHSMM |
Context-Sensitive Hidden Markov Model | CsHMM |
Linear Chain Conditional Random Field | LCCRF |
Semi-Markov Conditional Random Field | Semi-CRF |
Each model had minor differences that affected the framework implementation, resulting in technical debt.
This encouraged a refactoring of the framework
(Vitor Onuchic, Rafael Mathias, Ígor Bonadio)
ToPS' Components

[Diagram: characteristics of ToPS' components — Lang, Model, Exception, and App]
tops::model

[Diagram: ToPS' published hierarchy of probabilistic models — a multifaceted abstraction]
Features

- Evaluate: calculate the probability \( P(\mathbf{x}) \) of a sequence given a model
- Generate: draw random sequences from a model
- Train: estimate parameters of the model from a dataset
- Serialize: save parameters of a model for later reuse
- Calculate: find the posterior probabilities of an input sequence (PGMs only)
- Label: find the MAP assignment for an input sequence (PGMs only)
New Architecture

Architecture: SECRETARY pattern¹
Probabilistic Models (built as a COMPOSITE) play the role of the boss; each feature is implemented by a secretary:

- Trainer: estimates parameters of a model from a dataset
- Evaluator: calculates the probability \( P(\mathbf{x}) \) of a sequence given a model
- Generator: draws random sequences from the model
- Serializer: saves parameters of the model for later reuse
- Calculator: finds the posterior probabilities of an input sequence
- Labeler: finds the MAP assignment for an input sequence
[1] R. C. Ferreira, Í. Bonadio, and A. M. Durham, “Secretary pattern: decreasing coupling while keeping reusability”,
Proceedings of the 11th Latin-American Conference on Pattern Languages of Programming. The Hillside Group, p. 14, 2016.
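A minimal sketch of the pattern's core idea, under simplified, assumed signatures (Sequence, probabilityOf, and UniformModel are illustrative names, not the real ToPS API): the model acts as the boss and exposes a FACTORY METHOD that creates a secretary, which holds the per-query state and delegates the work back to the boss.

```cpp
#include <iostream>
#include <memory>
#include <vector>

using Sequence = std::vector<unsigned int>;

class ProbabilisticModel;  // the "boss"

// Secretary: holds the per-query state and delegates the actual
// work back to the boss, keeping clients decoupled from the model.
class Evaluator {
 public:
  Evaluator(std::shared_ptr<const ProbabilisticModel> model, Sequence x)
      : model_(std::move(model)), x_(std::move(x)) {}
  double evaluate() const;  // delegates to the boss
 private:
  std::shared_ptr<const ProbabilisticModel> model_;
  Sequence x_;
};

class ProbabilisticModel
    : public std::enable_shared_from_this<ProbabilisticModel> {
 public:
  virtual ~ProbabilisticModel() = default;
  // FACTORY METHOD: the boss hires its secretary.
  std::unique_ptr<Evaluator> evaluator(Sequence x) const {
    return std::make_unique<Evaluator>(shared_from_this(), std::move(x));
  }
  virtual double probabilityOf(const Sequence& x) const = 0;
};

double Evaluator::evaluate() const { return model_->probabilityOf(x_); }

// A toy model to exercise the pattern.
class UniformModel : public ProbabilisticModel {
 public:
  explicit UniformModel(unsigned alphabet) : alphabet_(alphabet) {}
  double probabilityOf(const Sequence& x) const override {
    double p = 1.0;
    for (std::size_t i = 0; i < x.size(); ++i) p /= alphabet_;
    return p;
  }
 private:
  unsigned alphabet_;
};

int main() {
  auto model = std::make_shared<UniformModel>(6);
  auto evaluator = model->evaluator({0, 1, 2});  // secretary for one query
  std::cout << evaluator->evaluate() << '\n';    // (1/6)^3
}
```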
[Diagram: ToPS' published hierarchy of probabilistic models, highlighting classes with an unfactored interface, an unfactored implementation, or both]
Model Hierarchy: Interfaces
tops::model::ProbabilisticModel
Root of the hierarchy; implements four secretaries: Trainer, Evaluator, Generator, and Serializer
tops::model::DecodableModel
Node of the hierarchy; descends directly from ProbabilisticModel and implements all the parent's secretaries plus Calculator and Labeler
tops::model::ProbabilisticModelDecorator
Node of the hierarchy; descends directly from ProbabilisticModel and adds functionality around the implementation of the parent's secretaries
Architecture: CRTP
The curiously recurring template pattern (CRTP) is an idiom in C++ in which a class X derives from a class template instantiation using itself as template argument. [...] Some use cases for this pattern are static polymorphism and other metaprogramming techniques [...].
— Wikipedia, Curiously Recurring Template Pattern
tops::model::ProbabilisticModelCRTP
tops::model::DecodableModelCRTP
tops::model::ProbabilisticModelDecoratorCRTP
Implement FACTORY METHODs for the secretaries, define the virtual methods that the secretaries delegate to, and host code reused between subclasses
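To make this concrete, here is a minimal, self-contained sketch of how a CRTP base can host shared code and a static FACTORY METHOD (names and signatures are my own simplifications; the real tops::model classes are more elaborate):

```cpp
#include <iostream>
#include <memory>
#include <vector>

using Sequence = std::vector<unsigned int>;

// Abstract interface seen by clients.
class ProbabilisticModel {
 public:
  virtual ~ProbabilisticModel() = default;
  virtual double probabilityOf(const Sequence& x) const = 0;
};

// CRTP layer: hosts code reused between subclasses and implements
// factory methods knowing the concrete Derived type at compile time.
template <typename Derived>
class ProbabilisticModelCRTP : public ProbabilisticModel {
 public:
  template <typename... Args>
  static std::shared_ptr<Derived> make(Args&&... args) {
    return std::make_shared<Derived>(std::forward<Args>(args)...);
  }
};

// A concrete model plugs itself in as the template argument.
class IIDModel : public ProbabilisticModelCRTP<IIDModel> {
 public:
  explicit IIDModel(std::vector<double> probs) : probs_(std::move(probs)) {}

  double probabilityOf(const Sequence& x) const override {
    double p = 1.0;
    for (auto symbol : x) p *= probs_.at(symbol);
    return p;
  }

 private:
  std::vector<double> probs_;
};

int main() {
  auto model = IIDModel::make(std::vector<double>{0.5, 0.5});
  std::cout << model->probabilityOf({0, 1, 1}) << '\n';  // 0.125
}
```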
[Diagram: ToPS' refactored hierarchy of probabilistic models — distributions, decodable models, and decorators each follow the Interface + CRTP structure]
tops::lang
Specification language
Definition
Represents trained parameters for a given model
Training
Represents training parameters for a given model
To allow better experimentation, ToPS implemented a specification language to define and train probabilistic models
HMM for the Dishonest Casino
# Dishonest Casino definition
model_name = "HiddenMarkovModel"
observation_symbols = ("1", "2", "3", "4", "5", "6")
state_names = ("Fair", "Loaded" )
initial_probabilities = ("Fair": 0.5; "Loaded": 0.5)
transitions = (
"Loaded" | "Fair" : 0.1;
"Fair" | "Fair" : 0.9;
"Fair" | "Loaded" : 0.1;
"Loaded" | "Loaded" : 0.9;
)
emission_probabilities = (
"1" | "Fair" : 0.166666666666;
"2" | "Fair" : 0.166666666666;
"3" | "Fair" : 0.166666666666;
"4" | "Fair" : 0.166666666666;
"5" | "Fair" : 0.166666666666;
"6" | "Fair" : 0.166666666666;
"1" | "Loaded" : 0.5;
"2" | "Loaded" : 0.1;
"3" | "Loaded" : 0.1;
"4" | "Loaded" : 0.1;
"5" | "Loaded" : 0.1;
"6" | "Loaded" : 0.1;
)
How to represent CRFs?
Factor functions can be implemented as any procedure in a full-featured programming language
Factor functions:
\( \Phi(C_i): \mathrm{Im}(C_i) \longrightarrow \mathbb{R}^+ \)
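For an LCCRF specifically, the factor functions are conventionally combined as an exponential of weighted feature functions — the standard parameterization, where the weights \( \lambda_k \) correspond to the feature_parameters and the \( f_k(\mathbf{x}, y_{i-1}, y_i, i) \) to the feature functions (with signature fun(x, yp, yc, i)) in the specifications below:

\[
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(\mathbf{x}, y_{i-1}, y_i, i) \right),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'} \exp\left( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(\mathbf{x}, y'_{i-1}, y'_i, i) \right)
\]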
LCCRF for the Dishonest Casino
// -*- mode: c++ -*-
// vim: ft=chaiscript:
// Dishonest Casino definition
model_type = "LCCRF"
observations = [ "1", "2", "3", "4", "5", "6" ]
labels = [ "Fair", "Loaded" ]
feature_function_libraries = [
lib("features.tops")
]
feature_parameters = [
"label_loaded": 1.0,
"label_fair": 2.0
]
LCCRF for the Dishonest Casino
// Feature function library
observations = [ "1", "2", "3", "4", "5", "6" ]
labels = [ "Loaded", "Fair" ]
// Feature functions built directly
feature("label_loaded", fun(x, yp, yc, i) {
if (yc == label("Loaded")) {
return 1;
} else {
return 0;
}
})
feature("label_fair", fun(x, yp, yc, i) {
if (yc == label("Fair")) {
return 1;
} else {
return 0;
}
})
// Feature function prototypes
use("prototypes.tops")
// Feature functions built from factory
feature("Fair -> Fair" , transition("Fair" , "Fair" ))
feature("Fair -> Loaded" , transition("Fair" , "Loaded"))
feature("Loaded -> Fair" , transition("Loaded" , "Fair" ))
feature("Loaded -> Loaded", transition("Loaded" , "Loaded"))
Integrated
implementation
proposal
Is it possible? I believe so!
 | HMM | PHMM | HSMM | CsHMM | LCCRF | Semi-CRF |
---|---|---|---|---|---|---|
Emissions | IID | IID | Any Distrib. | IID | Any Func. | Any Func. |
Durations | Geometric | Geometric | Any Distrib. | IID | Geometric | Any Distrib. |
Transitions | IID | IID | IID | IID | IID | IID |
Number of sequences | 1 | Many | 1 | 1 | 1 | 1 |
Recognition power | Regular | Regular | Regular | Context-Sensitive | Regular | Regular |
Training | EM / ML | EM / ML | ML | EM / ML | GD / L-BFGS | GD / L-BFGS |
There is a common set of characteristics that describes all PGMs already implemented in ToPS
The integrated implementation requires a single abstraction that supports the most generic value of each characteristic across all models
Algorithms for querying have to be generalized
Algorithms for training limit the capabilities of the abstraction
Integration: Challenges
Model | Main difference from the HMM implementation | Main challenge for the integration |
---|---|---|
HSMMs | Non-geometric duration of states | Allow different extensions according to the training |
PHMMs | Multiple sequences | Accept multiple sequences in the secretaries |
CsHMMs | Context-sensitive recognition power | Allow less powerful grammars with the same algorithms |
CRFs | Non-probabilistic parameterization | Support any real-valued function in states |

Taking the HMM implementation as a base, each model imposes a different challenge for the integration
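One way to picture the integration suggested by the table and the challenges above — a hypothetical sketch under my own naming, not the actual ToPS design: make each state of the integrated PGM a composition of pluggable emission, duration, and transition components, so that each concrete model becomes a configuration of the same abstraction.

```cpp
#include <memory>
#include <vector>

using Symbol = unsigned int;
using Sequence = std::vector<Symbol>;

// Hypothetical building blocks for an integrated PGM state.
// Each interface generalizes one row of the comparison table.
struct EmissionFunction {   // IID, any distribution, or any real-valued function
  virtual ~EmissionFunction() = default;
  virtual double score(const std::vector<Sequence>& xs, std::size_t pos,
                       std::size_t len) const = 0;
};

struct DurationFunction {   // geometric for (P)HMMs, arbitrary for semi-Markov models
  virtual ~DurationFunction() = default;
  virtual double score(std::size_t len) const = 0;
};

struct TransitionFunction { // IID in every model implemented so far
  virtual ~TransitionFunction() = default;
  virtual double score(std::size_t next_state) const = 0;
};

// A state of the integrated model is the composition of the three.
struct State {
  std::unique_ptr<EmissionFunction> emission;
  std::unique_ptr<DurationFunction> duration;
  std::unique_ptr<TransitionFunction> transition;
};

// An HMM would use IID emissions with geometric durations over one
// sequence; a PHMM the same over many sequences; a Semi-CRF arbitrary
// feature-based emissions with arbitrary durations; and so on.
```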
Integration: Work Plan
Step | Task | Applications |
---|---|---|
1 | Refactor PHMM to the new version of ToPS | Use the implementation to test new techniques to improve PHMMs' alignments and publish a paper |
2 | Join the HMM, HSMM and PHMM in an integrated PGM class | |
3 | Refactor CsHMM to the new version of ToPS | Create a model based on CsHMMs to describe pri-miRNAs |
4 | Join LCCRF and Semi-CRF in the integrated PGM class | |
5 | Support the creation of PGMs with the specification language | Make the new version work with all existing ToPS applications |
Timetable
Step | Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar | Apr |
---|---|---|---|---|---|---|---|---|---|
1 | X | | | | | | | | |
2 | | X | X | X | | | | | |
3 | | | | X | X | X | X | X | X |
4 | | | | | | | X | X | X |
5 | | | | | | | X | X | X |
The gene prediction problem

Problem: Automatically finding which regions of a DNA sequence are genes
Solution: Gene predictors, which use PGMs (HSMMs) to solve this domain-specific sequence labeling problem

Gene predictors required changes in their source code in order to improve their inner probabilistic models
Creating a gene predictor

[Diagram: a gene predictor (Perl system), which contains knowledge about gene prediction, uses ToPS (C++ framework) to build a model that makes gene prediction; ToPS implements probabilistic models for sequence labeling and can be used for any kind of application domain]