Active Inference with Semi-Markov Models
Dimitrije Marković
Theoretical Neurobiology Meeting
01.03.2020
Active inference and semi-Markov decision processes
- (Part I) Active inference in multi-armed bandits
  - Empirical comparison with UCB and Thompson sampling.
  - https://slides.com/dimarkov/ai-mbs
- (Part II) Active inference and semi-Markov processes
  - Hidden semi-Markov models and a representation of state duration.
  - Learning the hidden temporal structure of state transitions.
  - Application: Reversal learning task.
- (Part III) Active inference and semi-Markov decision processes
  - Extending policies with action (policy) duration.
  - Deciding when actions should be taken and for how long.
  - Applications: Temporal attention, intertemporal choices.
Semi-Markov processes
https://en.wikipedia.org/wiki/Markov_renewal_process#Relation_to_other_stochastic_processes
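- State space $S$
- Jump times $T_n$ and states $X_n$
- Inter-arrival time $\tau_n = T_n - T_{n-1}$
- The sequence $[(X_0, T_0), \ldots, (X_n, T_n), \ldots]$ is called a Markov renewal process if:
  $\Pr(\tau_{n+1} \le t, X_{n+1} = j \mid (X_0, T_0), \ldots, (X_n = i, T_n)) = \Pr(\tau_{n+1} \le t, X_{n+1} = j \mid X_n = i)$
- If $Y_t \equiv X_n$ for $t \in [T_n, T_{n+1})$, then the process $Y_t$ is called a semi-Markov process.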
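The definitions above can be illustrated with a minimal simulation sketch. It is not taken from the talk: the function names `simulate_markov_renewal` and `state_at`, the two-state jump chain, and the uniform sojourn times are all assumptions chosen for illustration. For simplicity the sojourn time and the next state are sampled independently given the current state, which is a special case of the general definition.

```python
import bisect
import random


def simulate_markov_renewal(jump_probs, sample_sojourn, x0, n_jumps, seed=0):
    """Return jump times T_n and states X_n of a Markov renewal sequence.

    jump_probs[x] maps each possible next state to its probability (assumed input).
    sample_sojourn(x, rng) draws the inter-arrival time spent in state x.
    """
    rng = random.Random(seed)
    times, states = [0.0], [x0]
    for _ in range(n_jumps):
        current = states[-1]
        # Inter-arrival time depends only on the current state.
        tau = sample_sojourn(current, rng)
        # Next state of the embedded jump chain.
        candidates = list(jump_probs[current].keys())
        weights = list(jump_probs[current].values())
        nxt = rng.choices(candidates, weights=weights)[0]
        times.append(times[-1] + tau)
        states.append(nxt)
    return times, states


def state_at(t, times, states):
    """Semi-Markov process: Y_t = X_n for t in [T_n, T_{n+1})."""
    n = bisect.bisect_right(times, t) - 1
    return states[n]


# Hypothetical two-state example with uniformly distributed sojourn times.
jump_probs = {"A": {"B": 1.0}, "B": {"A": 1.0}}
times, states = simulate_markov_renewal(
    jump_probs, lambda s, rng: rng.uniform(1.0, 3.0), "A", n_jumps=5
)
print(list(zip(times, states)))
print(state_at(0.5, times, states))
```

Replacing the uniform sampler with an exponential or geometric one recovers the Markov special cases.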
Semi-Markov processes
[Figure: sample path of a semi-Markov process; axes: state space $S$ and time]
Special cases
For exponentially distributed i.i.d. waiting times, we obtain a continuous-time Markov chain.
A discrete-time Markov chain has geometrically distributed waiting times.
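The discrete-time case can be checked numerically. The sketch below is an illustration only: the function name `dwell_times` and the value `stay_prob = 0.8` are assumptions. It tracks how long a chain with a fixed self-transition probability stays in a state and compares the empirical dwell-time frequencies with the geometric probabilities $q^{d-1}(1-q)$.

```python
import random
from collections import Counter


def dwell_times(stay_prob, n_steps, seed=0):
    """Simulate dwell times of a chain that stays put with probability stay_prob."""
    rng = random.Random(seed)
    dwell, current = [], 1
    for _ in range(n_steps):
        if rng.random() < stay_prob:
            current += 1  # remain in the same state for one more step
        else:
            dwell.append(current)  # state is left; record the completed dwell time
            current = 1
    return dwell


stay_prob = 0.8  # hypothetical self-transition probability
samples = dwell_times(stay_prob, 200_000)
counts = Counter(samples)
for d in range(1, 6):
    empirical = counts[d] / len(samples)
    geometric = stay_prob ** (d - 1) * (1 - stay_prob)
    print(d, round(empirical, 3), round(geometric, 3))
```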
Hidden Semi-Markov models
Shun-Zheng Yu, "Hidden Semi-Markov Models: Theory, Algorithms and Applications", Elsevier, 2016.
A graphical representation of HSMM
[Figure: graphical model with latent variables and outcomes]
Example
[Figure: phase $f \in \{1, 2, 3\}$ and state $s \in \{A, B\}$ over time steps]
Phase transitions
$p(f_t \mid f_{t-1})$
M Varmazyar, et al., Journal of Industrial Engineering International (2019).
Discrete phase-type distribution
Duration distribution
Negative binomial
$p(\tau) = \binom{\tau + n - 2}{\tau - 1} (1 - \delta)^{\tau - 1} \delta^n$
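As a quick sanity check on the negative binomial duration distribution, the following sketch evaluates the probability mass function for durations $\tau = 1, 2, \ldots$ and verifies its normalisation and mean numerically. The function name `duration_pmf` and the parameter values are assumptions made for illustration.

```python
from math import comb


def duration_pmf(tau, n, delta):
    """p(tau) = C(tau + n - 2, tau - 1) * (1 - delta)^(tau - 1) * delta^n, tau >= 1."""
    if tau < 1:
        return 0.0
    return comb(tau + n - 2, tau - 1) * (1 - delta) ** (tau - 1) * delta ** n


n, delta = 3, 0.4  # hypothetical parameter values
taus = range(1, 400)  # truncation point; the remaining tail mass is negligible here
probs = [duration_pmf(t, n, delta) for t in taus]
total = sum(probs)
mean = sum(t * p for t, p in zip(taus, probs))
print(total, mean)
```

For $n = 1$ the expression reduces to the geometric distribution $(1 - \delta)^{\tau - 1} \delta$, and in general the mean duration is $1 + n(1 - \delta)/\delta$.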
State transitions
$p(s_t \mid s_{t-1}, f_{t-1})$
[Figure: transitions between states A and B; negative binomial distribution]
Active Inference

Belief updating and learning
History of past outcomes $O_t = (o_1, \ldots, o_t)$
Marginal likelihood
Predictive prior
Action selection
When simulating behaviour, $\gamma \to \infty$
For data analysis, $\gamma$ is a free parameter
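The role of the precision $\gamma$ can be sketched in code. The action selection rule itself is not reproduced in this text, so the sketch assumes a common active inference form, a softmax over negative expected free energy, $p(a) \propto \exp(-\gamma G(a))$. The function name `action_probabilities` and the values in `G` are hypothetical.

```python
import math


def action_probabilities(expected_free_energy, gamma):
    """Assumed rule: softmax over negative expected free energy with precision gamma."""
    scores = [-gamma * g for g in expected_free_energy]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]


G = [1.2, 0.7, 0.9]  # hypothetical expected free energies of three actions
for gamma in (0.0, 1.0, 10.0, 1000.0):
    print(gamma, [round(p, 3) for p in action_probabilities(G, gamma)])
```

Under this assumption, $\gamma = 0$ gives uniformly random choices, while large $\gamma$ concentrates all probability on the action with the lowest expected free energy, which corresponds to the deterministic limit used in simulations.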
Probabilistic reversal learning
Model parameters
$f \in \{1, \ldots, n_{max}\}$
[Figure: model schematic; outcome labels: loss, gain, cue A, cue B]
$P(o_t^1) = [\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]$
P(ot2)=[ρ1,ρ2,2ρ,2ρ]
Performance
Trials until correct (TUC)
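The two behavioural metrics can be sketched as follows. The exact definitions are not spelled out in this text, so the sketch assumes that performance is the fraction of correct choices and that TUC counts the trials from each reversal up to and including the first correct choice. The function names and the example data are hypothetical.

```python
def performance(correct):
    """Assumed definition: fraction of trials with a correct choice."""
    return sum(correct) / len(correct)


def trials_until_correct(correct, reversal_trials):
    """Assumed definition: trials from each reversal up to the first correct choice.

    correct is a sequence of 0/1 flags per trial; reversal_trials holds the
    (0-based) index of the first trial after each reversal.
    """
    tuc = []
    for start in reversal_trials:
        count = None  # stays None if no correct choice follows the reversal
        for offset, is_correct in enumerate(correct[start:], start=1):
            if is_correct:
                count = offset
                break
        tuc.append(count)
    return tuc


# Hypothetical choice accuracy with reversals at trials 2 and 6.
correct = [1, 1, 0, 0, 1, 1, 0, 1]
print(performance(correct))
print(trials_until_correct(correct, [2, 6]))
```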
In silico
Is the experimental setup useful?
- Can the temporal structure be learned?
- Are different temporal beliefs reflected in different behaviour?
- How well can we differentiate between agents with different temporal beliefs?
In silico
Process:
- Fix priors and action precision $\gamma$
- Simulate behaviour in both conditions with different $n_{max}$
- Illustrate performance, TUC, and the learning of the latent temporal structure $(\mu, n)$
- Model inversion and confusion matrix for $n_{max}$
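The last step of the process can be sketched as a row-normalised confusion matrix over the true and inferred values of $n_{max}$. The model inversion itself is not shown here, and the function name `confusion_matrix` and the label lists are hypothetical placeholders rather than results from the talk.

```python
from collections import Counter


def confusion_matrix(true_labels, inferred_labels, classes):
    """Row-normalised confusion matrix: rows are true classes, columns inferred."""
    counts = Counter(zip(true_labels, inferred_labels))
    matrix = []
    for t in classes:
        row_total = sum(counts[(t, i)] for i in classes)
        matrix.append(
            [counts[(t, i)] / row_total if row_total else 0.0 for i in classes]
        )
    return matrix


# Hypothetical true and inferred n_max values for ten simulated agents.
true_nmax = [1, 1, 1, 1, 5, 5, 5, 5, 10, 10]
inferred_nmax = [1, 1, 1, 5, 5, 5, 5, 10, 10, 10]
for row in confusion_matrix(true_nmax, inferred_nmax, classes=[1, 5, 10]):
    print(row)
```

A matrix close to the identity would indicate that agents with different temporal beliefs can be told apart from their behaviour.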
In silico - Behavioural metrics
In silico - Learning latent temporal structure

In silico - Confusion matrix
Behavioural data
- 50 healthy volunteers (20-30 years old):
- 27 subjects in the condition with regular reversals
- 23 subjects in the condition with irregular reversals
- Training: 40 trials with a single reversal
Performance-TUC trajectories
Model comparison
Labeled trajectories
Conclusion
- Modelling and assessing the influence of temporal expectations on decision-making in dynamic environments.
- How people learn temporal expectations could also be addressed with this approach, but some challenges remain.
- Linking the underlying representation of the temporal structure to behaviour provides a novel method for computational cognitive phenotyping.
Thanks to:
- Andrea Reiter
- Stefan Kiebel
- Thomas Parr
- Karl Friston
https://slides.com/dimarkov/active-inference-semi-markov
https://github.com/dimarkov/pybefit
https://journals.plos.org/ploscompbiol/article?rev=2&id=10.1371/journal.pcbi.1006707