Martin Biehl
Similar to reinforcement learning for POMDPs:
But we assume no reward
\(E\) : Environment state
\(S\) : Sensor state
\(A\) : Action
\(M\) : Agent memory state
\(\newcommand{\p}{\text{p}} \p(e_0)\) : initial distribution
\(\newcommand{\p}{\text{p}}\p(s|e)\) : sensor dynamics
\(\newcommand{\p}{\text{p}}\p(m'|s,a,m)\) : memory dynamics
\(\newcommand{\p}{\text{p}}\p(a|m)\) : action generation
\(\newcommand{\p}{\text{p}}\p(e'|a',e)\) : environment dynamics
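Read together, these kernels compose into one time step of the perception-action loop. A possible one-step factorization, assuming primed variables denote the next time step (a sketch; the exact indexing convention in the original may differ):
\[
\text{p}(e_t,s_t,a_t,m_{t+1}\mid e_{t-1},m_t) \;=\; \text{p}(a_t\mid m_t)\,\text{p}(e_t\mid a_t,e_{t-1})\,\text{p}(s_t\mid e_t)\,\text{p}(m_{t+1}\mid s_t,a_t,m_t)
\]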
Assumptions:
Left to be specified:
2. Generative model
Model split up into three parts:
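A plausible three-part split, consistent with the complete posterior introduced in part 3 below (a sketch; the factorization and parameterization in the original slides may differ), is into a parameter prior, environment dynamics, and sensor dynamics:
\[
\text{q}(\hat{s}_{0:\hat{T}},\hat{e}_{0:\hat{T}},\theta\mid\hat{a}_{0:\hat{T}},\xi) \;=\; \underbrace{\text{q}(\theta\mid\xi)}_{\text{parameter prior}}\;\underbrace{\text{q}(\hat{e}_0\mid\theta)\prod_{\tau=1}^{\hat{T}}\text{q}(\hat{e}_\tau\mid\hat{a}_\tau,\hat{e}_{\tau-1},\theta)}_{\text{environment dynamics}}\;\underbrace{\prod_{\tau=0}^{\hat{T}}\text{q}(\hat{s}_\tau\mid\hat{e}_\tau,\theta)}_{\text{sensor dynamics}}
\]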
3. Inference / prediction
As time passes in the perception-action loop, the agent accumulates its sensor and action history.
So at \(t\) the agent can plug \(m_t=sa_{\prec t}\) into the model.
The model then predicts the consequences of assumed actions \(\hat{a}_{t:\hat{T}}\) for the relations between future sensor values \(\hat{S}_{t:\hat{T}}\), hidden environment states \(\hat{E}_{0:\hat{T}}\), and model parameters \(\Theta\).
Call \(\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta|\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\) the complete posteriors.
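The complete posteriors follow from the generative model by conditioning on the experienced history; schematically (same notation, with the observed values \(sa_{\prec t}\) plugged in for the corresponding model variables):
\[
\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi) \;\propto\; \text{q}(s_{\prec t},\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta\mid a_{\prec t},\hat{a}_{t:\hat{T}},\xi).
\]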
4. Action selection
Then either minimize a free energy or maximize an intrinsic motivation:
The free energy that active inference suggests to minimize is:
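The exact expression from the slide is not reproduced here; a standard form of the expected free energy in the active-inference literature (e.g. Friston et al., 2016), written loosely in the notation above, where \(\text{p}_C(\hat{s})\) denotes prior preferences over sensor values (notation introduced here), decomposes into risk and ambiguity:
\[
G(\hat{a}_{t:\hat{T}}) \;\approx\; \underbrace{D_{\mathrm{KL}}\!\big[\text{q}(\hat{s}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\,\big\|\,\text{p}_C(\hat{s}_{t:\hat{T}})\big]}_{\text{risk}} \;+\; \underbrace{\operatorname{E}_{\text{q}(\hat{e}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)}\!\big[H(\hat{S}_{t:\hat{T}}\mid\hat{e}_{t:\hat{T}})\big]}_{\text{ambiguity}}
\]
The ambiguity term is the expected conditional entropy of sensor values given environment states, which connects to motivation 1 below.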
Intrinsic motivations (roughly):
0. Reinforcement learning
In reinforcement learning (RL) the motivation is given by the expected sum of the values of one particular sensor, the "reward sensor" \(s^r\):
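Written in the notation above (a sketch; a discounted version may be used instead):
\[
V^{\mathrm{RL}}(\hat{a}_{t:\hat{T}}) \;=\; \operatorname{E}_{\text{q}(\hat{s}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)}\Big[\sum_{\tau=t}^{\hat{T}} \hat{s}^{r}_{\tau}\Big]
\]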
1. Conditional entropy maximization
Actions should lead to environment states expected to have precise sensor values (e.g. Friston, Parr et al., 2017):
Get \(\text{q}(\hat{e}_{t:\hat{T}}|\hat{a}_{t:\hat{T}})\) from the complete posterior:
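Assuming discrete variables, this is a marginalization of the complete posterior (integrals replace sums for continuous \(\theta\)):
\[
\text{q}(\hat{e}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi) \;=\; \sum_{\hat{s}_{t:\hat{T}}}\sum_{\hat{e}_{0:t-1}}\sum_{\theta}\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)
\]
The quantity suggested by the description above is then the conditional entropy of future sensor values given environment states (a sketch, not necessarily the slide's exact expression):
\[
H_{\text{q}}(\hat{S}_{t:\hat{T}}\mid\hat{E}_{t:\hat{T}},\hat{a}_{t:\hat{T}}) \;=\; -\operatorname{E}_{\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)}\big[\log\text{q}(\hat{s}_{t:\hat{T}}\mid\hat{e}_{t:\hat{T}},\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\big],
\]
which is low when the predicted environment states come with precise sensor values.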
2. Predictive information maximization
Actions should lead to the most complex sensor stream:
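One common formalization is the mutual information between earlier and later parts of the predicted sensor stream (Ay et al., 2008); in the notation above, for some split point \(\tau\) between \(t\) and \(\hat{T}\) (a sketch; the slides may use a one-step or differently conditioned version):
\[
V^{\mathrm{PI}}(\hat{a}_{t:\hat{T}}) \;=\; I_{\text{q}}\big(\hat{S}_{t:\tau}\,;\,\hat{S}_{\tau:\hat{T}}\;\big|\;\hat{a}_{t:\hat{T}},sa_{\prec t},\xi\big)
\]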
(Credit: Georg Martius, Ralf Der)
3. Knowledge seeking
Actions should lead to sensor values that tell the most about hidden (environment) variables \(\hat{E}_{0:\hat{T}}\) and model parameters \(\Theta\):
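A common formalization is the information that future sensor values are expected to carry about these hidden quantities, i.e. the expected information gain (Lindley, 1956; Orseau et al., 2013); in the notation above (a sketch):
\[
V^{\mathrm{KS}}(\hat{a}_{t:\hat{T}}) \;=\; I_{\text{q}}\big(\hat{S}_{t:\hat{T}}\,;\,\hat{E}_{0:\hat{T}},\Theta\;\big|\;\hat{a}_{t:\hat{T}},sa_{\prec t},\xi\big)
\]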
Bellemare et al. (2016)
Bellemare, M. G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., and Munos, R. (2016). Unifying Count-Based Exploration and Intrinsic Motivation. arXiv:1606.01868 [cs].
4. Empowerment maximization
Actions should lead to control over as many future experiences as possible:
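Empowerment is usually formalized as the channel capacity from the agent's actions to its subsequent sensor values (Klyubin et al., 2005); in the notation above (a sketch; the horizon and time indexing may differ in the slides):
\[
V^{\mathrm{EM}}(sa_{\prec t}) \;=\; \max_{\text{p}(\hat{a}_{t:\hat{T}})} I\big(\hat{A}_{t:\hat{T}}\,;\,\hat{S}_{t:\hat{T}}\;\big|\;sa_{\prec t},\xi\big)
\]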
Guckelsberger et al. (2016)
Guckelsberger, C., Salge, C., and Colton, S. (2016). Intrinsically Motivated General Companion NPCs via Coupled Empowerment Maximisation. In 2016 IEEE Conference on Computational Intelligence and Games (CIG'16), pages 150–157.
5a. Curiosity
Actions should lead to surprising sensor values.
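One simple formalization of expected surprise of sensor values in the notation above (a sketch; prediction-error and learning-progress variants also appear in the literature, cf. Schmidhuber, 2010):
\[
V^{\mathrm{C}_a}(\hat{a}_{t:\hat{T}}) \;=\; \operatorname{E}_{\text{q}(\hat{s}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)}\big[-\log \text{q}(\hat{s}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\big]
\]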
5b. Curiosity
Actions should lead to surprising environment states.
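Analogously, expected surprise of environment states (a sketch):
\[
V^{\mathrm{C}_b}(\hat{a}_{t:\hat{T}}) \;=\; \operatorname{E}_{\text{q}(\hat{e}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)}\big[-\log \text{q}(\hat{e}_{t:\hat{T}}\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\big]
\]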
5. Curiosity
Actions should lead to surprising embeddings of sensor values:
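In Burda et al. (2018) the surprise is a prediction error in an embedding space: with \(\phi\) an embedding of sensor values (random features, a VAE, or inverse-dynamics features) and \(\hat{f}\) a learned forward model (both pieces of notation introduced here), the per-step intrinsic reward is roughly
\[
r^{c}_{\tau} \;=\; \big\lVert \hat{f}\big(\phi(s_{\tau}),a_{\tau}\big) - \phi(s_{\tau+1}) \big\rVert^{2}.
\]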
Figure from Burda et al. (2018), with permission.
Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., and Efros, A. A. (2018). Large-Scale Study of Curiosity-Driven Learning. arXiv:1808.04355 [cs, stat].
Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E. (2008). Predictive Information and Explorative Behavior of Autonomous Robots. The European Physical Journal B-Condensed Matter and Complex Systems, 63(3):329–339.
Biehl, M., Guckelsberger, C., Salge, C., Smith, S. C., and Polani, D. (2018). Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop. Frontiers in Neurorobotics, 12.
Friston, K., FitzGerald, T., Rigoli, F., Schwartenbeck, P., O’Doherty, J., and Pezzulo, G. (2016). Active Inference and Learning. Neuroscience & Biobehavioral Reviews, 68(Supplement C):862–879.
Friston, K. J., Parr, T., and de Vries, B. (2017). The Graphical Brain: Belief Propagation and Active Inference. Network Neuroscience, 1(4):381–414.
Klyubin, A., Polani, D., and Nehaniv, C. (2005). Empowerment: A Universal Agent-Centric Measure of Control. In The 2005 IEEE Congress on Evolutionary Computation, 2005, volume 1, pages 128–135.
Orseau, L., Lattimore, T., and Hutter, M. (2013). Universal Knowledge-Seeking Agents for Stochastic Environments. In Jain, S., Munos, R., Stephan, F., and Zeugmann, T., editors, Algorithmic Learning Theory, number 8139 in Lecture Notes in Computer Science, pages 158–172. Springer Berlin Heidelberg.
Oudeyer, P.-Y. and Kaplan, F. (2008). How can we define intrinsic motivation? In Proceedings of the 8th International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, Lund University Cognitive Studies, Lund: LUCS, Brighton. Lund University Cognitive Studies, Lund: LUCS, Brighton.
Schmidhuber, J. (2010). Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230–247.
Guckelsberger, C., Salge, C., and Colton, S. (2016). Intrinsically Motivated General Companion NPCs via Coupled Empowerment Maximisation. In 2016 IEEE Conference on Computational Intelligence and Games (CIG'16), pages 150–157.
Lindley, D. V. (1956). On a Measure of the Information Provided by an Experiment. The Annals of Mathematical Statistics, 27(4):986–1005.