Martin Biehl
Motivation is whatever generates behavior for an agent (robot, living organism).
The term comes originally from psychology, e.g. Ryan and Deci (2000):
doing an activity for its inherent satisfaction rather than for some separable consequence,
for the fun or challenge entailed rather than because of external products, pressures, or rewards.
Examples (Oudeyer and Kaplan, 2008):
But one can always argue:
Developmental robotics:
Working definition compatible with Oudeyer and Kaplan (2008):
Motivation is intrinsic if it is embodiment independent.
This includes the approach by Schmidhuber (2010).
Embodiment independent means it should work (without changes) for any form of agent and produce "worthwhile" behavior.
This implies it should be semantic free, i.e. information theoretic.
Another important, but not defining, feature is known from evolution: open-endedness.
The motivation should not vanish until the capacities of the agent are exhausted.
Other applications of intrinsic motivations:
Sparse reward reinforcement learning:
AGI:
Advantages of intrinsic motivations:
Disadvantage:
Examples: the dark room problem.
Solution for the dark room problem:
The setting is similar to reinforcement learning for POMDPs, but we assume there is no extrinsic reward.
\(E\): Environment state
\(S\): Sensor state
\(A\): Action
\(M\): Agent memory state
\(\text{p}(e_0)\): initial distribution
\(\text{p}(s|e)\): sensor dynamics
\(\text{p}(m'|s,a,m)\): memory dynamics
\(\text{p}(a|m)\): action generation
\(\text{p}(e'|a',e)\): environment dynamics
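A minimal simulation sketch of this perception-action loop, with made-up toy distributions standing in for \(\text{p}(e_0)\), \(\text{p}(s|e)\), \(\text{p}(m'|s,a,m)\), \(\text{p}(a|m)\) and \(\text{p}(e'|a',e)\) (all numbers and the random policy are illustrative assumptions, not part of the formalism):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy kernels: two environment states, two sensor values, two actions,
# memory = full sensor-action history. All probabilities are made up.
def sample_e0():                      # p(e_0): initial environment state
    return rng.choice(2, p=[0.5, 0.5])

def sample_s(e):                      # p(s|e): sensor dynamics
    return rng.choice(2, p=[0.9, 0.1] if e == 0 else [0.2, 0.8])

def update_m(m, s, a):                # p(m'|s,a,m): deterministic history memory
    return m + [(s, a)]

def sample_a(m):                      # p(a|m): here just a uniformly random policy
    return rng.choice(2)

def sample_e_next(a_next, e):         # p(e'|a',e): environment dynamics
    flip = 0.8 if a_next == 1 else 0.1
    return 1 - e if rng.random() < flip else e

# Roll out the loop for a few time steps.
e, m, a = sample_e0(), [], 0          # arbitrary initial action a_0 = 0
for t in range(5):
    s = sample_s(e)                   # sense the current environment state
    m = update_m(m, s, a)             # store (sensor value, last action) in memory
    a = sample_a(m)                   # generate the next action from memory
    e = sample_e_next(a, e)           # environment reacts to that action
    print(t, e, s, a)
```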
Joint distribution until final time \(t=T\):
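A factorization consistent with the kernels above (the time indexing and the treatment of the initial memory \(m_0\) and action \(a_0\) are assumptions on my part):

$$
\text{p}(e_{0:T},s_{0:T},a_{1:T},m_{1:T}\mid a_0,m_0)=\text{p}(e_0)\,\text{p}(s_0|e_0)\prod_{t=1}^{T}\text{p}(e_t|a_t,e_{t-1})\,\text{p}(s_t|e_t)\,\text{p}(a_t|m_t)\,\text{p}(m_t|s_{t-1},a_{t-1},m_{t-1}).
$$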
Assumptions:
Only missing: the action generation \(\text{p}(a|m)\).
Remarks:
2. Generative model
Model split up into three parts:
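Plausibly (judging from the variables that appear in the complete posterior used below; this reading is an assumption), the three parts are a sensor model, an environment dynamics model, and a prior over parameters, e.g. \(\text{q}(\hat{s}|\hat{e},\theta)\), \(\text{q}(\hat{e}'|\hat{e},\hat{a},\theta)\) and \(\text{q}(\theta|\xi)\) with hyperparameters \(\xi\).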
3. Inference / prediction
As time passes in the perception-action loop, the agent accumulates its sensor-action history.
So at time \(t\) the agent can plug \(m_t=sa_{\prec t}\) into the model.
The model then predicts the consequences of assumed actions \(\hat{a}_{t:\hat{T}}\) for the relations between future sensor values, environment states and model parameters.
Call \(\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta|\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\) the complete posterior.
Note: the same model can be used to predict the consequences of closed-loop policies \(\hat{\Pi}\).
The complete posterior factorizes into a posterior factor and a predictive factor:
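A sketch of this factorization, consistent with the notation above (the exact conditioning is my reconstruction):

$$
\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)=\underbrace{\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{t:\hat{T}}\mid\hat{e}_{t-1},\theta,\hat{a}_{t:\hat{T}})}_{\text{predictive factor}}\;\underbrace{\text{q}(\hat{e}_{\prec t},\theta\mid sa_{\prec t},\xi)}_{\text{posterior factor}}.
$$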
4. Action selection
General reinforcement learning (RL) evaluates action sequences by their expected cumulative reward \(Q(\hat{a}_{t:\hat{T}},sa_{\prec t}):=\mathbb{E}[R|\hat{a}_{t:\hat{T}},sa_{\prec t}]\).
Standard RL: the reward \(r_\tau\) is a function of the sensor value alone, so \(r(\hat{s}\hat{a}_{t:\tau},sa_{\prec t})=r(s_\tau)=r_\tau\).
For the case of evaluating policies: \(Q(\pi,sa_{\prec t}):=\mathbb{E}[R|\pi,sa_{\prec t}]\).
From the best sequence, select / perform the first action.
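A sketch of this selection step (the explicit formulas are my reconstruction of the standard procedure):

$$
R=\sum_{\tau=t}^{\hat{T}} r_\tau,\qquad \hat{a}^{*}_{t:\hat{T}}\in\arg\max_{\hat{a}_{t:\hat{T}}}Q(\hat{a}_{t:\hat{T}},sa_{\prec t}),\qquad a_t:=\hat{a}^{*}_t.
$$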
1. Free energy minimization
Actions should lead to environment states expected to have precise sensor values (e.g. Friston, Parr et al., 2017).
Get \(\text{q}(\hat{e}_{t:\hat{T}}|\hat{a}_{t:\hat{T}})\) from the complete posterior.
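One common way to write this is via the expected free energy, which the agent minimizes over action sequences; the desired distribution \(\text{p}_{\text{des}}\) and this particular decomposition are assumptions here, not taken from the definitions above:

$$
G(\hat{a}_{t:\hat{T}})=\text{KL}\!\left[\text{q}(\hat{e}_{t:\hat{T}}|\hat{a}_{t:\hat{T}})\,\middle\|\,\text{p}_{\text{des}}(\hat{e}_{t:\hat{T}})\right]+\mathbb{E}_{\text{q}(\hat{e}_{t:\hat{T}}|\hat{a}_{t:\hat{T}})}\!\left[\text{H}\!\left[\text{q}(\hat{s}_{t:\hat{T}}|\hat{e}_{t:\hat{T}})\right]\right].
$$

The first term is often called risk, the second ambiguity; the ambiguity term captures the preference for environment states with precise sensor values.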
2. Predictive information maximization
Actions should lead to the most complex sensor stream (Ay et al., 2008).
Georg Martius, Ralf Der
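Predictive information is the mutual information between the past and the future of the sensor stream; a sketch in the present notation (the split point \(\tau\) and the conditioning are assumptions):

$$
\text{PI}(\hat{a}_{t:\hat{T}})=I\!\left[\hat{S}_{t:\tau}:\hat{S}_{\tau+1:\hat{T}}\,\middle|\,\hat{a}_{t:\hat{T}},sa_{\prec t},\xi\right].
$$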
3. Knowledge seeking
Actions should lead to sensor values that tell the most about the hidden (environment) variables \(\hat{E}_{0:\hat{T}}\) and the model parameters \(\Theta\).
Bellemare et al. (2016)
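A sketch of the corresponding action value as an expected information gain (this concrete form is an assumption consistent with the description above):

$$
\text{KS}(\hat{a}_{t:\hat{T}})=I\!\left[\hat{S}_{t:\hat{T}}:\hat{E}_{0:\hat{T}},\Theta\,\middle|\,\hat{a}_{t:\hat{T}},sa_{\prec t},\xi\right].
$$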
4. Empowerment maximization
Actions should lead to control over as many future experiences as possible (Klyubin et al., 2005).
Guckelsberger et al. (2016)
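Empowerment is usually defined as the channel capacity from a sequence of the agent's actions to later sensor values; a sketch in the present notation (the time ranges are assumptions):

$$
\mathfrak{E}(sa_{\prec t})=\max_{\text{p}(\hat{a}_{t:\tau})}I\!\left[\hat{A}_{t:\tau}:\hat{S}_{\tau+1:\hat{T}}\,\middle|\,sa_{\prec t}\right].
$$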
5a. Curiosity
Actions should lead to surprising sensor values.
5b. Curiosity
Actions should lead to surprising environment states.
Actions should lead to a surprising embedding of the sensor values.
5. Curiosity
Figure from Burda et al. (2018), with permission.
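A sketch of the surprise sought in variant 5a (this concrete expression is an assumption): the surprise of a future sensor value is its negative log probability under the agent's own prediction,

$$
\text{Surprise}(\hat{s}_\tau)=-\ln\text{q}(\hat{s}_\tau\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi),
$$

and the other variants replace \(\hat{s}_\tau\) by environment states or by an embedding of the sensor values (cf. Burda et al., 2018).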
1. Variational complete posteriors
2. Variational inference / variational policy
3. Active inference
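As a general background sketch (the variational family \(\mathcal{R}\) and this formulation are assumptions): the complete posteriors are replaced by the closest members of a tractable family,

$$
\text{r}^{*}=\arg\min_{\text{r}\in\mathcal{R}}\text{KL}\!\left[\text{r}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta\mid\hat{a}_{t:\hat{T}})\,\middle\|\,\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\right].
$$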
References:
Aslanides, J., Leike, J., and Hutter, M. (2017). Universal Reinforcement Learning Algorithms: Survey and Experiments. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 1403–1410.
Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E. (2008). Predictive Information and Explorative Behavior of Autonomous Robots. The European Physical Journal B - Condensed Matter and Complex Systems, 63(3):329–339.
Bellemare, M. G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., and Munos, R. (2016). Unifying Count-Based Exploration and Intrinsic Motivation. arXiv:1606.01868 [cs].
Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., and Efros, A. A. (2018). Large-Scale Study of Curiosity-Driven Learning. arXiv:1808.04355 [cs, stat].
Friston, K. J., Parr, T., and de Vries, B. (2017). The Graphical Brain: Belief Propagation and Active Inference. Network Neuroscience, 1(4):381–414.
Guckelsberger, C., Salge, C., and Colton, S. (2016). Intrinsically Motivated General Companion NPCs via Coupled Empowerment Maximisation. In 2016 IEEE Conference on Computational Intelligence and Games (CIG'16), pages 150–157.
Klyubin, A., Polani, D., and Nehaniv, C. (2005). Empowerment: A Universal Agent-Centric Measure of Control. In The 2005 IEEE Congress on Evolutionary Computation, volume 1, pages 128–135.
Orseau, L., Lattimore, T., and Hutter, M. (2013). Universal Knowledge-Seeking Agents for Stochastic Environments. In Jain, S., Munos, R., Stephan, F., and Zeugmann, T., editors, Algorithmic Learning Theory, number 8139 in Lecture Notes in Computer Science, pages 158–172. Springer Berlin Heidelberg.
Oudeyer, P.-Y. and Kaplan, F. (2008). How Can We Define Intrinsic Motivation? In Proceedings of the 8th International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems. Lund University Cognitive Studies, Lund: LUCS, Brighton.
Schmidhuber, J. (2010). Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3):230–247.
Storck, J., Hochreiter, S., and Schmidhuber, J. (1995). Reinforcement Driven Information Acquisition in Non-Deterministic Environments. In Proceedings of the International Conference on Artificial Neural Networks, volume 2, pages 159–164.
4. Recognition model / approximate prediction
Exact complete posterior: \(\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta|\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\).
Approximate complete posterior: also factorizes into a posterior and a predictive factor.
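A sketch of the approximation, assuming the recognition model (written \(\text{r}\) here, a name introduced for illustration) replaces the exact posterior factor while the predictive factor is kept:

$$
\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta\mid\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\approx\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{t:\hat{T}}\mid\hat{e}_{t-1},\theta,\hat{a}_{t:\hat{T}})\;\text{r}(\hat{e}_{\prec t},\theta\mid sa_{\prec t}).
$$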