Martin Biehl
Motivation is whatever generates the behavior of an agent (robot, living organism).
Developmental robotics:
Working definition compatible with Oudeyer and Kaplan (2008):
Motivation is intrinsic if it:
This includes the approach by Schmidhuber (2010):
Motivation is intrinsic if it
Embodiment independent means it should work (without changes) for any form of agent:
and produce "worthwhile" behavior
this implies
Semantics-free, information-theoretic:
Another important, but not defining, feature is best known from evolution:
open-endedness
The motivation should not vanish until the capacities of the agent are exhausted.
Other applications of intrinsic motivations:
Sparse reward reinforcement learning:
AGI:
Advantages of intrinsic motivations
Disadvantage:
Examples:
dark room problem
Solution for dark room problem
Similar to reinforcement learning for POMDPs:
But we assume no extrinsic reward
\(E\) : Environment state
\(S\) : Sensor state
\(A\) : Action
\(M\) : Agent memory state
\(\text{p}(e_0)\) : initial distribution
\(\text{p}(s|e)\) : sensor dynamics
\(\text{p}(m'|s,a,m)\) : memory dynamics
\(\text{p}(a|m)\) : action generation
\(\text{p}(e'|a',e)\) : environment dynamics
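As a rough illustration of how the five kernels above interact, the following sketch samples one run of the perception-action loop. The binary state spaces, the probability values, the uniform random policy, and the function names are all hypothetical choices made for this example. The handling of the first time step (one observation before the first action) is also an assumption.

```python
import random

def sample(dist, rng):
    """Draw one outcome from a dict {outcome: probability}."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

# Hypothetical toy kernels over binary states (illustrative values only).
p_e0 = {0: 0.5, 1: 0.5}                      # p(e_0): initial distribution

def p_s_given_e(e):                          # p(s|e): noisy sensor
    return {e: 0.9, 1 - e: 0.1}

def p_a_given_m(m):                          # p(a|m): uniform random policy
    return {0: 0.5, 1: 0.5}

def p_e_next(a, e):                          # p(e'|a',e): action 1 tends to flip e
    return {1 - e: 0.8, e: 0.2} if a == 1 else {e: 1.0}

def next_memory(s, a, m):                    # p(m'|s,a,m): deterministic append
    return m + ((s, a),)

def run_loop(T, seed=0):
    """Sample environment states, sensor values and actions for T steps."""
    rng = random.Random(seed)
    e = sample(p_e0, rng)
    s = sample(p_s_given_e(e), rng)
    m = ((s, None),)                         # first memory: one observation, no action yet
    history = [(e, s, None)]
    for _ in range(T):
        a = sample(p_a_given_m(m), rng)      # action generated from memory only
        e = sample(p_e_next(a, e), rng)      # environment reacts to the action
        s = sample(p_s_given_e(e), rng)      # sensor reads the new environment state
        m = next_memory(s, a, m)             # memory stores the sensor-action pair
        history.append((e, s, a))
    return history, m

history, memory = run_loop(T=5)
print(memory)
```

Note that the agent side (`p_a_given_m`, `next_memory`) never sees `e` directly; it only receives sensor values, which is the POMDP-like part of the setup.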
Joint distribution until final time \(t=T\):
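One factorization consistent with the five kernels listed above is sketched here. The boundary convention is an assumption made for this sketch: there is no action and no memory state at \(t=0\), and the first memory state \(m_1\) depends on \(s_0\) only, so \(\text{p}(m_1|s_0,a_0,m_0)\) is read as \(\text{p}(m_1|s_0)\).

```latex
\begin{align}
\text{p}(e_{0:T}, s_{0:T}, a_{1:T}, m_{1:T})
  = \text{p}(e_0)\,\text{p}(s_0|e_0)
    \prod_{t=1}^{T}
    \text{p}(m_t|s_{t-1},a_{t-1},m_{t-1})\,
    \text{p}(a_t|m_t)\,
    \text{p}(e_t|a_t,e_{t-1})\,
    \text{p}(s_t|e_t)
\end{align}
```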
Assumptions:
Left to be specified:
2. Generative model
Model split up into three parts:
3. Inference / prediction
As time passes in the perception-action loop
So at time \(t\) the agent can plug \(m_t=sa_{\prec t}\) into the model
This predicts the consequences of assumed actions \(\hat{a}_{t:\hat{T}}\) for the relations between:
Call \(\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta|\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\) the complete posteriors.
4. Action selection
Then either:
0. Reinforcement learning
In reinforcement learning (RL), the motivation is the expected sum over time of the values of one particular sensor, the "reward sensor" \(s^r\).
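The RL motivation can be illustrated with a minimal sketch: given a predictive distribution over future reward-sensor sequences for each candidate action, score the action by the expected sum of rewards. The action names and probabilities below are hypothetical toy values, and a single action label stands in for a whole action sequence.

```python
def expected_reward_sum(q_rewards):
    """q_rewards maps reward-sensor sequences (tuples) to their probabilities."""
    return sum(prob * sum(seq) for seq, prob in q_rewards.items())

# Hypothetical predictions over two future reward-sensor values per action.
predictions = {
    "stay": {(0, 0): 0.7, (0, 1): 0.3},
    "move": {(1, 0): 0.5, (1, 1): 0.5},
}

values = {action: expected_reward_sum(q) for action, q in predictions.items()}
best = max(values, key=values.get)  # greedy choice over the toy actions
print(values, best)
```

The intrinsic motivations that follow keep this "score each action by a functional of the prediction" pattern but replace the reward sum with a quantity that needs no dedicated reward sensor.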
1. Conditional entropy maximization
Actions should lead to environment states expected to have precise sensor values (e.g. Friston et al., 2017):
Get \(\text{q}(\hat{e}_{t:\hat{T}}|\hat{a}_{t:\hat{T}})\) from the complete posterior:
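A minimal sketch of this motivation, under the assumption that "precise sensor values" means a low conditional entropy of sensor values given environment states (so the quantity to maximize is the negative conditional entropy). The two environment states, the sensor distributions, and the action names are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def conditional_entropy(q_e, q_s_given_e):
    """H(S|E) = sum_e q(e) * H(q(s|e)) for a predicted state distribution q_e."""
    return sum(q_e[e] * entropy(q_s_given_e[e]) for e in q_e)

# Hypothetical sensor model: constant reading in the dark, varied reading in the light.
q_s_given_e = {"dark": {0: 1.0}, "lit": {0: 0.5, 1: 0.5}}

# Hypothetical predicted environment states for two candidate actions.
q_e_given_action = {
    "go_dark": {"dark": 0.9, "lit": 0.1},
    "go_lit": {"dark": 0.1, "lit": 0.9},
}

scores = {a: conditional_entropy(q_e, q_s_given_e)
          for a, q_e in q_e_given_action.items()}
most_precise = min(scores, key=scores.get)
print(scores, most_precise)
```

In this toy setup the most precise sensor values are reached by heading for the dark state, which is the kind of outcome the dark room problem refers to.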
2. Predictive information maximization
Actions should lead to the most complex sensor stream:
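Predictive information is commonly defined as the mutual information between the past and the future of the sensor stream (Ay et al., 2008). The sketch below computes it for three hypothetical one-step joint distributions over (past, future) sensor values. Both a constant stream and a purely random stream score zero, while a structured stream scores high.

```python
import math

def mutual_information(joint):
    """Mutual information in bits of a joint distribution {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical joint distributions over (past sensor value, future sensor value).
constant = {(0, 0): 1.0}                                        # nothing ever changes
noise = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}  # future independent of past
structured = {(0, 0): 0.5, (1, 1): 0.5}                         # varied but predictable

for name, joint in [("constant", constant), ("noise", noise), ("structured", structured)]:
    print(name, mutual_information(joint))
```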
Georg Martius, Ralf Der
3. Knowledge seeking
Actions should lead to sensor values that tell the most about hidden (environment) variables \(\hat{E}_{0:\hat{T}}\) and model parameters \(\Theta\):
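One way to make "tell the most about hidden variables" concrete is the expected information gain: the expected Kullback-Leibler divergence from the prior over a hidden variable to its posterior after seeing the sensor value. The sketch below assumes a single discrete hidden variable standing in for both \(\hat{E}_{0:\hat{T}}\) and \(\Theta\). The two actions and their likelihoods are hypothetical.

```python
import math

def expected_information_gain(prior, likelihood):
    """Expected KL(posterior || prior) in bits.

    prior: {theta: q(theta)}; likelihood: {theta: {s: q(s|theta)}}.
    """
    # Predictive distribution over sensor values, q(s) = sum_theta q(theta) q(s|theta).
    q_s = {}
    for theta, p_theta in prior.items():
        for s, p_s in likelihood[theta].items():
            q_s[s] = q_s.get(s, 0.0) + p_theta * p_s
    gain = 0.0
    for s, p_s in q_s.items():
        if p_s == 0:
            continue
        for theta, p_theta in prior.items():
            posterior = p_theta * likelihood[theta].get(s, 0.0) / p_s  # Bayes' rule
            if posterior > 0:
                gain += p_s * posterior * math.log2(posterior / p_theta)
    return gain

prior = {"A": 0.5, "B": 0.5}
# Hypothetical actions: "look" yields a sensor value that reveals theta, "ignore" does not.
look = {"A": {0: 1.0}, "B": {1: 1.0}}
ignore = {"A": {0: 0.5, 1: 0.5}, "B": {0: 0.5, 1: 0.5}}

print(expected_information_gain(prior, look))
print(expected_information_gain(prior, ignore))
```

This quantity equals the mutual information between the sensor value and the hidden variable, so an action whose outcome is pure noise gains nothing even though its sensor values are unpredictable.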
Bellemare et al. (2016)
Bellemare, M. G., Srinivasan, S., Ostrovski, G., Schaul, T., Saxton, D., and Munos, R. (2016). Unifying Count-Based Exploration and Intrinsic Motivation. arXiv:1606.01868 [cs]. arXiv: 1606.01868.
4. Empowerment maximization
Actions should lead to control over as many future experiences as possible:
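Empowerment is defined by Klyubin et al. (2005) as the channel capacity from actions to future sensor values. The sketch below estimates that capacity for small discrete channels with the Blahut-Arimoto iteration. The channels are hypothetical toy examples, and the fixed iteration count is an arbitrary choice rather than a tuned setting.

```python
import math

def channel_capacity(channel, iterations=200):
    """Capacity in bits of a discrete channel with channel[a][s] = p(s|a),
    estimated with the Blahut-Arimoto iteration."""
    n_a, n_s = len(channel), len(channel[0])
    p_a = [1.0 / n_a] * n_a  # start from a uniform action distribution

    def output_dist(p_a):
        return [sum(p_a[a] * channel[a][s] for a in range(n_a)) for s in range(n_s)]

    def divergences(p_s):
        # KL divergence (in nats) of each row p(.|a) from the output distribution.
        # Terms with p_s[s] == 0 can only belong to actions whose weight is zero.
        return [sum(channel[a][s] * math.log(channel[a][s] / p_s[s])
                    for s in range(n_s) if channel[a][s] > 0 and p_s[s] > 0)
                for a in range(n_a)]

    for _ in range(iterations):
        d = divergences(output_dist(p_a))
        weights = [p_a[a] * math.exp(d[a]) for a in range(n_a)]
        total = sum(weights)
        p_a = [w / total for w in weights]  # reweight actions towards distinct outcomes

    d = divergences(output_dist(p_a))
    return sum(p_a[a] * d[a] for a in range(n_a)) / math.log(2)

free = [[1.0, 0.0], [0.0, 1.0]]                    # each action gives a distinct outcome
stuck = [[1.0, 0.0], [1.0, 0.0]]                   # actions make no difference
redundant = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # three actions, two distinct outcomes

print(channel_capacity(free), channel_capacity(stuck), channel_capacity(redundant))
```

The `redundant` case shows that empowerment counts distinguishable consequences, not the number of available actions.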
Guckelsberger et al. (2016)
5a. Curiosity
Actions should lead to surprising sensor values.
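One simple reading of "surprising" is the surprisal \(-\log \text{q}(s)\) of a sensor value under the agent's own predictive distribution; this reading is an assumption of the sketch below, not a formula taken from the slides. Scored this way, the expected surprise of an action equals the entropy of its predicted sensor values. The action names and distributions are hypothetical.

```python
import math

def surprisal(q, s):
    """Surprise in bits of sensor value s under the predictive distribution q."""
    return -math.log2(q[s])

def expected_surprisal(q):
    """Expected surprise under q itself, which equals the entropy of q."""
    return sum(p * surprisal(q, s) for s, p in q.items() if p > 0)

# Hypothetical predicted sensor distributions for two candidate actions.
predictions = {
    "watch_wall": {"grey": 1.0},
    "watch_tv": {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25},
}

scores = {action: expected_surprisal(q) for action, q in predictions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```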
5b. Curiosity
Actions should lead to surprising environment states.
Actions should lead to surprising embedding of sensor values:
5. Curiosity
Burda et al. (2018) with permission.
6. Novelty
Actions should lead to sensor values that haven't been visited often before.
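For small discrete sensor spaces, novelty can be sketched with visit counts. The class below is hypothetical, and the \(1/\sqrt{n+1}\) bonus is a common count-based choice assumed here for illustration, not a formula given in the slides.

```python
import math
from collections import Counter

class NoveltyBonus:
    """Count-based novelty: rarely visited sensor values get a larger bonus."""

    def __init__(self):
        self.counts = Counter()

    def visit(self, s):
        """Record one visit to sensor value s."""
        self.counts[s] += 1

    def bonus(self, s):
        """Bonus decays with the visit count (assumed 1 / sqrt(n + 1) form)."""
        return 1.0 / math.sqrt(self.counts[s] + 1)

novelty = NoveltyBonus()
for s in ["room_a", "room_a", "room_a"]:
    novelty.visit(s)

print(novelty.bonus("room_a"), novelty.bonus("room_b"))
```

Exact counts stop being useful once sensor values rarely repeat, for example with images, which is the setting that the pseudo-count and distillation approaches cited on these slides address.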
Burda, Y., Edwards, H., Storkey, A. & Klimov, O. Exploration by Random Network Distillation. arXiv:1810.12894 [cs, stat] (2018).
References:
Aslanides, J., Leike, J., and Hutter, M. (2017). Universal Reinforcement Learning Algorithms: Survey and Experiments. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 1403–1410.
Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E. (2008). Predictive Information and Explorative Behavior of Autonomous Robots. The European Physical Journal B-Condensed Matter and Complex Systems, 63(3):329–339.
Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., and Efros, A. A. (2018). Large-Scale Study of Curiosity-Driven Learning. arXiv:1808.04355 [cs, stat]. arXiv: 1808.04355.
Friston, K. J., Parr, T., and de Vries, B. (2017). The Graphical Brain: Belief Propagation and Active Inference. Network Neuroscience, 1(4):381–414.
Klyubin, A., Polani, D., and Nehaniv, C. (2005). Empowerment: A Universal Agent-Centric Measure of Control. In The 2005 IEEE Congress on Evolutionary Computation, 2005, volume 1, pages 128–135.
Orseau, L., Lattimore, T., and Hutter, M. (2013). Universal Knowledge-Seeking Agents for Stochastic Environments. In Jain, S., Munos, R., Stephan, F., and Zeugmann, T., editors, Algorithmic Learning Theory, number 8139 in Lecture Notes in Computer Science, pages 158–172. Springer Berlin Heidelberg.
Oudeyer, P.-Y. and Kaplan, F. (2008). How can we define intrinsic motivation? In Proceedings of the 8th International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, Lund University Cognitive Studies, Lund: LUCS, Brighton.
Schmidhuber, J. (2010). Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230–247.
Storck, J., Hochreiter, S., and Schmidhuber, J. (1995). Reinforcement Driven Information Acquisition in Non-Deterministic Environments. In Proceedings of the International Conference on Artificial Neural Networks, volume 2, pages 159–164.
Guckelsberger, C., Salge, C., and Colton, S. (2016). Intrinsically Motivated General Companion NPCs via Coupled Empowerment Maximisation. In 2016 IEEE Conference on Computational Intelligence in Games (CIG'16), pages 150–157.
Remarks:
AGI: