Martin Biehl
Originally from psychology, e.g. Ryan and Deci (2000):
doing an activity for its inherent satisfaction rather than for some separable consequence
acting for the fun or challenge entailed rather than because of external prods, pressures, or rewards
Examples (Oudeyer, 2008):
But can always argue:
Working definition compatible with Oudeyer (2008):
Motivation is intrinsic if its formulation is:
This includes the approach by Schmidhuber (2010):
Motivation is intrinsic if it
Another important but not defining feature is that the motivation should not vanish until the capacities of the agent are exhausted.
Applications of intrinsic motivations:
Developmental robotics:
AGI:
Sparse reward reinforcement learning:
Advantages of intrinsic motivations
Disadvantage:
Examples:
dark room problem
Similar to reinforcement learning for POMDPs:
But we assume no extrinsic reward
\(E\) : Environment state
\(S\) : Sensor state
\(A\) : Action
\(M\) : Agent memory state
\(\newcommand{\p}{\text{p}} \p(e_0)\) : initial distribution
\(\newcommand{\p}{\text{p}}\p(s|e)\) : sensor dynamics
\(\newcommand{\p}{\text{p}}\p(m'|s,a,m)\) : memory dynamics
\(\newcommand{\p}{\text{p}}\p(a|m)\) : action generation
\(\newcommand{\p}{\text{p}}\p(e'|a',e)\) : environment dynamics
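To make the loop concrete, here is a minimal Python sketch of one rollout. All kernels are hypothetical toy choices (two-state environment, uniform placeholder policy) that are not from the talk; only the order in which the five kernels are sampled is the point.

```python
import random

random.seed(0)

def sample_initial_environment():   # p(e_0): uniform over two toy environment states
    return random.choice([0, 1])

def sample_sensor(e):               # p(s|e): noisy reading of the environment state
    return e if random.random() < 0.8 else 1 - e

def sample_action(m):               # p(a|m): placeholder uniform policy (the part to be designed)
    return random.choice([0, 1])

def sample_environment(a, e):       # p(e'|a',e): action 1 tries to flip the state
    return 1 - e if a == 1 and random.random() < 0.9 else e

def update_memory(s, a, m):         # p(m'|s,a,m): deterministic append of the sensor-action pair
    return m + [(s, a)]

T = 5
e = sample_initial_environment()    # e_0 ~ p(e_0)
s = sample_sensor(e)                # s_0 ~ p(s|e)
m = [(s, None)]                     # assumed initial memory: just the first sensor value
for t in range(1, T + 1):
    a = sample_action(m)            # a_t ~ p(a|m)
    e = sample_environment(a, e)    # e_t ~ p(e'|a',e)
    s = sample_sensor(e)            # s_t ~ p(s|e)
    m = update_memory(s, a, m)      # m_{t+1} ~ p(m'|s,a,m)
    print(f"t={t}  a={a}  e={e}  s={s}  |m|={len(m)}")
```

In this reading, \(a_t\) drives the transition \(e_{t-1}\to e_t\) and the memory accumulates the sensor-action history, which matches the choice \(m_t = sa_{\prec t}\) used in the prediction step below.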
Joint distribution until final time \(t=T\):
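A reconstruction from the kernels listed above, using one consistent time indexing; the initial memory kernel \(\text{p}(m_1|s_0)\) is an assumption, since only \(\text{p}(e_0)\) is given as an initial distribution:

$$
\text{p}(e_{0:T},s_{0:T},a_{1:T},m_{1:T}) \;=\; \text{p}(e_0)\,\text{p}(s_0|e_0)\,\text{p}(m_1|s_0)\;\prod_{t=1}^{T}\text{p}(a_t|m_t)\,\text{p}(e_t|a_t,e_{t-1})\,\text{p}(s_t|e_t)\;\prod_{t=2}^{T}\text{p}(m_t|s_{t-1},a_{t-1},m_{t-1})
$$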
Assumptions:
Only missing:
2. Action generation
2. Generative model
Model split up into three parts:
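The three parts themselves are not reproduced here. One split that would be consistent with the complete-posterior notation used in the prediction step below (parameters \(\Theta\) with fixed hyperparameter value \(\xi\)) is a prior over parameters and initial environment state, a parametrized environment-transition model, and a parametrized sensor model, e.g.

$$
\text{q}(\hat{s}_{1:\hat{T}},\hat{e}_{0:\hat{T}},\theta|\hat{a}_{1:\hat{T}},\xi) \;=\; \underbrace{\text{q}(\hat{e}_0,\theta|\xi)}_{\text{prior}}\;\prod_{t=1}^{\hat{T}}\underbrace{\text{q}(\hat{e}_t|\hat{a}_t,\hat{e}_{t-1},\theta)}_{\text{transition model}}\;\underbrace{\text{q}(\hat{s}_t|\hat{e}_t,\theta)}_{\text{sensor model}}
$$

This is a sketch of one possibility, not necessarily the split used in the original slides.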
Possible simplifications:
We fixed \(\Xi=\xi\), but the model could be more complicated.
3. Prediction
As time passes in the perception-action loop, the sensor-action history \(sa_{\prec t}\) accumulates in the agent's memory.
So at time \(t\) the agent can plug \(m_t=sa_{\prec t}\) into the model.
This allows it to predict the consequences of future actions \(\blue{\hat{a}_{t:\hat{T}}}\):
the model predicts the consequences of \(\blue{\hat{a}_{t:\hat{T}}}\) for the relations between future sensor values, environment states, and model parameters,
which allows the agent to choose actions that lead to semantics-free (information-theoretic) relations between those.
Call \(\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta|\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)\) the complete posterior.
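One way to read this, assuming the generative model covers the whole trajectory from \(t=0\): the complete posterior is the model conditioned on the observed past sensor values, with past actions \(a_{\prec t}\) and hypothetical future actions \(\hat{a}_{t:\hat{T}}\) treated as given,

$$
\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta|\hat{a}_{t:\hat{T}},sa_{\prec t},\xi) \;\propto\; \text{q}(s_{\prec t},\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta|a_{\prec t},\hat{a}_{t:\hat{T}},\xi)
$$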
Note: the same model can be used to predict the consequences of closed-loop policies \(\blue{\hat{\Pi}}\).
4. Action selection
General reinforcement learning (RL) evaluates actions by expected cumulative reward \(Q(\hat{a}_{t:\hat{T}},sa_{\prec t}):=\mathbb{E}[R|\hat{a}_{t:\hat{T}},sa_{\prec t}]\):
Standard RL: reward \(r_t\) is one of the sensor values, so $$r(\hat{s}\hat{a}_{t:\tau},sa_{\prec t})=r(s_\tau)=r_\tau$$
For the case of evaluating policies \(Q(\pi,sa_{\prec t}):=\mathbb{E}[R|\pi,sa_{\prec t}]\):
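Spelling out the cumulative reward in the expectation above, in one reading consistent with the trajectory-reward notation of the standard-RL line:

$$
Q(\hat{a}_{t:\hat{T}},sa_{\prec t}) \;=\; \mathbb{E}\left[\sum_{\tau=t}^{\hat{T}} r(\hat{S}\hat{A}_{t:\tau},sa_{\prec t})\,\middle|\,\hat{a}_{t:\hat{T}},sa_{\prec t}\right]
$$

so that in standard RL this reduces to the expected sum of future reward sensor values \(\sum_{\tau=t}^{\hat{T}}\mathbb{E}[r_\tau|\hat{a}_{t:\hat{T}},sa_{\prec t}]\).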
Find best sequence:
Select / perform its first action:
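Written out for the open-loop case (a sketch using the action value \(Q\) defined above):

$$
\hat{a}^{*}_{t:\hat{T}} \;:=\; \arg\max_{\hat{a}_{t:\hat{T}}} Q(\hat{a}_{t:\hat{T}},sa_{\prec t}), \qquad a_t := \hat{a}^{*}_{t}
$$

The policy case is analogous with \(Q(\pi,sa_{\prec t})\).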
1. Free energy minimization
Actions should lead to environment states expected to have precise sensor values.
Get \(\text{q}(\hat{e}_{t:\hat{T}}|\hat{a}_{t:\hat{T}})\) from the complete posterior:
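The needed marginal follows from the complete posterior by summing out the predicted sensor values, the past environment states, and the parameters (writing the conditioning on \(sa_{\prec t},\xi\) explicitly):

$$
\text{q}(\hat{e}_{t:\hat{T}}|\hat{a}_{t:\hat{T}},sa_{\prec t},\xi) \;=\; \sum_{\hat{s}_{t:\hat{T}}}\sum_{\hat{e}_{\prec t}}\sum_{\theta}\text{q}(\hat{s}_{t:\hat{T}},\hat{e}_{0:\hat{T}},\theta|\hat{a}_{t:\hat{T}},sa_{\prec t},\xi)
$$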
2. Predictive information maximization
Actions should lead to the most complex sensor stream:
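One standard way to formalize this, following the predictive-information idea of Ay et al. (2008); the exact split point \(\tau\) is an assumption here: value actions by the mutual information between the earlier and the later part of the predicted sensor stream,

$$
Q_{\text{PI}}(\hat{a}_{t:\hat{T}},sa_{\prec t}) \;:=\; \text{I}\big(\hat{S}_{t:\tau}:\hat{S}_{\tau+1:\hat{T}}\,\big|\,\hat{a}_{t:\hat{T}},sa_{\prec t},\xi\big)
$$

with the mutual information computed under the complete posterior.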
Georg Martius, Ralf Der
3. Knowledge seeking
Actions should lead to sensor values that tell the most about model parameters \(\Theta\):
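A common formalization (a sketch, consistent with the complete posterior above): value actions by the information the predicted sensor values carry about the parameters,

$$
Q_{\text{KSA}}(\hat{a}_{t:\hat{T}},sa_{\prec t}) \;:=\; \text{I}\big(\Theta:\hat{S}_{t:\hat{T}}\,\big|\,\hat{a}_{t:\hat{T}},sa_{\prec t},\xi\big)
$$

equivalently, the expected information gain (KL divergence from prior to posterior) about \(\Theta\).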
4. Empowerment maximization
Actions should lead to control over as many future experiences as possible:
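Formally, empowerment (Klyubin et al., 2005) is the channel capacity from future actions to future sensor values; under the agent's model, a sketch reads

$$
\mathfrak{E}(sa_{\prec t}) \;:=\; \max_{\text{q}(\hat{a}_{t:\hat{T}})} \text{I}\big(\hat{A}_{t:\hat{T}}:\hat{S}_{t:\hat{T}}\,\big|\,sa_{\prec t},\xi\big)
$$

and actions are chosen so that the states they lead to have high empowerment.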
References:
Aslanides, J., Leike, J., and Hutter, M. (2017). Universal Reinforcement Learning Algorithms: Survey and Experiments. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, pages 1403–1410.
Ay, N., Bertschinger, N., Der, R., Güttler, F., and Olbrich, E. (2008). Predictive Information and Explorative Behavior of Autonomous Robots. The European Physical Journal B-Condensed Matter and Complex Systems, 63(3):329–339.
Friston, K. J., Parr, T., and de Vries, B. (2017). The Graphical Brain: Belief Propagation and Active Inference. Network Neuroscience, 1(4):381–414.
Klyubin, A., Polani, D., and Nehaniv, C. (2005). Empowerment: A Universal Agent-Centric Measure of Control. In The 2005 IEEE Congress on Evolutionary Computation, 2005, volume 1, pages 128–135.
Orseau, L., Lattimore, T., and Hutter, M. (2013). Universal Knowledge-Seeking Agents for Stochastic Environments. In Jain, S., Munos, R., Stephan, F., and Zeugmann, T., editors, Algorithmic Learning Theory, number 8139 in Lecture Notes in Computer Science, pages 158–172. Springer Berlin Heidelberg.
Storck, J., Hochreiter, S., and Schmidhuber, J. (1995). Reinforcement Driven Information Acquisition in Non-Deterministic Environments. In Proceedings of the International Conference on Artificial Neural Networks, volume 2, pages 159–164.