Henry Charlesworth (H.Charlesworth.1@warwick.ac.uk)
Future State Maximisation Workshop, University of Graz, 17th Dec 2019
2. Introduction to the empowerment framework
- differences from causal entropic forces
- why I prefer it!
- relation to "intrinsic motivations" in psychology
3. Ideas related to FSM in the reinforcement learning literature
4. My PhD research - FSM applied to collective motion
5. Conclusions
Could be useful in explaining/understanding certain animal behaviours.
- "Causal Entropic Forces" (Approach based on thermodynamics)
- "Empowerment" (Information theoretic approach)
[1] "Empowerment: A Universal Agent-Centric Measure of Control" - Klyubin et al. 2005
[2] "Causal Entropic Forces" - Wissner-Gross and Freer, 2013
defined entirely in terms of the agent's "perception-action" loop
(Noisy TV problem - originally used as an example of where prediction-based novelty algorithms fail; also applicable to CEF)
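Concretely, empowerment in [1] is (up to notation) the channel capacity from a sequence of \(n\) actions to the sensor state \(n\) steps later:
\[ \mathfrak{E} = \max_{p(a_t^n)} I\big( A_t^n ;\, S_{t+n} \big) \]
A noisy TV injects entropy the agent cannot control, so it adds nothing to this mutual information.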
Standard reinforcement learning paradigm: Markov decision processes
\(s_0, a_1, r_1, s_1, a_2, r_2, \dots\) (states \(s_t\), actions \(a_t\), rewards \(r_t\))
Learn a policy \( \pi(a \mid s) \) to maximise expected return: \( \mathbb{E}_\pi\!\left[ \sum_{t \geq 0} \gamma^t r_{t+1} \right] \)
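A minimal sketch of this interaction loop, assuming a hypothetical `env` with `reset()`/`step()` methods and a `policy` function (names are illustrative, not from the talk):

```python
def rollout(env, policy, gamma=0.99, max_steps=1000):
    """Run one episode and accumulate the discounted return sum_t gamma^t r_{t+1}."""
    s = env.reset()                    # initial state s_0
    G, discount = 0.0, 1.0
    for _ in range(max_steps):
        a = policy(s)                  # sample a ~ pi(a | s)
        s, r, done = env.step(a)       # environment returns (s_{t+1}, r_{t+1}, done)
        G += discount * r
        discount *= gamma
        if done:
            break
    return G                           # one Monte Carlo sample of the return
```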
Goal-conditioned RL: learn a policy \( \pi(a \mid s, g) \) conditioned on a goal \(g\).
Consider the case where the set of goals is the same as the set of states the system can be in, and take the reward to be, e.g., the negative distance between the state \(s\) and the goal \(g\), or an indicator function for whether the goal has been achieved.
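A minimal sketch of the two reward choices just mentioned (the tolerance `eps` is an illustrative choice, not a value from the talk):

```python
import numpy as np

def goal_reward(s, g, sparse=True, eps=0.05):
    """Goal-conditioned reward when goals live in the same space as states.

    sparse=True : indicator reward - 1 when the goal is (approximately) reached.
    sparse=False: dense reward - negative Euclidean distance to the goal.
    """
    dist = np.linalg.norm(np.asarray(s, dtype=float) - np.asarray(g, dtype=float))
    return float(dist < eps) if sparse else -dist
```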
Skew-Fit (https://sites.google.com/view/skew-fit) aims to:
- set diverse goals
- make sure you can actually achieve the goal from each state
- maximise coverage of state space
- be able to control where the policy goes by giving it a goal
(a hedged sketch of the skewed goal-sampling idea is below)
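The core mechanism, as I understand it: replay previously visited states as goals with weights \( \propto q(s)^{\alpha} \) for some \( \alpha < 0 \), so rare states are over-sampled and coverage grows. Here `density_model` stands in for the learned generative model (a VAE in the paper); the interface is illustrative.

```python
import numpy as np

def sample_skewed_goals(visited_states, density_model, n_goals, alpha=-1.0):
    """Sample goals from visited states, skewed towards low-density (rare) states.

    density_model(s) -> estimate of q(s), the density the current generative
    model assigns to state s (stand-in for Skew-Fit's VAE).
    alpha < 0 up-weights rare states; alpha = 0 recovers uniform replay.
    """
    q = np.array([density_model(s) for s in visited_states])
    q = np.clip(q, 1e-12, None)            # guard against zero densities
    w = q ** alpha                         # skewed weights q(s)^alpha
    w /= w.sum()                           # normalise to a distribution
    idx = np.random.choice(len(visited_states), size=n_goals, p=w)
    return [visited_states[i] for i in idx]
```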
"Intrinsically Motivated Collective Motion"
"Intrinsically Motivated Collective Motion" - PNAS 2019 - https://www.pnas.org/content/116/31/15362 (Charlesworth and Turner)
Applications
T. Vicsek et al., Phys. Rev. Lett. 75, 1226 (1995).
[Vicsek model schematic: particles align with neighbours within interaction radius \(R\), plus noise]
Order Parameter: \( \phi = \frac{1}{N v_0} \left| \sum_{i=1}^{N} \mathbf{v}_i \right| \) (1 for perfect alignment, \(\approx 0\) when disordered)
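For reference, a minimal 2D implementation of the cited Vicsek model, plus the order parameter above (parameter values are illustrative defaults, not the ones used in the talk):

```python
import numpy as np

def vicsek_step(pos, theta, v0=0.03, R=1.0, eta=0.2, L=10.0):
    """One update of the 2D Vicsek model in a periodic box of side L.

    Each particle adopts the mean heading of all neighbours within radius R
    (itself included), perturbed by angular noise uniform in [-eta/2, eta/2],
    then moves at constant speed v0.
    """
    d = pos[:, None, :] - pos[None, :, :]
    d -= L * np.round(d / L)                          # periodic boundaries
    neighbours = ((d ** 2).sum(-1) < R ** 2).astype(float)
    theta = np.arctan2(neighbours @ np.sin(theta),    # circular mean heading
                       neighbours @ np.cos(theta))
    theta += eta * (np.random.rand(len(pos)) - 0.5)   # angular noise
    pos = (pos + v0 * np.stack([np.cos(theta), np.sin(theta)], -1)) % L
    return pos, theta

def order_parameter(theta):
    """phi = |sum_i v_i| / (N v0): 1 for perfect alignment, ~0 when disordered."""
    return np.hypot(np.cos(theta).sum(), np.sin(theta).sum()) / len(theta)
```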
[Figure: correlation function of the velocity fluctuations - real starling data (Cavagna et al. 2010) vs. data from model]
correlation function: \( C(r) \propto \sum_{i \neq j} \delta\mathbf{v}_i \cdot \delta\mathbf{v}_j \, \delta(r - r_{ij}) \), normalised by the number of pairs at distance \(r\), with velocity fluctuations \( \delta\mathbf{v}_i = \mathbf{v}_i - \frac{1}{N}\sum_k \mathbf{v}_k \)
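A sketch of how the plotted quantity could be computed from model data - a binned estimate of the correlation of velocity fluctuations against pairwise distance (the binning scheme is an illustrative choice):

```python
import numpy as np

def velocity_correlation(pos, vel, n_bins=30):
    """Binned correlation of velocity fluctuations vs. pairwise distance.

    Fluctuations are velocities with the flock's mean velocity removed,
    following Cavagna et al. 2010.
    """
    dv = vel - vel.mean(axis=0)                     # velocity fluctuations
    i, j = np.triu_indices(len(pos), k=1)           # all distinct pairs
    r = np.linalg.norm(pos[i] - pos[j], axis=1)     # pairwise distances r_ij
    c = (dv[i] * dv[j]).sum(axis=1)                 # dot products dv_i . dv_j
    bins = np.linspace(0.0, r.max(), n_bins + 1)
    which = np.clip(np.digitize(r, bins) - 1, 0, n_bins - 1)
    C = np.array([c[which == b].mean() if (which == b).any() else np.nan
                  for b in range(n_bins)])
    return 0.5 * (bins[:-1] + bins[1:]), C          # bin centres, C(r)
```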
[Search-tree diagram: each initial move \(\alpha\) roots a branch of future states]
For each initial move \(\alpha\), define a weight \(W_\alpha\): the number of distinct visual states accessible on future branches beginning with \(\alpha\).
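A hedged sketch of this weight computation, in the spirit of the PNAS paper; `transition` and `visual_state` are stand-ins for the model's dynamics and visual projection, and the argmax selection rule is a simplification.

```python
def branch_weights(state, moves, transition, visual_state, horizon):
    """For each initial move alpha, count the distinct visual states reachable
    after `horizon` steps on branches that begin with alpha.

    transition(state, move) -> next state   (stand-in for the model dynamics)
    visual_state(state)     -> hashable visual sensor reading of a state
    """
    weights = {}
    for alpha in moves:
        frontier = [transition(state, alpha)]        # branch rooted at alpha
        for _ in range(horizon - 1):                 # exhaustive expansion:
            frontier = [transition(s, m)             # |moves|^horizon states
                        for s in frontier for m in moves]
        weights[alpha] = len({visual_state(s) for s in frontier})
    return weights

def choose_move(state, moves, transition, visual_state, horizon):
    """Pick the move whose branch accesses the most distinct visual states."""
    w = branch_weights(state, moves, transition, visual_state, horizon)
    return max(w, key=w.get)
```

The exhaustive expansion is exponential in the horizon, which is exactly what motivates the question below.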
Can we do this without a full search of future states?
[Neural network diagram:]
- inputs: previous and current visual sensor input
- hidden layers of neurons, each computing a non-linear activation \( f(A\mathbf{w} + b) \)
- output: predicted probability of each action
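A minimal numpy sketch of such a network; the tanh non-linearity and softmax output are illustrative choices (the slide only specifies the \(f(A\mathbf{w}+b)\) form), and the network is presumably trained to reproduce the actions the full tree search would select:

```python
import numpy as np

def mlp_action_probs(prev_visual, curr_visual, params):
    """Predict action probabilities from previous + current visual sensor input.

    params is a list of (A, b) weight/bias pairs: each hidden layer computes
    f(A w + b) with f = tanh, and the output layer applies a softmax so the
    outputs form a probability distribution over actions.
    """
    w = np.concatenate([prev_visual, curr_visual])   # network input
    for A, b in params[:-1]:
        w = np.tanh(A @ w + b)                       # hidden layers: f(Aw + b)
    A, b = params[-1]
    logits = A @ w + b
    e = np.exp(logits - logits.max())                # numerically stable softmax
    return e / e.sum()
```

Once trained, a single forward pass replaces the exponential tree search.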
Making the swarm turn
Guiding the swarm to follow a trajectory