Part I: Intrinsically Motivated Collective Motion - rich emergent collective dynamics from a simple, general decision-making principle.
Part II: Solving Complex Dexterous Manipulation Tasks With Trajectory Optimisation and Deep RL
"Intrinsically Motivated Collective Motion", PNAS 116 (31), 2019
Work carried out in 2017-2018 during my PhD at the Centre for Complexity Science, University of Warwick.
Could be useful in explaining/understanding certain animal behaviours.
- "Causal Entropic Forces" (Approach based on thermodynamics)
- "Empowerment" (Information theoretic approach)
[1] "Empowerment: A Universal Agent-Centric Measure of Control" - Klyubin et al. 2005
[2] "Causal Entropic Forces" - Wissner-Gross and Freer, 2013
Deterministic environment implies: \( \mathcal{E} = \log_2 |\mathcal{S}_{t+n}| \),
where \( \mathcal{S}_{t+n} \) is the set of states that can be reached at time \( t + n \).
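A minimal sketch of this quantity, assuming a small discrete action set, hashable states, and a hypothetical deterministic step(state, action) function:

```python
import itertools

import numpy as np

def n_step_empowerment(state, actions, step, n):
    """Empowerment in a deterministic environment: log2 of the number of
    distinct states reachable at time t + n (brute-force enumeration).
    `step(state, action) -> next_state` is an assumed deterministic transition;
    states must be hashable."""
    reachable = set()
    for seq in itertools.product(actions, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return np.log2(len(reachable))  # empowerment in bits
```

The enumeration is exponential in \( n \), so this only illustrates the definition for very small action sets and horizons.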
https://sites.google.com/view/skew-fit
- Set diverse goals
- Make sure you can actually achieve the goal from each state
- Maximise coverage of state space
- Be able to control where the policy goes by giving it a goal
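A minimal sketch of the Skew-Fit-style goal sampling that targets these points; `density_estimate` is an assumed stand-in for a learned density model (e.g. a VAE) over visited states:

```python
import numpy as np

def sample_skewed_goals(states, density_estimate, alpha=-1.0, n_goals=64):
    """Sample goals from previously visited states, up-weighting rare ones.
    Weights proportional to p(s)**alpha with alpha < 0 skew sampling towards
    low-density states, pushing coverage of the state space.
    `density_estimate(s)` is an assumed estimate of p(s)."""
    p = np.array([density_estimate(s) for s in states], dtype=float)
    w = p ** alpha
    w /= w.sum()
    idx = np.random.choice(len(states), size=n_goals, p=w)
    return [states[i] for i in idx]
```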
"Intrinsically Motivated Collective Motion"
"Intrinsically Motivated Collective Motion" - PNAS 2019 - https://www.pnas.org/content/116/31/15362 (Charlesworth and Turner)
T. Vicsek et al., Phys. Rev. Lett. 75, 1226 (1995).
Order parameter: \( \phi = \frac{1}{N v_0} \left| \sum_{i=1}^{N} \mathbf{v}_i \right| \)
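For reference, a short sketch of computing this order parameter from an (N, d) array of velocities:

```python
import numpy as np

def order_parameter(velocities, v0):
    """Polar order parameter: magnitude of the summed velocities, normalised
    by N * v0. Close to 1 for coherent motion, close to 0 when disordered."""
    return np.linalg.norm(velocities.sum(axis=0)) / (len(velocities) * v0)
```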
Comparison of velocity-fluctuation correlations: real starling data (Cavagna et al. 2010) vs. data from the model.
Velocity fluctuation: \( \mathbf{u}_i = \mathbf{v}_i - \frac{1}{N} \sum_{k=1}^{N} \mathbf{v}_k \)
Correlation function: \( C(r) \propto \sum_{i \ne j} \mathbf{u}_i \cdot \mathbf{u}_j \, \delta(r - r_{ij}) \)
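A sketch of estimating this correlation function from agent positions and velocities; the binning scheme is an assumption:

```python
import numpy as np

def correlation_function(positions, velocities, bin_edges):
    """C(r): average dot product of velocity fluctuations u_i = v_i - <v>
    over pairs of agents whose separation falls in each distance bin."""
    u = velocities - velocities.mean(axis=0)            # velocity fluctuations
    i, j = np.triu_indices(len(positions), k=1)         # all pairs i < j
    dist = np.linalg.norm(positions[i] - positions[j], axis=1)
    dots = np.einsum("nd,nd->n", u[i], u[j])            # u_i . u_j per pair
    which = np.digitize(dist, bin_edges)
    return np.array([dots[which == b].mean() if np.any(which == b) else np.nan
                     for b in range(1, len(bin_edges))])
```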
"Solving Complex Dexterous Manipulation Tasks With Trajectory Optimisation and Deep Reinforcement Learning", Charlesworth & Montana, ICML (2020)
Project website: https://dexterous-manipulation.github.io
Requires considerably more sophisticated manipulators than the typical "parallel jaw grippers" most robots in industry use today.
Natural to build robotic hands that try to mimic the human hand and train them to perform complex manipulation tasks.
Traditional robotics methods struggle, motivating RL and gradient-free trajectory optimisation.
Despite some successes, dexterous manipulation remains a significant challenge for RL (and other methods).
1. Introduced "Dexterous Gym" - a suite of challenging extensions to the Gym manipulation environments.
2. Developed a gradient-free trajectory optimisation algorithm (TOPDM) that solves many of these tasks significantly more reliably than existing methods (a minimal sketch of the idea follows this list).
3. Demonstrated that the most challenging task, "Pen Spin", could be "solved" by combining examples generated by the trajectory optimisation method with off-policy deep RL.
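A minimal cross-entropy-method-style sketch of gradient-free trajectory optimisation over action sequences; this is illustrative rather than the exact TOPDM algorithm, and `env.set_state` is an assumed way to reset the simulator to a given state:

```python
import numpy as np

def plan_trajectory(env, init_state, horizon=20, iters=10, pop=200, elite_frac=0.1):
    """CEM-style planner: sample action sequences, roll them out in the
    simulator, keep the top-returning 'elite' fraction and refit the sampling
    distribution. Assumes actions bounded in [-1, 1]."""
    act_dim = env.action_space.shape[0]
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = np.clip(mean + std * np.random.randn(pop, horizon, act_dim), -1, 1)
        returns = np.empty(pop)
        for k in range(pop):
            env.set_state(init_state)                    # assumed simulator reset
            returns[k] = sum(env.step(a)[1] for a in samples[k])
        elite = samples[np.argsort(returns)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean                                          # planned open-loop actions
```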
(Results legend: "TOPDM only" vs. "TD3 + TOPDM demos".)
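The "TD3 + TOPDM demos" combination amounts to seeding an off-policy learner with demonstration transitions; a sketch of one simple way to mix demo and agent experience in a replay buffer (not necessarily the paper's exact recipe):

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Replay buffer mixing agent experience with demonstration transitions,
    so each TD3 update sees a fixed fraction of demo data."""
    def __init__(self, capacity=1_000_000, demo_fraction=0.25):
        self.agent = deque(maxlen=capacity)   # recent agent transitions
        self.demos = []                       # demo transitions, kept permanently
        self.demo_fraction = demo_fraction

    def add_agent(self, transition):
        self.agent.append(transition)

    def add_demo(self, transition):
        self.demos.append(transition)

    def sample(self, batch_size):
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(list(self.agent), batch_size - n_demo)
        return batch
```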
(Tree-search diagram: each initial move \( \alpha \) roots a branch of simulated futures.)
For each initial move, \( \alpha \), define a weight based on the number of distinct future states reachable in the branch starting with \( \alpha \).
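A brute-force sketch of this weighting, under the same assumptions as the empowerment snippet above (hypothetical deterministic `step`, hashable states, small discrete action set):

```python
import itertools

def branch_weights(state, actions, step, horizon):
    """For each initial move alpha, count the distinct states reachable at the
    end of the horizon along the branch starting with alpha, then normalise
    the counts into weights."""
    counts = {}
    for alpha in actions:
        endpoints = set()
        for seq in itertools.product(actions, repeat=horizon - 1):
            s = step(state, alpha)
            for a in seq:
                s = step(s, a)
            endpoints.add(s)
        counts[alpha] = len(endpoints)
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}
```

The enumeration grows exponentially with the horizon, which is exactly what the next question addresses.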
Can we do this without a full search of future states?
(Network diagram: previous visual sensor input and current visual sensor input feed into hidden layers of neurons; output: predicted probability of each action; non-linear activation function \( f(Aw + b) \).)
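A sketch of such a heuristic network, with assumed shapes and activation choices (tanh hidden layers, softmax output):

```python
import numpy as np

def action_probabilities(prev_visual, curr_visual, weights, biases):
    """Feed-forward heuristic: concatenate previous and current visual sensor
    inputs, pass them through hidden layers with activation f(Aw + b), and
    output a probability distribution over the discrete actions."""
    x = np.concatenate([prev_visual, curr_visual])
    for A, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(A @ x + b)                 # hidden layers
    logits = weights[-1] @ x + biases[-1]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                     # softmax over actions
```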
Making the swarm turn
Guiding the swarm to follow a trajectory