Part I: Intrinsically Motivated Collective Motion - rich emergent collective dynamics from a simple, general decision-making principle.
Part II: Solving Complex Dexterous Manipulation Tasks With Trajectory Optimisation and Deep RL
"Intrinsically Motivated Collective Motion", Proceedings of the National Academy of Sciences 116 (31), 2019
Work carried out in 2016-2018 during PhD in the Centre for Complexity Science, University of Warwick
Could be useful in explaining/understanding certain animal behaviours.
- "Causal Entropic Forces" (Approach based on thermodynamics)
- "Empowerment" (Information theoretic approach)
[1] "Empowerment: A Universal Agent-Centric Measure of Control" - Klyubin et al. 2005
[2] "Causal Entropic Forces" - Wissner-Gross and Freer, 2013
Deterministic environment implies:
\[ \mathcal{E} = \log_2 \left| \mathcal{S}_{t+n} \right| \]
where \( \mathcal{S}_{t+n} \) is the set of states that can be reached at time \( t + n \).
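To make this concrete, here is a minimal Python sketch (not from the paper) of n-step empowerment in a discrete deterministic environment, computed by enumerating all action sequences; the `step` transition function and the toy world are assumptions for illustration.

```python
import math
from itertools import product

def empowerment_deterministic(state, actions, step, n):
    """n-step empowerment in a deterministic, discrete environment.

    With deterministic dynamics the channel between action sequences
    and final states is noiseless, so empowerment reduces to
    log |S_{t+n}|, the log of the number of distinct reachable states.
    """
    reachable = set()
    for seq in product(actions, repeat=n):  # enumerate all action sequences
        s = state
        for a in seq:
            s = step(s, a)                  # deterministic transition
        reachable.add(s)
    return math.log2(len(reachable))

# Toy example: a 1-D world on the integers, clipped to [0, 10].
step = lambda s, a: min(max(s + a, 0), 10)
print(empowerment_deterministic(5, actions=(-1, 0, +1), step=step, n=3))
```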
Two equivalent decompositions of the mutual information between action sequences (goals) and final states:
\[ I(a^n_t; s_{t+n}) = H(s_{t+n}) - H(s_{t+n} \mid a^n_t) \]
- Maximise coverage of state space
- Minimise uncertainty about final state given specified goal
\[ I(a^n_t; s_{t+n}) = H(a^n_t) - H(a^n_t \mid s_{t+n}) \]
- Set diverse goals
- Learn how to achieve goals (minimise uncertainty about which goal was specified given the state reached)
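The two decompositions always agree; the following illustrative Python snippet checks this on a small joint distribution over goals and reached states (the numbers are invented purely for illustration).

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Toy joint distribution over (goal g, reached state s): rows = goals.
p_gs = np.array([[0.30, 0.10, 0.00],
                 [0.05, 0.25, 0.05],
                 [0.00, 0.05, 0.20]])

p_g = p_gs.sum(axis=1)   # marginal over goals
p_s = p_gs.sum(axis=0)   # marginal over reached states

# I(G;S) = H(S) - H(S|G): coverage of state space minus
# uncertainty about the final state given the specified goal.
H_s_given_g = sum(p_g[i] * entropy(p_gs[i] / p_g[i]) for i in range(len(p_g)))
mi_1 = entropy(p_s) - H_s_given_g

# I(G;S) = H(G) - H(G|S): diversity of goals minus uncertainty
# about which goal was specified given the state reached.
H_g_given_s = sum(p_s[j] * entropy(p_gs[:, j] / p_s[j]) for j in range(len(p_s)))
mi_2 = entropy(p_g) - H_g_given_s

assert np.isclose(mi_1, mi_2)  # the two decompositions agree
```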
"Intrinsically Motivated Collective Motion"
"Intrinsically Motivated Collective Motion" - Proceedings of the National Academy Of Sciences (PNAS), 2019 - https://www.pnas.org/content/116/31/15362 (Charlesworth and Turner)
T. Vicsek et al., Phys. Rev. Lett. 75, 1226 (1995).
Order parameter:
\[ \phi = \frac{1}{N} \left| \sum_{i=1}^{N} \frac{\vec{v}_i}{|\vec{v}_i|} \right| \]
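For reference, a minimal NumPy sketch of this standard (Vicsek-style) polar order parameter, assuming velocities are stored as an (N, d) array:

```python
import numpy as np

def order_parameter(velocities):
    """Polar order parameter: phi = |sum_i v_i / |v_i|| / N.

    phi ~ 1 for a fully aligned flock, phi ~ 0 for disordered motion.
    `velocities` is an (N, d) array of agent velocity vectors.
    """
    headings = velocities / np.linalg.norm(velocities, axis=1, keepdims=True)
    return np.linalg.norm(headings.sum(axis=0)) / len(velocities)

# Example: 100 agents in 2-D with nearly aligned headings -> phi close to 1.
rng = np.random.default_rng(0)
angles = 0.1 * rng.standard_normal(100)
v = np.stack([np.cos(angles), np.sin(angles)], axis=1)
print(order_parameter(v))
```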
Velocity fluctuation: \( \vec{u}_i = \vec{v}_i - \frac{1}{N} \sum_{k=1}^{N} \vec{v}_k \)
Correlation function: \( C(r) \propto \frac{\sum_{i \neq j} \vec{u}_i \cdot \vec{u}_j \, \delta(r - r_{ij})}{\sum_{i \neq j} \delta(r - r_{ij})} \)
[Figure: \( C(r) \) for real starling data (Cavagna et al., 2010) alongside data from the model.]
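A NumPy sketch of how such a correlation function can be estimated from positions and velocities; binning and normalisation conventions vary across the literature, so this follows the unnormalised pair-average form above.

```python
import numpy as np

def correlation_function(positions, velocities, r_bins):
    """Correlation of velocity fluctuations as a function of distance:
    C(r) = mean of u_i . u_j over pairs whose separation falls in the
    bin around r, with u_i = v_i - (1/N) sum_k v_k
    (cf. Cavagna et al., 2010; raw pair average, no normalisation).
    """
    u = velocities - velocities.mean(axis=0)            # velocity fluctuations
    dots = (u[:, None, :] * u[None, :, :]).sum(-1)      # u_i . u_j for all pairs
    dists = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    i, j = np.triu_indices(len(positions), k=1)         # distinct pairs i < j
    bin_of = np.digitize(dists[i, j], r_bins)
    return np.array([dots[i, j][bin_of == b].mean() if np.any(bin_of == b)
                     else np.nan for b in range(1, len(r_bins))])
```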
"Solving Complex Dexterous Manipulation Tasks With Trajectory Optimisation and Deep Reinforcement Learning", Charlesworth & Montana, ICML (2021)
Project website: https://dexterous-manipulation.github.io
Work carried out in 2019-2020 as a Postdoc with Professor Giovanni Montana at the University of Warwick
Dexterous manipulation requires considerably more sophisticated manipulators than the typical "parallel jaw grippers" most robots in industry use today.
It is natural to build robotic hands that mimic the human hand and to train them to perform complex manipulation tasks.
Traditional robotics methods struggle here, which motivates RL and gradient-free trajectory optimisation.
Despite some successes, dexterous manipulation remains a significant challenge for RL (and other methods).
1. Introduced "Dexterous Gym" - a suite of challenging extensions to the Gym manipulation environments.
2. Developed a gradient-free trajectory optimisation algorithm (TOPDM) that can solve many of these tasks significantly more reliably than existing methods.
3. Demonstrated that the most challenging task, "Pen Spin", could be "solved" by combining examples generated by the trajectory optimisation method with off-policy deep RL (see the sketch after the results below).
[Figure: "Pen Spin" results - TOPDM only vs. TD3 + TOPDM demos.]
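As a rough illustration of the recipe in point 3, here is a Python sketch of a replay buffer that mixes demonstration transitions from trajectory optimisation into off-policy (e.g. TD3) training; the class, `demo_fraction`, and the fixed sampling ratio are assumptions for illustration, not the paper's exact scheme.

```python
import random

class MixedReplayBuffer:
    """Replay buffer mixing agent experience with demonstration
    transitions (e.g. generated by TOPDM). A generic sketch of the
    'off-policy RL + demos' recipe; the paper's sampling scheme
    and ratios may differ.
    """
    def __init__(self, demos, capacity=1_000_000, demo_fraction=0.25):
        self.demos = list(demos)          # (s, a, r, s', done) tuples
        self.agent = []
        self.capacity = capacity
        self.demo_fraction = demo_fraction

    def add(self, transition):
        """Store a transition collected by the learning agent."""
        self.agent.append(transition)
        if len(self.agent) > self.capacity:
            self.agent.pop(0)

    def sample(self, batch_size):
        """Sample a batch containing a fixed fraction of demo transitions."""
        n_demo = int(batch_size * self.demo_fraction)
        batch = random.sample(self.demos, min(n_demo, len(self.demos)))
        batch += random.sample(self.agent,
                               min(batch_size - len(batch), len(self.agent)))
        return batch
```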
[Figure: search tree over future states; each branch corresponds to an initial move \( \alpha \).]
For each initial move \( \alpha \), define a weight as follows:
Can we do this without a full search of future states?
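One natural Monte Carlo answer, sketched below in Python, replaces the exhaustive tree with random rollouts and scores each initial move by the diversity of sampled future states. This is an illustrative stand-in under assumed discrete dynamics (`step`, `horizon`, `n_samples` are all hypothetical), not necessarily the exact weighting used in the work.

```python
import random

def sample_move_weights(state, actions, step, horizon, n_samples=100):
    """Monte Carlo stand-in for the full tree search: for each initial
    move alpha, estimate the diversity of future states by counting
    distinct states reached on random rollouts of length `horizon`.
    """
    weights = {}
    for alpha in actions:
        finals = set()
        for _ in range(n_samples):
            s = step(state, alpha)                    # take initial move alpha
            for _ in range(horizon - 1):
                s = step(s, random.choice(actions))   # random future moves
            finals.add(s)
        weights[alpha] = len(finals)                  # diversity estimate
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}  # normalised weights
```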
[Figure: neural network - inputs are the previous and current visual sensor inputs; hidden layers of neurons with non-linear activation function \( f(Aw + b) \); output is the predicted probability of each action.]
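A PyTorch sketch of a network matching this description: the two visual sensor inputs are concatenated, passed through hidden layers with a non-linearity, and mapped to a probability distribution over actions. The layer sizes and the ReLU choice are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    """MLP: previous and current visual sensor states in,
    predicted probability of each action out."""

    def __init__(self, sensor_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * sensor_dim, hidden),  # previous + current input
            nn.ReLU(),                          # non-linear activation f(Aw+b)
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),       # logits over actions
        )

    def forward(self, prev_sensor, curr_sensor):
        x = torch.cat([prev_sensor, curr_sensor], dim=-1)
        return torch.softmax(self.net(x), dim=-1)  # predicted action probs
```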
Making the swarm turn
Guiding the swarm to follow a trajectory