Part I: "PlanGAN" - A model-based approach for multi-goal, sparse reward problems
Part II: Solving Complex Dexterous Manipulation Tasks With Trajectory Optimisation and Deep RL
Work carried out June 2019 - September 2020 as a Postdoc at the University of Warwick, working with Professor Giovanni Montana.
"PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals", Charlesworth & Montana, NeurIPS (2020)
Project website: https://sites.google.com/view/plangan
Model can be used for planning and counterfactual reasoning, and re-used for other tasks.
More sample efficient - not just learning from a scalar reward signal.
"Solving Complex Dexterous Manipulation Tasks With Trajectory Optimisation and Deep Reinforcement Learning", Charlesworth & Montana, ICML (2021)
Project website: https://dexterous-manipulation.github.io
Requires considerably more sophisticated manipulators than the typical "parallel jaw grippers" most robots in industry use today.
Natural to build robotic hands that try to mimic the human hand and train them to perform complex manipulation tasks.
Traditional robotics methods struggle. Motivates RL / gradient-free trajectory optimisation.
Despite some successes, dexterous manipulation remains a significant challenge for RL (and other methods).
1. Introduced "Dexterous Gym" - a suite of challenging extensions to the Gym manipulation environments.
2. Developed a gradient-free trajectory optimisation algorithm that can solve many of these tasks significantly more reliably than existing methods.
3. Demonstrated that the most challenging task, "Pen Spin", could be "solved" by combining examples generated with the trajectory optimisation method with off-policy deep RL.
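To give a feel for what "gradient-free trajectory optimisation" means in point 2, here is a minimal sketch using the cross-entropy method on a toy 1-D problem. This is not the algorithm from the paper: the dynamics `x' = x + a`, the cost, and every name and parameter below (`rollout_cost`, `cem_plan`, `n_elite`, etc.) are assumptions chosen purely for illustration.

```python
import random
import statistics


def rollout_cost(x0, actions, target):
    """Simulate toy dynamics x' = x + a and sum the squared distance to target."""
    x, cost = x0, 0.0
    for a in actions:
        a = max(-1.0, min(1.0, a))  # clip to assumed action bounds [-1, 1]
        x = x + a
        cost += (x - target) ** 2
    return cost


def cem_plan(x0, target, horizon=5, n_samples=64, n_elite=8, n_iters=10, seed=0):
    """Cross-entropy method: sample action sequences, keep the best, refit, repeat."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    for _ in range(n_iters):
        # Sample candidate action sequences from the current Gaussian.
        samples = [
            [rng.gauss(mean[t], std[t]) for t in range(horizon)]
            for _ in range(n_samples)
        ]
        # Rank by simulated cost only; no gradients of the dynamics are used.
        samples.sort(key=lambda seq: rollout_cost(x0, seq, target))
        elites = samples[:n_elite]
        # Refit the sampling distribution to the elite sequences.
        for t in range(horizon):
            column = [seq[t] for seq in elites]
            mean[t] = statistics.fmean(column)
            std[t] = statistics.pstdev(column) + 1e-3  # keep a little exploration
    return mean


if __name__ == "__main__":
    plan = cem_plan(x0=0.0, target=2.0)
    print("planned actions:", [round(a, 2) for a in plan])
    print("cost:", round(rollout_cost(0.0, plan, 2.0), 3))
```

The key property is that the optimiser only needs to evaluate rollouts in a simulator, which is why this family of methods is attractive for contact-rich tasks where gradients are unavailable or unreliable.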
TOPDM only
TD3 + TOPDM demos
branch \(\alpha\)
For each initial move, \( \alpha \), define a weight as follows:
Can we do this without a full search of future states?
previous visual sensor input
current visual sensor input
hidden layers of neurons
output: predicted probability of action
non-linear activation function: \( f(Aw+b) \)
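The network described by the labels above can be sketched as a small feed-forward pass: the previous and current sensor inputs are concatenated, passed through hidden layers that apply a non-linear activation to a weighted sum plus bias, and mapped to a probability for each action. The layer sizes, the `tanh` activation, the softmax output and the random weights are all assumptions for illustration; the slides do not specify them.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """One layer: activation applied to (weighted sum of inputs + bias) per neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Hypothetical initialisation: small uniform weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def action_probabilities(prev_sensor, curr_sensor, layers):
    """Forward pass from the two sensor snapshots to predicted action probabilities."""
    h = list(prev_sensor) + list(curr_sensor)
    hidden, (w_out, b_out) = layers[:-1], layers[-1]
    for w, b in hidden:
        h = dense(h, w, b, math.tanh)
    logits = dense(h, w_out, b_out, lambda z: z)  # linear output before softmax
    return softmax(logits)


if __name__ == "__main__":
    rng = random.Random(0)
    n_sensor, n_hidden, n_actions = 4, 8, 3  # assumed sizes
    layers = [
        random_layer(rng, n_hidden, 2 * n_sensor),
        random_layer(rng, n_hidden, n_hidden),
        random_layer(rng, n_actions, n_hidden),
    ]
    probs = action_probabilities([0.1, 0.0, 0.3, 0.2], [0.2, 0.1, 0.3, 0.0], layers)
    print([round(p, 3) for p in probs])
```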
Making the swarm turn
Guiding the swarm to follow a trajectory