PlanGAN: Model-based planning with sparse rewards and multiple goals
Henry Charlesworth and Giovanni Montana
Warwick Manufacturing Group, University of Warwick
- Sparse rewards: it is usually easier to specify when a task is complete than to hand-design a dense reward function
- Want to train agents capable of carrying out multiple different tasks
- Currently, the best RL methods for these kinds of tasks are model-free (e.g. Hindsight Experience Replay)
- But model-based RL can often be substantially more sample-efficient - can we come up with a model-based approach that works for sparse-reward, multi-goal problems?
- Build on the same principle that underlies Hindsight Experience Replay: trajectories that fail to achieve the specified goal still contain useful information about how to achieve the goal(s) that actually were achieved
- Aim: a goal-conditioned generative model (GAN) that can produce plausible trajectories leading from the current state towards a specified goal state
- Train on gathered experience - relabel the desired goal as a goal that was actually achieved at some later point in the observed trajectory.
- This gives us lots of example data to train on!
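The relabelling step above can be sketched as follows. This is a minimal, hypothetical illustration (the trajectory format and function name are assumptions, not the authors' code): for each timestep we swap the desired goal for a goal achieved at a random later point in the same trajectory.

```python
import numpy as np

def relabel_trajectory(trajectory, rng=np.random.default_rng()):
    """Hindsight relabelling sketch: for each step, replace the desired
    goal with a goal actually achieved at a later point in the trajectory."""
    relabelled = []
    for t, step in enumerate(trajectory):
        # pick a random step at or after t and use its achieved goal as the target
        future = rng.integers(t, len(trajectory))
        new_goal = trajectory[future]["achieved_goal"]
        relabelled.append({**step, "desired_goal": new_goal})
    return relabelled

# toy 1-D example: states double as achieved goals; 99 is an arbitrary
# original desired goal that was never reached
traj = [{"state": s, "achieved_goal": s, "desired_goal": 99} for s in range(5)]
for step in relabel_trajectory(traj):
    assert step["desired_goal"] in range(5)  # every relabelled goal was achieved
```

Every relabelled transition is, by construction, an example of progress towards a goal the agent really did reach, which is what makes failed rollouts usable as training data.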
- Gather data using a planner that selects actions from many candidate trajectories produced by the generative model
- Massively outperforms standard model-based methods
- Significantly more sample efficient than model-free methods designed for these kinds of problems!
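The planning loop described above can be sketched like this. It is a simplified stand-in, not the paper's exact planner: `generate_trajectory` represents the trained goal-conditioned generator, and we simply execute the first action of whichever imagined rollout ends closest to the goal.

```python
import numpy as np

def plan_action(generate_trajectory, state, goal, n_samples=50):
    """generate_trajectory(state, goal) -> (actions, states) stands in for
    the trained generative model; return the first action of the imagined
    trajectory whose final state lands closest to the goal."""
    best_action, best_dist = None, np.inf
    for _ in range(n_samples):
        actions, states = generate_trajectory(state, goal)
        dist = np.linalg.norm(states[-1] - goal)  # final distance to goal
        if dist < best_dist:
            best_dist, best_action = dist, actions[0]
    return best_action

# toy stand-in "model": one-step random proposals in 2-D
rng = np.random.default_rng(0)
def toy_model(state, goal):
    a = rng.normal(size=2)
    return [a], [state + a]

state, goal = np.zeros(2), np.ones(2)
action = plan_action(toy_model, state, goal, n_samples=100)
```

Replanning at every environment step (model-predictive control style) keeps the agent robust to errors in any single imagined trajectory.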