PlanGAN - Model-based planning for multi-goal, sparse-reward environments
Henry Charlesworth and Giovanni Montana
Warwick Manufacturing Group, University of Warwick
NeurIPS 2020
Motivation
- Sparse rewards - it is often easier to specify when a task is complete than to hand-design a dense reward function
- Want to train agents capable of carrying out multiple different tasks
- Currently, the best RL methods for these kinds of tasks are model-free (e.g. Hindsight Experience Replay)
- But model-based RL can often be substantially more sample-efficient - can we design a model-based approach that works for sparse-reward, multi-goal problems?
Methodology
- Build upon the same principle that underlies Hindsight Experience Replay - trajectories that fail to achieve the specified goal still contain useful information about how to achieve the goals that were actually reached (see the relabelling sketch below)
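A minimal sketch of the hindsight relabelling idea, assuming each trajectory is stored as a list of (state, action, achieved_goal) tuples; the names and the "future" goal-selection strategy are illustrative, not the paper's code:

import random

def relabel_with_hindsight(trajectory):
    # Each step is assumed to be a (state, action, achieved_goal) tuple (illustrative format).
    relabelled = []
    for t, (state, action, achieved_goal) in enumerate(trajectory):
        # "Future" strategy: condition on a goal actually reached at the same or a later time step.
        future_step = random.randint(t, len(trajectory) - 1)
        new_goal = trajectory[future_step][2]
        relabelled.append((state, action, new_goal))
    return relabelled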

Methodology
- Aim: a goal-conditioned generative model (GAN) that can produce plausible trajectories leading from the current state towards a specified goal state

- Train on gathered experience - relabel the desired goal to be a goal that was actually achieved at some later point in the observed trajectory (see the training-step sketch below).
- This gives us lots of example data to train on!
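A rough sketch of how the relabelled data could feed a goal-conditioned generator. This is only to illustrate conditioning on (state, goal): the generator, discriminator, optimizer, noise dimension and the Wasserstein-style loss are assumptions made for brevity, not the paper's exact architecture or objective:

import torch

def generator_step(generator, discriminator, g_opt, states, goals, noise_dim=16):
    # One illustrative update: the generator imagines the next state conditioned on (state, goal).
    noise = torch.randn(states.size(0), noise_dim)
    fake_next = generator(torch.cat([states, goals, noise], dim=1))   # imagined next state
    # Wasserstein-style generator loss, used here only to keep the sketch short.
    g_loss = -discriminator(torch.cat([states, goals, fake_next], dim=1)).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return g_loss.item()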
Methodology
- Gather new data using a planner that selects actions by evaluating many candidate trajectories produced by the generative model (a sketch follows below)
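A hedged sketch of such a planning loop, assuming the model exposes a sample_trajectory(state, goal, horizon) method returning arrays of imagined states and actions; scoring rollouts by the final state's distance to the goal is a simplification of the paper's procedure:

import numpy as np

def plan_action(model, state, goal, n_candidates=100, horizon=20):
    # Choose the first action of whichever imagined trajectory ends closest to the goal.
    best_action, best_dist = None, np.inf
    for _ in range(n_candidates):
        # model.sample_trajectory is an assumed interface, not the paper's API.
        states, actions = model.sample_trajectory(state, goal, horizon)
        dist = np.linalg.norm(states[-1] - goal)   # simple goal-distance score (illustrative)
        if dist < best_dist:
            best_dist, best_action = dist, actions[0]
    return best_action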

Results

- Massively outperforms a standard model-based baseline
- Significantly more sample-efficient than model-free methods designed for these kinds of problems!