Martin Biehl
Meta Reinforcement Learning with Latent Variable Gaussian Processes (Saemundsson et al.)
Introduce a latent (meta-)variable \(h\) that identifies the environment (e.g. mass and length of the pole in cart-pole)
Model the environment dynamics \(p(x_{t+1} \mid a_t, x_t, h, \theta)\) with a Gaussian process
Train the Gaussian process on datasets from different environments
Infer \(h\) on test environments and use it to predict future states and select actions
Similar to the free energy principle (FEP), but without intrinsic motivation
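A minimal NumPy sketch of the idea, under simplifying assumptions that are not taken from the paper: a hypothetical 1-D toy environment (`toy_dynamics`) in which \(h\) acts like a damping coefficient, fixed hand-picked kernel hyperparameters, and known latent values for the training environments, so that only the test environment's \(h\) is inferred. The GP maps \((x_t, a_t, h)\) to the state change \(x_{t+1} - x_t\), and \(h\) on the test environment is chosen by a grid search over the GP predictive log-likelihood of a few observed transitions. All names (`LatentGP`, `infer_h`, `sample_transitions`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between the rows of A and B
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def toy_dynamics(x, a, h):
    # Hypothetical 1-D environment; h plays the role of a physical parameter
    return x + 0.1 * (a - h * x)

def sample_transitions(h, n):
    x = rng.uniform(-2, 2, n)
    a = rng.uniform(-1, 1, n)
    return x, a, toy_dynamics(x, a, h)

class LatentGP:
    """GP regression from (x_t, a_t, h) to the state change x_{t+1} - x_t."""

    def __init__(self, X, y, noise=1e-4):
        self.X, self.noise = X, noise
        K = rbf(X, X) + noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, Xs):
        Ks = rbf(Xs, self.X)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = 1.0 - np.sum(v**2, axis=0) + self.noise  # prior variance is 1.0
        return mean, np.maximum(var, 1e-12)

# Training data from several environments (latent h assumed known in this sketch)
rows, targets = [], []
for h in [0.5, 1.0, 1.5, 2.0]:
    x, a, x_next = sample_transitions(h, 30)
    rows.append(np.column_stack([x, a, np.full_like(x, h)]))
    targets.append(x_next - x)
gp = LatentGP(np.vstack(rows), np.concatenate(targets))

def infer_h(gp, x, a, x_next, grid=np.linspace(0.5, 2.0, 151)):
    # Pick the h maximising the GP predictive log-likelihood of the
    # observed test transitions (uniform prior over the grid)
    scores = []
    for h in grid:
        mean, var = gp.predict(np.column_stack([x, a, np.full_like(x, h)]))
        resid = (x_next - x) - mean
        scores.append(np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * resid**2 / var))
    return grid[int(np.argmax(scores))]

# A few transitions from an unseen test environment with h = 1.2
x_te, a_te, xn_te = sample_transitions(1.2, 10)
h_hat = infer_h(gp, x_te, a_te, xn_te)
mean, _ = gp.predict(np.array([[1.0, 0.5, h_hat]]))
print(h_hat, 1.0 + mean[0])  # inferred h and predicted x_{t+1} for x_t = 1, a_t = 0.5
```

Once \(h\) is inferred, the same GP serves as the dynamics model for the new environment; its predictive variance could also be used when selecting actions.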
Based on MAML, but the tasks share the same reward and differ only in their dynamics
Learn an ensemble of deterministic DNN environment models, each on a different subset of the data
Use standard MAML to train a policy that can adapt to each of them as quickly as possible
Has beneficial side effects for exploration
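A toy sketch of the ensemble-plus-MAML scheme, with heavy simplifications that are assumptions rather than the method described above: linear least-squares models stand in for the DNNs, each fit on a bootstrap resample of one batch of real transitions; the policy is a single constant action `theta`; and the reward is one-step, \(-x_1^2\), so the MAML meta-gradient (including its second-order term) can be written out by hand. The "real" environment, step sizes and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def real_step(x, a):
    # Hypothetical real environment, unknown to the agent
    return 0.9 * x + 0.5 * a + 0.05 * rng.normal(size=np.shape(x))

# One batch of real transitions
X = rng.uniform(-2, 2, 40)
A = rng.uniform(-1, 1, 40)
X_next = real_step(X, A)

# Ensemble of deterministic models, each fit on a bootstrap subset of the data
# (linear least squares stands in for the DNNs)
models = []
for _ in range(5):
    idx = rng.integers(0, len(X), len(X))
    feats = np.column_stack([X[idx], A[idx]])
    w, b = np.linalg.lstsq(feats, X_next[idx], rcond=None)[0]
    models.append((w, b))

x0, alpha, beta = 1.0, 0.1, 0.1  # start state, inner and outer step sizes

def ret(theta, w, b):
    # One-step return under model (w, b) for constant action theta: reward -(x_1)^2
    return -(w * x0 + b * theta) ** 2

def grad_ret(theta, w, b):
    return -2.0 * b * (w * x0 + b * theta)

def adapt(theta, w, b):
    # MAML inner loop: one gradient-ascent step on this model's return
    return theta + alpha * grad_ret(theta, w, b)

def meta_grad(theta, w, b):
    # Exact d/dtheta of ret(adapt(theta)); d adapt/d theta = 1 - 2 * alpha * b^2
    return grad_ret(adapt(theta, w, b), w, b) * (1.0 - 2.0 * alpha * b**2)

theta = 0.0
for _ in range(300):
    # MAML outer loop: make the post-adaptation return high on every model
    theta += beta * np.mean([meta_grad(theta, w, b) for w, b in models])

pre = [ret(theta, w, b) for w, b in models]
post = [ret(adapt(theta, w, b), w, b) for w, b in models]
print(theta, pre, post)
```

Each ensemble member plays the role of one MAML task: the meta-trained `theta` is a starting point from which a single gradient step improves the return under every model.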
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search (Buesing et al., DeepMind)
Model-based RL using structural causal models (SCMs) with interventional calculus
Shows that interventional calculus allows improved counterfactual policy evaluation
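A minimal sketch of counterfactual policy evaluation in an SCM (abduction, action, prediction). It assumes a hypothetical 1-D SCM \(x_{t+1} = f(x_t, a_t) + u_t\) with a known mechanism `f` and additive exogenous noise \(u_t\), so abduction recovers the noise exactly from one logged episode. The episode is then replayed under a different policy with the inferred noise held fixed, answering "what would have happened on this same episode under the target policy?". In practice the mechanism would be a learned model and the noise posterior generally not a point mass; this is an illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x, a):
    # Hypothetical known mechanism of the SCM: x_{t+1} = f(x_t, a_t) + u_t
    return 0.9 * x + 0.5 * a

def reward(x):
    return -x**2

def rollout(policy, x0, noise):
    # Run the SCM forward with a fixed sequence of exogenous noise terms u_t
    x, xs, ret = x0, [x0], 0.0
    for u in noise:
        x = f(x, policy(x)) + u
        xs.append(x)
        ret += reward(x)
    return np.array(xs), ret

def behaviour_policy(x):
    return 0.0  # policy that generated the log

def target_policy(x):
    return -1.0 * x  # policy to be evaluated

# One logged episode from the "real" environment
T, x0 = 20, 1.5
true_noise = 0.3 * rng.normal(size=T)
xs, logged_return = rollout(behaviour_policy, x0, true_noise)

# Abduction: with additive noise the exogenous terms follow directly from the log
actions = np.array([behaviour_policy(x) for x in xs[:-1]])
u_hat = xs[1:] - f(xs[:-1], actions)

# Action + prediction: replay the target policy under the inferred noise
_, cf_return = rollout(target_policy, x0, u_hat)
print(logged_return, cf_return)
```

Because the noise is shared, the counterfactual return differs from the logged return only through the change of policy, which is what makes this kind of replay useful for comparing policies on logged data.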