Artyom Sorokin | 22 Apr
Let's look again at Montezuma's Revenge, everybody loves it!
It is pretty obvious what to do in this game...
...for humans, but not for RL agents.
Paper: Loss is its own Reward
Main Idea:
Use experience from one set of tasks for faster learning and/or better performance on new tasks!
In RL, task = MDP!
Transfer Learning: train on the Source Task, then train on the Target Task, and evaluate on the Target Task.
Multi-Task Learning: train on the Source Task and the Target Task together, and evaluate on both.
Meta-Learning: train on a set of Source Tasks, then sample new tasks from the Source/Target Domains and evaluate on them. Typically the agent doesn't know which task it is learning! The goal is to learn each new task as fast as possible.
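A minimal sketch of this evaluation protocol (the `adapt` / `meta_update` / `evaluate` agent interface is a placeholder, not from the slides): train across many sampled tasks, then measure how quickly the agent improves on a task it has never seen.

```python
import random

# Hypothetical agent/task interfaces, just to illustrate the protocol.
def meta_train(agent, source_tasks, iterations=1000, steps_per_task=10):
    for _ in range(iterations):
        task = random.choice(source_tasks)       # the agent is NOT told which task this is
        agent.adapt(task, steps=steps_per_task)  # fast inner-loop learning
        agent.meta_update(task)                  # slow outer-loop update of the "learning prior"

def meta_test(agent, new_task, budget=10):
    # Evaluation: how good does the agent get within a small adaptation budget?
    returns = []
    for _ in range(budget):
        agent.adapt(new_task, steps=1)
        returns.append(agent.evaluate(new_task))
    return returns  # the faster this curve rises, the better the meta-learner
```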
Lifelong/Continual Learning: train on tasks one after another (retraining on old tasks is cheating!) and evaluate on all tasks seen so far.
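A sketch of the lifelong-learning protocol, with hypothetical `train_on` / `evaluate_on` helpers: tasks arrive strictly in sequence, old environments are never revisited, yet the agent is scored on everything it has seen so far.

```python
def lifelong_run(agent, task_sequence):
    """Train on tasks strictly in order; evaluate on all tasks seen so far."""
    scores, seen = [], []
    for task in task_sequence:
        agent.train_on(task)   # no replaying of old environments (that would be cheating!)
        seen.append(task)
        # average performance over every task encountered so far
        scores.append(sum(agent.evaluate_on(t) for t in seen) / len(seen))
    return scores  # drops in this curve reveal catastrophic forgetting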
Main Idea:
Use experience from one set of tasks for faster learning and/or better performance on new tasks!
Transfer Learning:
Multi-Task Learning:
Meta-Learning:
Pretraining + Finetuning:
The most popular transfer learning method in (supervised) deep learning!
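A minimal PyTorch sketch of pretraining + finetuning (the backbone architecture and the checkpoint path are illustrative, not from the slides): reuse the representation learned on the source task, attach a fresh head, and train on the target task.

```python
import torch
import torch.nn as nn

# Backbone assumed to be pretrained on the source domain (hypothetical checkpoint).
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
backbone.load_state_dict(torch.load("source_backbone.pt"))  # hypothetical file

for p in backbone.parameters():          # optionally freeze the transferred representation
    p.requires_grad = False

head = nn.Linear(32, 10)                 # fresh head for the target task
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

def finetune_step(x, y):
    with torch.no_grad():
        z = backbone(x)                  # reuse source-domain features
    loss = nn.functional.cross_entropy(head(z), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```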
Invariance assumption: everything that is different between domains is irrelevant
Train on the source domain, do well on the target domain.
Diagram: a shared encoder produces features \(z\) that feed both the Task Loss and a domain classifier \(D_{\phi}(z)\) trained with a CE-Loss for Domain Classification (same network). Multiply the grads from \(D_{\phi}(z)\) by \(-\lambda\), i.e. train \(z\) to maximize the CE-Loss.
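This is the gradient-reversal trick from domain-adversarial training (DANN). A minimal PyTorch sketch (module names and sizes are placeholders): the domain classifier learns to tell the domains apart, while the reversed gradient pushes the encoder to make \(z\) domain-invariant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies the gradient by -lambda on the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU())   # produces features z
task_head = nn.Linear(128, 10)                            # task loss
domain_head = nn.Linear(128, 2)                           # D_phi(z): source vs. target

def dann_loss(x, y_task, y_domain, lam=1.0):
    z = encoder(x)
    task_loss = F.cross_entropy(task_head(z), y_task)
    # CE-loss for domain classification; the reversed gradient trains z to MAXIMIZE it
    domain_loss = F.cross_entropy(domain_head(GradReverse.apply(z, lam)), y_domain)
    return task_loss + domain_loss
```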
Invariance assumption: everything that is different between domains is irrelevant
Why is invariance not enough when the dynamics don’t match?
What if we can manipulate the source domain?
Looks like Meta-Learning to me...
Assumption:
The dynamics \(p(s_{t+1} \mid s_t, a_t)\) are the same in both domains, but the reward function is different.
Common examples:
Model: very simple to transfer, since the model is already (in principle) independent of the reward
You can also transfer contextual policies, i.e. \(p(a|s, g_i)\)!
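Since the dynamics are shared and only the reward (i.e. the goal) changes, one contextual policy \(p(a \mid s, g)\) can cover the whole family of tasks. A minimal sketch (architecture and sizes are illustrative):

```python
import torch
import torch.nn as nn

class ContextualPolicy(nn.Module):
    """p(a | s, g): one network for a whole family of reward functions / goals."""
    def __init__(self, state_dim, goal_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state, goal):
        logits = self.net(torch.cat([state, goal], dim=-1))
        return torch.distributions.Categorical(logits=logits)

policy = ContextualPolicy(state_dim=8, goal_dim=3, n_actions=4)
dist = policy(torch.zeros(1, 8), torch.tensor([[1.0, 0.0, 0.0]]))  # condition on a new goal g_i
action = dist.sample()
```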
Sparse Reward setting:
Problem: RL learns nothing from failed attempts!
But humans can learn in a similar setting:
We can interpret all outcomes of the agent's actions as goals:
Adding virtual goals creates a multi-task setting and enriches the learning signal!
Paper: Hindsight Experience Replay
Main Idea: substitute achieved results as desired goals
HER main components:
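A sketch of the hindsight relabeling step using the "final" strategy (`compute_reward` is an assumed goal-conditioned reward function, not a specific API): every failed episode is also stored as a successful episode for the goal that was actually achieved.

```python
def her_relabel(episode, compute_reward):
    """episode: list of (state, action, next_state, achieved_goal, desired_goal) tuples.
    Returns extra transitions where the desired goal is replaced by the goal actually
    achieved at the end of the episode ('final' strategy)."""
    virtual_goal = episode[-1][3]     # whatever we ended up achieving becomes the goal
    relabeled = []
    for state, action, next_state, achieved, _ in episode:
        reward = compute_reward(achieved, virtual_goal)   # now some transitions get reward!
        relabeled.append((state, action, next_state, virtual_goal, reward))
    return relabeled

# Both the original and the relabeled transitions go into the replay buffer
# of any off-policy algorithm (DQN, DDPG, ...), enriching the sparse learning signal.
```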
Can we learn faster by learning multiple tasks?
Multi-task learning can:
Sounds familiar... Domain Randomization?
Multi-task RL corresponds to single-task RL in a joint MDP:
This solution doesn't speed up learning as it doesn't transfer anything...
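A sketch of the joint-MDP construction (class and method names are illustrative, loosely following the classic Gym interface): sample a task at the start of each episode and append a one-hot task id to the observation, then run any single-task RL algorithm on the wrapper.

```python
import random
import numpy as np

class JointMDP:
    """Turns a set of MDPs into one MDP by sampling a task per episode
    and appending a one-hot task id to the observation."""
    def __init__(self, envs):
        self.envs = envs
        self.task_id = 0

    def _augment(self, obs):
        one_hot = np.zeros(len(self.envs), dtype=np.float32)
        one_hot[self.task_id] = 1.0
        return np.concatenate([obs, one_hot])

    def reset(self):
        self.task_id = random.randrange(len(self.envs))   # the initial state picks the task
        return self._augment(self.envs[self.task_id].reset())

    def step(self, action):
        obs, reward, done, info = self.envs[self.task_id].step(action)
        return self._augment(obs), reward, done, info
```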
Idea: Learn with RL, transfer with SL
Papers: Actor-Mimic, Policy Distillation
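A sketch of the "learn with RL, transfer with SL" step behind Actor-Mimic and Policy Distillation: expert policies are trained with RL on individual tasks, then a single student is trained with a supervised KL/cross-entropy loss to match their action distributions. The temperature and the task-conditioned student are illustrative choices, not the papers' exact setup.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between the (softened) teacher policy and the student policy."""
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

def distill_step(student, teachers, batches, optimizer):
    # batches: one (states, task_id) batch per teacher, sampled from that teacher's replay data
    loss = 0.0
    for teacher, (states, task_id) in zip(teachers, batches):
        with torch.no_grad():
            teacher_logits = teacher(states)           # query the RL-trained expert
        loss = loss + distillation_loss(student(states, task_id), teacher_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```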
Divide and Conquer Reinforcement Learning Algorithm sketch:
Paper: A Generalist Agent
Secrets to success:
Paper: A Generalist Agent
Secrets to success:
Paper: A Generalist Agent
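A heavily simplified sketch of the Gato recipe: discretize every modality into tokens, flatten each timestep into one sequence, and train a single autoregressive transformer with a cross-entropy loss on the action tokens. The uniform-binning tokenizer below is a toy stand-in (the paper uses mu-law scaling before binning), not the exact scheme.

```python
import numpy as np

N_BINS = 1024   # continuous values are discretized into a fixed number of bins

def tokenize_continuous(x, low=-1.0, high=1.0):
    """Toy uniform binning of continuous observations/actions into integer tokens."""
    x = np.clip(x, low, high)
    return ((x - low) / (high - low) * (N_BINS - 1)).astype(np.int64)

def serialize_episode(observations, actions):
    """Flatten a trajectory into one token sequence: [obs tokens, action tokens] per step."""
    tokens, action_mask = [], []
    for obs, act in zip(observations, actions):
        obs_tok = tokenize_continuous(obs).tolist()
        act_tok = tokenize_continuous(act).tolist()
        tokens += obs_tok + act_tok
        # the autoregressive CE loss is applied only where the model must predict actions
        action_mask += [0] * len(obs_tok) + [1] * len(act_tok)
    return np.array(tokens), np.array(action_mask)
```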
Paper: Progressive Neural Networks
Finetuning allows transferring representations from task 1 to task 2.
But what if you want to learn task 3 next, and then task 4...
This is actually Lifelong Learning!
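A sketch of one Progressive Neural Networks column in PyTorch (two-layer MLPs for illustration): each new task gets a fresh column, previous columns are frozen, and lateral connections feed their hidden features into the new column, so old representations are reused without being overwritten.

```python
import torch
import torch.nn as nn

class ProgressiveColumn(nn.Module):
    """One column of a progressive network with lateral connections from frozen earlier columns."""
    def __init__(self, in_dim, hidden, out_dim, prev_columns=()):
        super().__init__()
        self.prev = list(prev_columns)
        for col in self.prev:                       # earlier tasks' columns are frozen
            for p in col.parameters():
                p.requires_grad = False
        self.layer1 = nn.Linear(in_dim, hidden)
        # lateral adapters map each previous column's hidden features into this column
        self.laterals = nn.ModuleList(nn.Linear(hidden, hidden) for _ in self.prev)
        self.layer2 = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h = torch.relu(self.layer1(x))
        for col, lateral in zip(self.prev, self.laterals):
            with torch.no_grad():
                h_prev = torch.relu(col.layer1(x))   # reuse frozen features from an old task
            h = h + lateral(h_prev)                  # lateral connection
        return self.layer2(h)

col1 = ProgressiveColumn(in_dim=16, hidden=64, out_dim=4)                        # task 1
col2 = ProgressiveColumn(in_dim=16, hidden=64, out_dim=4, prev_columns=[col1])   # task 2
```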
PathNet key details:
Train PathNet on a single Task:
Training on multiple Tasks:
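A toy sketch of the PathNet idea (sizes, mutation rate, and the fitness function are placeholders): a path is a small subset of modules per layer, a population of paths is evolved with tournament selection while the selected modules are trained, and after a task converges the winning path's modules are frozen and reused for the next task.

```python
import random

N_LAYERS, N_MODULES, ACTIVE = 3, 10, 3   # a few active modules per layer

def random_path():
    return [random.sample(range(N_MODULES), ACTIVE) for _ in range(N_LAYERS)]

def mutate(path, p=0.1):
    return [[random.randrange(N_MODULES) if random.random() < p else m for m in layer]
            for layer in path]

def train_pathnet_on_task(evaluate_fitness, generations=100, population=64):
    """evaluate_fitness(path) is assumed to train the path's modules for a while
    and return the resulting task performance."""
    paths = [random_path() for _ in range(population)]
    for _ in range(generations):
        a, b = random.sample(range(population), 2)        # binary tournament
        if evaluate_fitness(paths[a]) < evaluate_fitness(paths[b]):
            a, b = b, a
        paths[b] = mutate(paths[a])                        # loser is overwritten by a mutated winner
    best = max(paths, key=evaluate_fitness)
    return best   # modules on this path get frozen before moving on to the next task
```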