A. Raffin, S. Höfer, R. Jonschkowski
I. Background Knowledge
II. Idea and Motivations
III. Our Method
IV. Experiments and Results
[Diagram: the reinforcement learning loop — the agent's policy maps the observation (through a state representation) to an action; the environment returns the next observation and a reward.]
Jonschkowski, R., & Brock, O. (2015). Learning state representations with robotic priors. Autonomous Robots, 39(3), 407-428.
[Diagram: learning a state representation (Jonschkowski & Brock, 2015) — a mapping from observation to state (what belongs in the state — the temperature?) is learned by optimizing a loss function based on robotic priors.]
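The loss function in the cited paper combines several robotic priors (e.g., temporal coherence: the state should change slowly over time). Below is a minimal sketch of that one term, assuming PyTorch; the names are illustrative, and the full method sums several such prior losses.

```python
import torch

def temporal_coherence_loss(states, next_states):
    # Temporal coherence prior: consecutive states should be close,
    # i.e., penalize large state changes between time steps.
    return ((next_states - states) ** 2).sum(dim=1).mean()
```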
II. Idea and Motivations
1. Identify the current task
2. Learn a task-specific state representation
3. Learn task-specific policies
III. Our Method
[Diagrams: the gated architecture, observation → gate layer → state, shown for two tasks — a different gate unit is deactivated for the 1st task than for the 2nd task, so each task uses its own state dimensions; a code sketch follows.]
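A minimal sketch of this gated architecture, assuming PyTorch and assuming the softmax gate activations multiply the candidate state dimensions elementwise (consistent with the deactivated-unit diagrams); layer sizes and exact wiring are assumptions, not the paper's published values.

```python
import torch
import torch.nn as nn

class GatedStateNet(nn.Module):
    def __init__(self, obs_dim, state_dim):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, state_dim)  # observation -> candidate state
        self.gate = nn.Linear(obs_dim, state_dim)     # observation -> gate pre-activations

    def forward(self, obs):
        g = torch.softmax(self.gate(obs), dim=1)  # gate activations (sum to 1)
        # A (near-)zero gate unit deactivates the corresponding state
        # dimension, so each task ends up using its own subset of the state.
        return self.encoder(obs) * g, g
```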
Output of the gate layer
Task-consistency: same episode ⇒ same task (during training, the task does not change within one episode).
Task-separation: different episode ⇒ the task may change.
Measure of similarity: the cross-entropy $H(p, q) = -\sum_i p_i \log q_i$, where $p, q$ are two probability distributions (here, gate activations).
Consistency loss: minimize the cross-entropy between gate activations from the same episode.
Separation loss: penalize low cross-entropy (i.e., similar gate activations) between different episodes (see the sketch below).
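A sketch of these two losses on gate activations, assuming PyTorch; pairing samples by episode happens outside these functions, and the exp(-H) form of the separation loss is an assumed concrete choice for "penalize similarity", not necessarily the paper's exact formula.

```python
import torch

def cross_entropy(p, q, eps=1e-8):
    # H(p, q) = -sum_i p_i * log(q_i), for probability distributions p, q
    return -(p * torch.log(q + eps)).sum(dim=1)

def consistency_loss(g_a, g_b):
    # g_a, g_b: gate activations of sample pairs from the SAME episode
    # (same task), so their cross-entropy is minimized.
    return cross_entropy(g_a, g_b).mean()

def separation_loss(g_a, g_b):
    # g_a, g_b: gate activations of pairs from DIFFERENT episodes
    # (task may differ); low cross-entropy (similarity) is penalized.
    return torch.exp(-cross_entropy(g_a, g_b)).mean()
```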
[Diagram: the full model — observation → gate layer → state; the action is used together with the state during training (e.g., in the loss).]
IV. Experiments and Results
[Figure: the Slot Car with Indicator task — control the car corresponding to the color of the indicator. Panels: top-down view and input image (the observation); each indicator color corresponds to one task.]
Training-Testing Cycle
1. Collect 4 training episodes of 100 steps each (2 per task)
2. Learn the state representation + a policy (KNN-Q; see the sketch below)
3. Test on 10 episodes of 100 steps
[Plot: average reward per episode on the 10 test episodes (5 per task), compared against the baselines.]
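A hedged sketch of the KNN-Q policy step, assuming KNN-Q means nearest-neighbor Q-value estimation in the learned state space; `states`, `q_values`, and `k` are illustrative names, not the paper's implementation.

```python
import numpy as np

def knn_q_values(query_state, states, q_values, k=5):
    # states: (N, d) learned states from training; q_values: (N, n_actions).
    # Estimate Q(query_state, .) by averaging the Q-values of the
    # k nearest stored states in the learned representation.
    dists = np.linalg.norm(states - query_state, axis=1)
    nearest = np.argsort(dists)[:k]
    return q_values[nearest].mean(axis=0)

# Greedy policy: action = np.argmax(knn_q_values(state, states, q_values))
```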
Slot Car Blue: control the blue car
5 gate units
Task-Specific State Representation
Task Detector
Transfer Learning
Task-Specific Policies
Learning from experience
Output of the gate layer
Softmax: $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$
[Matrix example: learned gate weight matrix for the toy task (x-axis / y-axis) — the 1st unit specializes to one task, the 2nd unit to the other.]
[Example: gate layer output (after the softmax) for Task 0 vs. Task 1 — a different unit is active for each task.]
Final output (state representation):
[Tensor example for the toy task (x-axis / y-axis): the gate layer output multiplies the state, so Task 0 and Task 1 keep different state dimensions; a numeric sketch follows.]
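A small numeric illustration of the softmax gate output and how it selects task-specific state dimensions; all values below are made up for a 2-unit toy task.

```python
import numpy as np

def softmax(z):
    # softmax(z)_i = exp(z_i) / sum_j exp(z_j)
    e = np.exp(z - z.max())
    return e / e.sum()

# Made-up gate pre-activations for the two tasks:
g_task0 = softmax(np.array([3.0, -1.0]))  # ~[0.98, 0.02]: 1st unit active
g_task1 = softmax(np.array([-1.0, 3.0]))  # ~[0.02, 0.98]: 2nd unit active

# The gate output multiplies the candidate state, so each task keeps a
# different dimension (x-axis vs. y-axis in the toy task):
candidate_state = np.array([0.7, 0.4])
print(candidate_state * g_task0)  # Task 0: mostly the 1st dimension survives
print(candidate_state * g_task1)  # Task 1: mostly the 2nd dimension survives
```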