Using State Representation Learning with Robotic Priors
Detecting and Solving Multiple Tasks
A. Raffin, S. Höfer, R. Jonschkowski
Outline
I. Background Knowledge
II. Idea and Motivations
III. Our Method
IV. Experiments and Results
Reinforcement Learning (RL)
[Figure: the reinforcement learning loop. The agent's policy maps observations (states) to actions; the environment returns a reward and the next observation. Icons designed by Freepik and distributed by Flaticon.]
State Representation Learning (SRL)
Jonschkowski, R., & Brock, O. (2015). Learning state representations with robotic priors. Autonomous Robots, 39(3), 407-428.
[Figure: SRL learns a mapping from raw observations to a compact state.]
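The robotic priors from the cited paper are loss terms on the learned states. As a minimal sketch, assuming batches of consecutive learned states as NumPy arrays (variable names are mine), the temporal coherence prior penalizes large state changes between consecutive time steps:

import numpy as np

def temporal_coherence_loss(s_t, s_next):
    # s_t, s_next: arrays of shape (batch, state_dim) with the learned
    # states at time t and t+1; consecutive states should change slowly.
    delta = s_next - s_t
    return np.mean(np.sum(delta ** 2, axis=1))

The full method combines several priors (temporal coherence, proportionality, causality, repeatability); see the reference above for the exact formulation.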
Neural Networks (1/2)
[Figure: example network predicting a temperature from its inputs; training minimizes a loss function.]
Neural Networks (2/2)
Gating Connection
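As a minimal illustration of a gating connection (names and values are made up for the example), the gate multiplies the features element-wise, so a gate value of 0 switches the corresponding unit off:

import numpy as np

def gated(features, gate):
    # Gating connection (sketch): the gate modulates the features element-wise.
    return features * gate

features = np.array([0.5, 1.2, 2.0])
gate = np.array([1.0, 0.0, 1.0])      # the second unit is switched off by the gate
print(gated(features, gate))           # the second entry is zeroed out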
Outline
I. Background Knowledge
II. Idea and Motivations
III. Our Method
IV. Experiments and Results
Detecting and Solving Multiple Tasks (1/2)
- Identify the current task
- Learn a task-specific state representation
- Learn task-specific policies
Detecting and Solving Multiple Tasks (2/2)
Outline
I. Background Knowledge
II. Idea and Motivations
III. Our Method
IV. Experiments and Results
Learning Jointly a State Representation and a Task Detector (1/3)
[Figure: network architecture mapping the observation to the state through a gate layer.]
Learning Jointly a State Representation and a Task Detector (2/3)
[Figure: for the 1st task, one gate unit is deactivated and the observation is mapped to a task-specific state.]
Learning Jointly a State Representation and a Task Detector (3/3)
[Figure: for the 2nd task, a different gate unit is deactivated, giving a different task-specific state.]
Gated Network
Output of the gate layer
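A minimal sketch of how such a gated network can compute the state, assuming the observation has already been turned into a feature vector; the layer shapes, variable names, and the use of a third-order weight tensor contracted with the gate output are illustrative, not an exact implementation:

import numpy as np

def softmax(z):
    z = z - np.max(z)               # numerical stability
    e = np.exp(z)
    return e / e.sum()

def gated_forward(phi, W_g, b_g, W, b):
    # phi : (n_features,)                     observation features
    # W_g : (n_gate, n_features), b_g: (n_gate,)   gate layer
    # W   : (n_gate, n_state, n_features), b: (n_state,)   gate tensor
    g = softmax(W_g @ phi + b_g)                  # output of the gate layer (task detector)
    # each gate unit selects one slice of the tensor; the state is their soft mixture
    state = np.einsum('i,ijk,k->j', g, W, phi) + b
    return g, state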
Task-Coherence Prior
- Task-consistency: same episode, same task
- Task-separation: different episode, the task may change
Assumption: during training, the task does not change within one episode
Losses for the Gate Layer (1/2)
Measure of similarity: cross-entropy
H(p, q) = -∑_i p(i) log q(i)
p, q: two probability distributions
Losses for the Gate Layer (2/2)
Separation loss
Consistency loss
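A hedged sketch of how the two losses can be written with a cross-entropy between the gate-layer outputs g1 and g2 of a pair of observations (the exact form and weighting used in our method may differ):

import numpy as np

def cross_entropy(p, q, eps=1e-8):
    # H(p, q) = -sum_i p_i log q_i for two probability distributions
    return -np.sum(p * np.log(q + eps))

def consistency_loss(g1, g2):
    # Same episode -> same task: make the two gate outputs similar.
    return cross_entropy(g1, g2)

def separation_loss(g1, g2):
    # Different episodes -> possibly different tasks: push the outputs apart.
    return -cross_entropy(g1, g2)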
Multi-Task Agent
[Figure: the multi-task agent maps the observation through the gate layer to a task-specific state, and the policy maps this state to an action.]
Outline
I. Background Knowledge
II. Idea and Motivations
III. Our Method
IV. Experiments and Results
Multi-Task Slot Car Racing
Slot Car with Indicator: control the car corresponding to the color of the indicator
[Figure: the observation is a top-down view of the track, used as input image.]
State Representation Learned
[Figure: state representation learned for a single task.]
Experimental Design
Training-Testing Cycle
1. 4 training episodes of 100 steps each (2 per task)
2. Learn state representation + policy (KNN-Q)
3. Test on 10 episodes of 100 steps (5 per task); report the average reward per episode
Baselines
- "SRL Vanilla" (Robotic Priors Only)
- Principal Component Analysis
- Raw Observations
- Variations of our method
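A rough sketch of the KNN-Q idea used in step 2, assuming a table of previously visited (state, action, Q-value) samples: the Q-value of a new state is approximated from its k nearest neighbours in the learned state space that used the same action. This is an illustration, not the exact implementation used in the experiments.

import numpy as np

def knn_q(state, action, states, actions, q_values, k=5):
    # states   : (n, state_dim) visited states in the learned representation
    # actions  : (n,)           actions taken in those states
    # q_values : (n,)           Q-value estimates for those samples
    mask = actions == action
    if not np.any(mask):
        return 0.0
    dists = np.linalg.norm(states[mask] - state, axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(q_values[mask][nearest]))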
Results: Slot Car with Indicator
[Plot: average reward per episode on the 10 test episodes (5 per task).]
Results: Slot Car Blue
Slot Car Blue: control the blue car
Results: What was learned
[Figure: learned weights with 5 gate units: gate-layer weights, gate-tensor weights, and SRL Vanilla weights for comparison.]
Influence of Losses
Conclusion
- Task-Specific State Representation
- Task Detector
- Transfer Learning
- Task-Specific Policies
Future Work
Learning from experience
Maths details
Output of the gate layer
Softmax:
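For reference, with z the pre-activations of the gate layer (notation mine), the softmax is:

g_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}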
Maths details
Weight matrix gate (toy-task x-axis y-axis)
1st unit
2nd unit
Maths details
Output of the gate layer
[Worked example: softmax output of the gate layer for Task 0 vs. Task 1.]
Maths details
Final Output (state representation)
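A hedged reconstruction of the final output, using the notation of the softmax above: with g the gate-layer output, W the weight tensor and \phi(o) the observation features, the j-th state dimension is

s_j = \sum_i g_i \sum_k W_{ijk} \, \phi(o)_k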
Maths details
Tensor example (toy task: x-axis / y-axis)
[Figure: slices of the weight tensor combined with the output of the gate layer for Task 0 and Task 1.]