Using State Representation Learning with Robotic Priors

Detecting and Solving Multiple Tasks

A. Raffin, S. Höfer, R. Jonschkowski

Outline

I. Background Knowledge

II. Idea and Motivations

III. Our Method

IV. Experiments and Results

Reinforcement Learning (RL)

\pi(s) = a

Diagram: agent-environment loop. At each step t, the agent receives an observation o_{t} and a reward r_{t}, infers a state s_{t}, and its policy \pi selects an action a_{t} that is sent to the environment.
Icons designed by Freepik and distributed by Flaticon
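A minimal sketch of this loop in Python (the env, encoder and policy objects are placeholders, not part of the original slides):

# Agent-environment loop: the agent turns the observation o_t into a state s_t,
# its policy pi picks an action a_t, and the environment returns the next
# observation and a reward r_t.
def run_episode(env, encoder, policy, n_steps=100):
    total_reward = 0.0
    o_t = env.reset()                   # first observation
    for _ in range(n_steps):
        s_t = encoder(o_t)              # state from observation
        a_t = policy(s_t)               # pi(s) = a
        o_t, r_t = env.step(a_t)        # next observation and reward
        total_reward += r_t
    return total_reward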

State Representation Learning (SRL)

Diagram: observation → state, e.g. \left( x_{agent}, y_{agent}\right)

Jonschkowski, R., & Brock, O. (2015). Learning state representations with robotic priors. Autonomous Robots, 39(3), 407-428.
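For illustration only (not the exact model from the paper), a linear encoder mapping a flattened observation to a two-dimensional state such as (x_agent, y_agent); the dimensions and random weights below are made up:

import numpy as np

# Illustrative linear state encoder: state = W @ observation.
# In SRL, W would be learned (e.g. with robotic priors) rather than random.
obs_dim, state_dim = 16 * 16 * 3, 2
W = 0.01 * np.random.randn(state_dim, obs_dim)

def encode(observation):
    return W @ observation.reshape(-1)   # e.g. (x_agent, y_agent)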

Neural Networks (1/2)

f : X \rightarrow Y

\text{Example: } X = \left\{ \begin{array}{l} \text{Palaiseau}\\ \text{4th October 2016}\\ \text{10 a.m.} \end{array} \right. \quad \rightarrow \quad Y = \text{Temperature?}
f^* = \underset{f}{\operatorname{argmin}} \; \mathcal{L}(f, X, Y)

Loss function
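A toy example of minimizing the loss over f: fitting a linear model by gradient descent on a squared-error loss (synthetic data, purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # inputs (e.g. encoded place, date, time)
Y = X @ np.array([1.0, -2.0, 0.5])       # targets (e.g. temperature)

w = np.zeros(3)                          # parameters of f(x) = w . x
for _ in range(500):
    error = X @ w - Y                    # prediction error
    grad = 2 * X.T @ error / len(X)      # gradient of the mean squared loss
    w -= 0.1 * grad                      # step towards the argmin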

Neural Networks (2/2)

z = \sigma(\sum_{i=1}^n w_{i} \times x_i)

Gating Connection

z = \begin{cases} 0 & \text{if } g=0\\ w_{xz}\,x & \text{if } g=1 \end{cases}
z = w_{(xg)z} \cdot (x \times g)
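A quick check (illustrative) that the multiplicative form reproduces the hard switch above:

# z = w * (x * g) is 0 when the gate is closed (g = 0) and w * x when it is
# open (g = 1); intermediate gate values interpolate between the two.
def gated_unit(x, g, w):
    return w * (x * g)

assert gated_unit(x=2.0, g=0.0, w=3.0) == 0.0   # gate closed
assert gated_unit(x=2.0, g=1.0, w=3.0) == 6.0   # gate open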

Outline

I. Background Knowledge

II. Idea and Motivations

III. Our Method

IV. Experiments and Results

Detecting and Solving Multiple Tasks (1/2)

Identify the current task

Learn task-specific state representation

Learn task-specific policies

Detecting and Solving Multiple Tasks (2/2)

Outline

I. Background Knowledge

II. Idea and Motivations

III. Our Method

IV. Experiments and Results

Learning Jointly a State Representation and a Task Detector (1/3)

Diagram: Observation → Gate Layer → State

Learning Jointly a State Representation and a Task Detector (2/3)

Diagram: Observation → Gate Layer → State (1st task: one gate unit deactivated)

Learning Jointly a State Representation and a Task Detector (3/3)

Diagram: Observation → Gate Layer → State (2nd task: a different gate unit deactivated)

Gated Network

Output of the gate layer

\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^n e^{z_k}}
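A small numpy sketch of this softmax, applied to example gate pre-activations:

import numpy as np

def softmax(z):
    z = z - np.max(z)                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# A strongly negative vs strongly positive pre-activation yields a near
# one-hot gate output, i.e. only one gate unit stays active.
print(softmax(np.array([-10.0, 10.0])))   # ~[0., 1.]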

Task-Coherence Prior

Task-consistency

Task-separation

Same episode, same task

Different episode, task may change

during training, the task does not change within one episode
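One way to turn this prior into training pairs (a sketch; the sampling scheme and names are our assumptions, not taken from the slides):

import random

# Pairs drawn from the same episode share the task (used by the consistency
# loss); pairs drawn from two different episodes may belong to different
# tasks (used by the separation loss).
def sample_pair(episodes, same_episode):
    if same_episode:
        ep = random.choice(episodes)
        return random.choice(ep), random.choice(ep)
    ep_a, ep_b = random.sample(episodes, 2)
    return random.choice(ep_a), random.choice(ep_b)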

Losses for the Gate Layer (1/2)

Measure of similarity

Cross-entropy

\mathrm{H}(p, q) = -\sum_i p(i)\, \log q(i)

p,q: two probability distributions

\mathrm{H}(p, q) \text{ is minimal when } p = q \ (= 0 \text{ for one-hot distributions})
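As a quick numerical check (illustrative):

import numpy as np

# Cross-entropy between two discrete distributions; eps avoids log(0).
def cross_entropy(p, q, eps=1e-12):
    return -np.sum(p * np.log(q + eps))

print(cross_entropy(np.array([0.0, 1.0]), np.array([0.0, 1.0])))  # ~0, identical
print(cross_entropy(np.array([0.0, 1.0]), np.array([1.0, 0.0])))  # ~27.6, very different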

Losses for the Gate Layer (2/2)

Separation loss

\mathrm{L_{separation}} = \exp\left(-\mathrm{H}(p, q)\right), \quad p, q \text{ from two different episodes}
\mathrm{L_{consistency}} = \mathrm{H}(p, q), \quad p, q \text{ from the same episode}

Consistency loss
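Put together, a minimal sketch of the two gate-layer losses (function and variable names are ours):

import numpy as np

def cross_entropy(p, q, eps=1e-12):
    return -np.sum(p * np.log(q + eps))

# Consistency: gate outputs p, q taken from the SAME episode should agree.
def consistency_loss(p, q):
    return cross_entropy(p, q)

# Separation: gate outputs p, q taken from two DIFFERENT episodes should be
# pushed apart; a large cross-entropy makes exp(-H) small.
def separation_loss(p, q):
    return np.exp(-cross_entropy(p, q))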

Multi-Task Agent

Diagram: Observation → Gate Layer → State → Action

Outline

I. Background Knowledge

II. Idea and Motivations

III. Our Method

IV. Experiments and Results

Multi-Task Slot Car Racing

Slot Car with Indicator: control the car corresponding to the color of the indicator

Images: top-down view of the track and the input image (observation)

State Representation Learned

One task

Experimental Design

Training-Testing Cycle

Baselines

  • "SRL Vanilla" (Robotic Priors Only)
  • Principal Component Analysis
  • Raw Observations
  • Variations of our method

1. 4 training episodes of 100 steps each (2 per task)

2. Learn state representation + policy (KNN-Q)

3. Test on 10 episodes of 100 steps

Average reward per episode

10 test episodes (5 per task)
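The policy in step 2 is learned with KNN-Q; below is a rough nearest-neighbor Q-function sketch (our simplification, the exact variant used may differ):

import numpy as np

# Q(s, a) approximated by averaging the stored Q-values of the k nearest
# training states that used action a.
def knn_q(s, a, states, actions, q_values, k=5):
    idx = np.where(actions == a)[0]                      # samples with action a
    if len(idx) == 0:
        return 0.0                                       # no sample for this action yet
    dists = np.linalg.norm(states[idx] - s, axis=1)      # distances in state space
    nearest = idx[np.argsort(dists)[:k]]
    return q_values[nearest].mean()

def greedy_action(s, states, actions, q_values, action_set=(0, 1, 2)):
    return max(action_set, key=lambda a: knn_q(s, a, states, actions, q_values))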

Results: Slot Car with Indicator

10 test episodes (5 per task)

Results: Slot Car Blue

Slot Car Blue: control the blue car

Results: What was learned

5 gate units

Weights Gate Layer

Weights Gate Tensor

Weights SRL Vanilla

Influence of Losses

Conclusion

Task-Specific State Representation 

Task Detector

Transfer Learning

Task-Specific Policies

Future Work

Learning from experience

Output of the gate layer

h_g = \mathrm{softmax}\left(\sum\limits_{i=1}^{n_{inputs}} v_{ig} \times x_i\right)
\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^n e^{z_k}}

Softmax:

Maths details

v_{ig} : \text{weights from inputs to gate layer}
X : inputs
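A numpy sketch of this computation (shapes are illustrative):

import numpy as np

# Gate layer: pre-activation of gate unit g is sum_i v_{ig} * x_i, then a
# softmax across gate units gives the gate output h.
def gate_layer(x, V):                     # V: (n_inputs, n_gates) weight matrix
    z = V.T @ x                           # one pre-activation per gate unit
    e = np.exp(z - z.max())               # numerically stable softmax
    return e / e.sum()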

Maths details

Weight matrix gate (toy-task x-axis y-axis)

\begin{pmatrix} 0 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0\\ -10 & 10 \end{pmatrix}
\begin{pmatrix} x_{agent}\\ y_{agent}\\ x_{object}\\ y_{object}\\ bit_{task} \end{pmatrix}

1st unit

2nd unit

Maths details

Output of the gate layer

h = \begin{pmatrix} out_{unit1} & out_{unit2} \end{pmatrix}
out_{unit1} + out_{unit2} = 1

Softmax:

\begin{pmatrix} 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \end{pmatrix}

Task 0

Task 1

Ex:

Maths details

Final Output (state representation)

state_k = \sum\limits_{g=1}^{n_{gates}} \sum\limits_{i=1}^{n_{inputs}} h_g \times w_{igk} \times x_i, \quad k = 1, \dots, n_{outputs}
h_g : \text{output of the gate unit g}
W : \text{tensor with the weights from inputs to output}
X : inputs
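The same computation with numpy's einsum (shapes are illustrative placeholders):

import numpy as np

n_inputs, n_gates, n_outputs = 5, 2, 2
W = np.zeros((n_inputs, n_gates, n_outputs))   # weight tensor from inputs to state
x = np.zeros(n_inputs)                         # inputs
h = np.array([1.0, 0.0])                       # gate layer output (e.g. task 0)

# state_k = sum_g sum_i h_g * W[i, g, k] * x_i: the gate output softly selects
# which slice of W maps the inputs to the state.
state = np.einsum('g,igk,i->k', h, W, x)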

Maths details

Tensor example (toy-task x-axis y-axis)

\begin{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}\\\\ \begin{pmatrix} 0 \\ 1 \end{pmatrix}\\\\ \begin{pmatrix} 0 \\ 0 \end{pmatrix}\\\\ \begin{pmatrix} 0 \\ 0 \end{pmatrix}\\\\ \begin{pmatrix} 0 \\ 0 \end{pmatrix} \end{pmatrix}
\begin{pmatrix} x_{agent}\\\\ y_{agent}\\\\ x_{object}\\\\ y_{object}\\\\ bit_{task} \end{pmatrix}
\begin{pmatrix} 1 & 0 \end{pmatrix}

Task 0

Output of the gate layer

\begin{pmatrix} 0 & 1 \end{pmatrix}

Task 1
