Continuous Prioritized Sweeping with Gaussian Processes
GOAL
- Sample-efficiency!
- Learning with fewer interactions.
- Data is in tuple form: <s,a,s',r>
MODEL FREE
- Experience replay (randomly feed collected data to the learning algorithm)
- Prioritize experience replay based on past TD errors
MODEL BASED
- Use the model to generate new data / back up the value function
- Prioritize based on past TD errors AND the learned dynamics
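A minimal sketch of the experience replay idea above; all names are illustrative, not from the slides:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores <s, a, s', r> tuples and replays them uniformly at random."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, s_next, r):
        self.buffer.append((s, a, s_next, r))

    def sample(self, batch_size):
        # Randomly feed collected data back to the learning algorithm.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```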
PRIORITIZED SWEEPING
Q(s,a) \mathrel{+}= \alpha\left [ R(s,a) + \gamma\max_{a'}Q(s',a') - Q(s,a) \right ]
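A hedged sketch of the tabular update above; variable names are assumptions:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One Q-learning backup: Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error  # its magnitude is later used as a priority
```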
PRIORITIZED SWEEPING
How to prioritize?
p_{s,a} = \left| \Delta_{s'} \right| \cdot T(s' | s, a)
(\Delta_{s'} is the TD error, T(s' | s, a) is the learned model)
- Update V(s'), which gives \Delta_{s'}
- Iterate all s,a pairs
- Compute all priorities
- Pick the highest and repeat
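A minimal tabular sketch of this loop, assuming a learned tabular model T and R (illustrative only, not the author's implementation):

```python
import heapq
import numpy as np

def prioritized_sweeping(Q, T, R, gamma=0.99, n_backups=100, theta=1e-3):
    """Tabular sketch: T[s, a, s'] = transition probs, R[s, a] = rewards, Q = action values."""
    n_states, n_actions, _ = T.shape
    queue = []  # max-priority queue via negated priorities

    def push_parents(s_next, delta):
        # p_{s,a} = |delta_{s'}| * T(s' | s, a) for every potential parent (s, a)
        for s in range(n_states):
            for a in range(n_actions):
                p = abs(delta) * T[s, a, s_next]
                if p > theta:
                    heapq.heappush(queue, (-p, s, a))

    # Seed the queue once from every state, then keep picking the highest priority.
    for s_next in range(n_states):
        push_parents(s_next, delta=1.0)

    for _ in range(n_backups):
        if not queue:
            break
        _, s, a = heapq.heappop(queue)
        v_old = Q[s].max()
        # Full model-based backup of Q(s, a).
        Q[s, a] = R[s, a] + gamma * T[s, a] @ Q.max(axis=1)
        push_parents(s, Q[s].max() - v_old)  # change in V(s) becomes the new delta
    return Q
```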
CONTINUOUS SETTING
PROBLEMS
- Learning a continuous T
- Iterating over all s,a pairs to update priorities
SOLUTION
Gaussian Process!
GAUSSIAN PROCESS
- An infinite-dimensional multivariate Gaussian
- Sample-efficient & Bayesian function approximator
GAUSSIAN PROCESS
- It's a DISTRIBUTION over functions!
- It can be learned easily and sample-efficiently (there are some gotchas)
- It keeps track of uncertainty!
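A minimal GP regression sketch illustrating these points, using scikit-learn (not necessarily what the author used):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A handful of noisy observations of an unknown function.
X = np.array([[0.0], [1.0], [2.5], [4.0]])
y = np.sin(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-3)
gp.fit(X, y)  # sample-efficient: a few points already give a usable posterior

X_test = np.linspace(0, 5, 50).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
# 'mean' is the expected function, 'std' is the per-point uncertainty:
# a distribution over functions rather than a single fit.
```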
CONTINUOUS PS
This is what we need:
p_{s,a} = \left| \Delta_{s'} \right| \cdot T(s' | s, a)
(\Delta_{s'} is the TD error, T(s' | s, a) is the learned model)
CONTINUOUS PS
T(s' | s, a) with a GP
(plot: the GP's prediction of s' over the (s, a) plane)
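A hedged sketch of fitting T(s' | s, a) with a GP on collected <s, a, s'> tuples; scikit-learn is assumed and the data is illustrative:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Collected transitions: inputs are (s, a) pairs, targets are the observed next states s'.
SA = np.array([[0.0, -1.0], [0.5, 1.0], [1.0, -1.0], [1.5, 1.0]])  # columns: s, a
S_next = np.array([-0.1, 0.9, 0.8, 1.9])                           # observed s'

kernel = RBF(length_scale=[1.0, 1.0]) + WhiteKernel(noise_level=1e-2)
T_gp = GaussianProcessRegressor(kernel=kernel).fit(SA, S_next)

# For any query (s, a) the GP gives a Gaussian over s': a mean plus uncertainty,
# so stochastic transitions are covered, not just the deterministic mean.
mu, sigma = T_gp.predict(np.array([[0.7, 1.0]]), return_std=True)
```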
CONTINUOUS PS
The plot shows the deterministic mean, but the GP handles stochastic transitions!
(plot: GP mean of s' over the (s, a) plane)
CONTINUOUS PS
Iterating over parents is equivalent to slicing the GP at a given height
(matplotlib screws up the projection of overlapping surfaces, sorry!)
CONTINUOUS PS
Taking into account uncertainty, it'd look like this
CONTINUOUS PS
- Now we only need to go over all parents, and we can compute the priorities
- But wait... this is still a continuous function!
- Approximate it using sampling!
CONTINUOUS PS
- Sample arbitrary s,a
- Compute the PDF using the GP, and add the resulting priority to a queue
- Some ways to optimize this:
  - Latin Hypercube sampling
  - Gradient descent (depending on the GP kernel)
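A minimal sketch of this sampling loop, reusing the GP transition model from the earlier slide and Gaussian PDFs from SciPy; every name and bound here is an assumption about implementation details:

```python
import heapq
import numpy as np
from scipy.stats import norm

def sample_priorities(T_gp, s_prime, delta, n_samples=256, s_bounds=(0, 2), a_bounds=(-1, 1)):
    """Approximate p_{s,a} = |delta_{s'}| * T(s' | s, a) by sampling arbitrary (s, a) pairs."""
    queue = []
    for _ in range(n_samples):
        s = np.random.uniform(*s_bounds)
        a = np.random.uniform(*a_bounds)
        # The GP posterior at (s, a) is a Gaussian over s'; evaluate its PDF at the updated s'.
        mu, sigma = T_gp.predict(np.array([[s, a]]), return_std=True)
        density = norm.pdf(s_prime, loc=mu[0], scale=sigma[0])
        priority = abs(delta) * density
        heapq.heappush(queue, (-priority, s, a))  # max-priority queue
    return queue
```

The uniform draws above could be replaced by Latin Hypercube samples (e.g. scipy.stats.qmc.LatinHypercube) or refined with gradient steps on the GP posterior, as the slide suggests.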
CONTINUOUS PS
EXPERIMENTAL SETTING
- Test DQN as:
  - vanilla
  - with experience replay
  - with prioritized experience replay
  - using a GP to prioritize experience replay
  - using a GP to fully sample new experience
Multiagent Extension
- Already works for discrete settings (under domain knowledge assumptions)
- The main idea is that priorities are factored as a sum of smaller functions
- Each function depends only on some state features and agents
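A hedged sketch of the factorization being described (the notation is an assumption, not taken from the slides):

p_{s,a} = \sum_i p_i(s_i, a_i)

where each local term p_i depends only on a subset s_i of the state features and the actions a_i of the agents involved in that term.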
Multiagent Extension
- It is important to be able to represent priorities as functions
- Since we are already there, use mixtures of Gaussians!
Multiagent Extension
- Cluster samples using variational Expectation Maximization (weighted)
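A minimal sketch of the clustering step using scikit-learn's variational Gaussian mixture; its fit has no sample weights, so the priority weighting is approximated here by resampling (an assumption, not the author's implementation):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def fit_priority_mixture(points, priorities, n_components=5, n_resample=2000):
    """points: sampled (s, a) pairs; priorities: their computed priorities, used as weights."""
    w = np.asarray(priorities, dtype=float)
    w = w / w.sum()
    # Approximate weighted EM by resampling points proportionally to their priority.
    idx = np.random.choice(len(points), size=n_resample, p=w)
    gmm = BayesianGaussianMixture(n_components=n_components)  # variational EM under the hood
    gmm.fit(np.asarray(points)[idx])
    return gmm  # a mixture of Gaussians representing the priority function
```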
Multiagent Extension
- Now we can represent priorities as actual functions, and can do fancy things with them!
- Enhanced buzzword level
- As long as everything works...
Questions?
Continuous PS with GPs
By svalorzen