Options From Example Trajectories

Zang 09 paper, presented by Denis and Emily

Outline

Why you should care
Subproblems+Options
Analysis
Experiments

Definitions

SMDP M = (S, A, P, R, gamma)
State Abstractions
- Up Projections g: S[F] → S[F̃]
- Down Projections h: S[F̃] →
Trajectory T
Subproblem (M, F, A, w)

Equations

Subproblems

What are subproblems?
How do we get them?
Why are they significant?

What makes a good Subproblem?

Size: encapsulates a significant chunk of the overall problem.
Frequency: subproblem arises frequently.
Abstraction: the greater the abstraction the faster we can solve the subproblem.

Recursing

Original problem: SMDP passed in. Base problem: SMDP called recursively. Msub: R, transitions?
If Msub is solved, Msub becomes an option o, added to the action set: A ∪ {o}
Solving V(s), T(s). Prob = 1, Discount = T(s)
Suffix Tree generation for common actions
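
The step of turning a solved subproblem into an option can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Option` class, `add_option`, `run_option`, and the toy one-dimensional world are all hypothetical names and assumptions introduced here. It assumes a deterministic step function, so the option reaches its goal with probability 1 and only its duration needs to be recorded.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Hashable, Tuple

State = Hashable
Action = Hashable


@dataclass(frozen=True)
class Option:
    """Hypothetical container for a solved subproblem."""
    name: str
    policy: Callable[[State], Action]    # action to take while the option runs
    terminates: Callable[[State], bool]  # True once the subproblem goal is reached


def add_option(actions: FrozenSet, option: Option) -> FrozenSet:
    """Return the enlarged action set A ∪ {o}."""
    return actions | {option}


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               max_steps: int = 1000) -> Tuple[State, int]:
    """Follow the option's policy until it terminates; return (state, duration)."""
    duration = 0
    while not option.terminates(state) and duration < max_steps:
        state = step(state, option.policy(state))
        duration += 1
    return state, duration


# Toy world (assumption): the state is an integer row, "N" moves up by one.
go_north = Option("GoNorth", policy=lambda s: "N", terminates=lambda s: s >= 3)
actions = add_option(frozenset({"N", "S"}), go_north)
final_state, duration = run_option(
    go_north, 0, lambda s, a: s + 1 if a == "N" else s - 1
)
```

In a standard SMDP treatment, the option's effective discount would then be `gamma ** duration`; whether that matches the slide's "Discount = T(s)" exactly is an assumption.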

Example:

Given ENNNPWWWWD and that NN is a common action sequence
ENNNPWWWWD - expand until state abstraction broken
Goal is state prior to pickup-action.
Extend backwards in time out of state abstraction:
ENNN. Assign this to var X. New string: XPWWWWD
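
The string manipulation in this example can be sketched in a few lines. The slides mention a suffix tree for finding common action sequences; the sketch below swaps in naive n-gram counting as a simpler stand-in, and the function names `common_ngrams` and `abstract_segment` are hypothetical. It does not model the state-abstraction check that decides how far to extend the segment; the segment ENNN is taken from the slide as given.

```python
from collections import Counter


def common_ngrams(trajectory: str, n: int, min_count: int = 2) -> dict:
    """Count every length-n action substring and keep the repeated ones."""
    counts = Counter(
        trajectory[i:i + n] for i in range(len(trajectory) - n + 1)
    )
    return {gram: c for gram, c in counts.items() if c >= min_count}


def abstract_segment(trajectory: str, segment: str, symbol: str) -> str:
    """Replace the first occurrence of a segment with a single option symbol."""
    return trajectory.replace(segment, symbol, 1)


trajectory = "ENNNPWWWWD"
repeats = common_ngrams(trajectory, 2)                      # repeated action pairs
new_trajectory = abstract_segment(trajectory, "ENNN", "X")  # "XPWWWWD"
```

A real suffix tree finds all repeated substrings of every length in one pass, which matters for long trajectories; the counting version is only meant to make the example concrete.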

Analysis

Requirements:
- model
- near-optimal trajectories
Best for problems where different subproblems require different features
- 3D Flying vs pole balancing
O(T^2) cost for best subproblem. And T < N!
Works in deterministic domains, and better in non-deterministic ones
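
One way to see where a quadratic cost in trajectory length T could come from: if every contiguous segment of a trajectory is a candidate subproblem, there are T(T+1)/2 start and end pairs to consider. The sketch below only counts those candidates; it is an assumption about the source of the O(T^2) term, and `candidate_segments` is a hypothetical name.

```python
def candidate_segments(trajectory: str) -> list:
    """Enumerate every (start, end) pair of a contiguous, non-empty segment."""
    T = len(trajectory)
    return [(i, j) for i in range(T) for j in range(i + 1, T + 1)]


segments = candidate_segments("ENNNPWWWWD")
num_candidates = len(segments)  # T * (T + 1) / 2 for T = 10
```

Because the count depends on T rather than on the number of states N, short trajectories keep the search cheap even when the state space is large.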

Trajectory length is the culprit

Robustness

Handles noise well.

But, remember that you need to see all actions.

Experiments

Bad Problems
Taxi
Wargus

Pole Balancing

Taxi

Taxi() 4-8 trajectories
- PickupPassenger()
  - Navigate()

Wargus

~15 trajectories

Idle
GoToGoldMine
GoToWoods
Chop
Mine
BuildBlackSmith
TrainGrunt

Army size: 4

OpLearn: 2 mil

VI: 20 mil

Army Size: 8

OpLearn: 3 mil

VI: 80 mil
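
The gap between the two methods is easier to read as a ratio. The snippet below only does arithmetic on the four numbers from the slides; the unit of the counts is not stated there, so it is left unnamed here as well.

```python
# Reported counts from the slides, in the slides' own (unstated) unit.
reported = {
    4: {"OpLearn": 2e6, "VI": 20e6},
    8: {"OpLearn": 3e6, "VI": 80e6},
}

# How many times more work VI needs than OpLearn at each army size.
ratios = {size: r["VI"] / r["OpLearn"] for size, r in reported.items()}

# How each method's cost grows when the army size doubles from 4 to 8.
vi_growth = reported[8]["VI"] / reported[4]["VI"]
oplearn_growth = reported[8]["OpLearn"] / reported[4]["OpLearn"]

for size, ratio in ratios.items():
    print(f"Army size {size}: VI / OpLearn = {ratio:.1f}x")
```

Doubling the army size multiplies the VI count by 4 but the OpLearn count by only 1.5, so the advantage widens as the problem grows.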

Take-Aways

Recursively find sub-problems
- To reduce state space + computations
  - To reduce trajectories needed

Options

By dpeskov
