PFT-(DPW)
Overview
Online algorithms for POMDPs with continuous state, action, and observation spaces - Sunberg et al.
b_0
Input Belief
Root
Particle Belief
Insert new action if:
Action Progressive Widening
|C(b)| \le k_{a}N(b)^{\alpha_a}
Choose next action with
\text{argmax}_{a\in C(b)}\left\{ Q(ba) + c\sqrt{\frac{\log N(b)}{N(ba)}}\right\}
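The widening test and the UCB selection rule above can be sketched in a few lines of Julia. This is only an illustrative sketch: the function names `should_widen` and `ucb_select` and the flat-vector layout of `Q` and `Nba` are assumptions, not the data structures of any particular solver.

```julia
# Progressive widening test: add a new child while |C| ≤ k * N^α.
# (Hypothetical helper; the same test serves actions and observations.)
should_widen(n_children::Int, N::Int, k::Float64, α::Float64) = n_children <= k * N^α

# UCB selection over the existing action children of a belief node.
# Q[i] and Nba[i] are the value estimate and visit count of child i,
# Nb is the visit count of the parent belief node, c the exploration constant.
function ucb_select(Q::Vector{Float64}, Nba::Vector{Int}, Nb::Int, c::Float64)
    best_i, best_val = 0, -Inf
    for i in eachindex(Q)
        # Assumption: unvisited children get an infinite score so they are tried first.
        val = Nba[i] == 0 ? Inf : Q[i] + c * sqrt(log(Nb) / Nba[i])
        if val > best_val
            best_i, best_val = i, val
        end
    end
    return best_i
end
```

With `c = 0.0` the rule reduces to greedy selection on `Q`; larger `c` shifts the choice toward rarely visited actions.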
a
b
Generate next belief node
Insert new belief if:
|C(ba)| \le k_{o}N(ba)^{\alpha_o}
s_s \sim b \\
o \leftarrow G(s_s,a)
b
a
o
b'
b' = \tau(bao)
b
b',r \leftarrow G_{PF}(b,a)
a
Full Belief Propagation
o
s_i',r_i \leftarrow G(s_i,a)
w_i' = \eta w_i\mathcal{Z}(o|s_i,a,s_i')
r(b,a) = \sum_iw_ir_i
\eta = \left(\sum_i w_i\mathcal{Z}(o|s_i,a,s_i') \right)^{-1}
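The propagate and reweight equations above can be sketched as a single particle filter step. Everything named here (`ParticleBelief`, `sample_index`, `pf_step`, and the 1-D toy model `toy_G` / `toy_Z`) is hypothetical and chosen only for illustration; the step follows the slide's equations and, as an assumption, keeps the reweighted particles without resampling.

```julia
using Random

# Weighted particle belief (illustrative type, weights assumed to sum to 1).
struct ParticleBelief{S}
    particles::Vector{S}
    weights::Vector{Float64}
end

# Draw an index with probability proportional to the weights.
function sample_index(rng::AbstractRNG, w::Vector{Float64})
    u = rand(rng) * sum(w)
    c = 0.0
    for i in eachindex(w)
        c += w[i]
        u <= c && return i
    end
    return lastindex(w)
end

# Toy generative model G: returns (s', o, r). Purely an assumption for the demo.
function toy_G(rng::AbstractRNG, s::Float64, a::Float64)
    sp = s + a + 0.1 * randn(rng)
    o = sp + 0.5 * randn(rng)
    return sp, o, -abs(sp)
end

# Toy observation likelihood Z(o | s, a, s'), unnormalized Gaussian.
toy_Z(o, s, a, sp) = exp(-0.5 * ((o - sp) / 0.5)^2)

function pf_step(rng::AbstractRNG, b::ParticleBelief{S}, a, G, Z) where {S}
    n = length(b.particles)
    # s_s ~ b, then o <- G(s_s, a)
    s_s = b.particles[sample_index(rng, b.weights)]
    _, o, _ = G(rng, s_s, a)
    new_particles = Vector{S}(undef, n)
    new_weights = Vector{Float64}(undef, n)
    r = 0.0
    for i in 1:n
        s = b.particles[i]
        sp, _, r_i = G(rng, s, a)                       # propagate
        new_particles[i] = sp
        new_weights[i] = b.weights[i] * Z(o, s, a, sp)  # reweight
        r += b.weights[i] * r_i                         # r(b,a) = Σ w_i r_i
    end
    new_weights ./= sum(new_weights)                    # multiply by η
    return ParticleBelief(new_particles, new_weights), o, r
end
```

If every likelihood underflows to zero the normalization produces `NaN` weights, so a real implementation would need to guard that case.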
Propagate
Reweight
b'
Value Estimation
b'
\hat{V}(b') \approx \sum_i\hat{V}(s'_i)w'_i
- PO Rollout
- Sparse Belief VI
etc.
- FO Rollout
- State VI
etc.
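As a rough illustration of the weighted state-value estimate, the sketch below pairs it with a fully observable (FO) rollout. The names `fo_rollout` and `belief_value`, the `(s', r)` return convention of `G`, and the fixed rollout depth are all assumptions made for this example.

```julia
using Random

# Discounted return of a fixed-depth rollout from state s.
# G(rng, s, a) is assumed to return (s', r); no observation is generated,
# since a fully observable rollout policy only needs the state.
function fo_rollout(rng::AbstractRNG, s, G, policy, γ::Float64, depth::Int)
    total, disc = 0.0, 1.0
    for _ in 1:depth
        s, r = G(rng, s, policy(s))
        total += disc * r
        disc *= γ
    end
    return total
end

# V̂(b') ≈ Σ_i V̂(s'_i) w'_i, with state_value standing in for V̂(s).
function belief_value(rng::AbstractRNG, particles, weights, state_value)
    return sum(w * state_value(rng, s) for (s, w) in zip(particles, weights))
end
```

A usage pattern would be `belief_value(rng, particles, weights, (rng, s) -> fo_rollout(rng, s, G, policy, γ, depth))`, which runs one rollout per particle.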
b
a
o
b'
\text{total} = r(b,a) + \gamma\hat{V}(b')
N(b) \leftarrow N(b) + 1 \\
N(ba) \leftarrow N(ba) + 1 \\
Q(ba) \leftarrow Q(ba) + \frac{\text{total} - Q(ba)}{N(ba)}
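The backup above is an incremental mean update. A minimal sketch, assuming the tree statistics are stored in flat vectors indexed by node id (`TreeStats` and `backup!` are hypothetical names):

```julia
# Immutable container whose vector fields can still be updated in place.
struct TreeStats
    Nb::Vector{Int}        # N(b)  per belief node
    Nba::Vector{Int}       # N(ba) per action node
    Qba::Vector{Float64}   # Q(ba) per action node
end

function backup!(t::TreeStats, b_idx::Int, ba_idx::Int, r::Float64, γ::Float64, v_next::Float64)
    total = r + γ * v_next                      # total = r(b,a) + γ V̂(b')
    t.Nb[b_idx] += 1                            # N(b)  <- N(b) + 1
    t.Nba[ba_idx] += 1                          # N(ba) <- N(ba) + 1
    # Running mean: Q(ba) <- Q(ba) + (total - Q(ba)) / N(ba)
    t.Qba[ba_idx] += (total - t.Qba[ba_idx]) / t.Nba[ba_idx]
    return total
end
```

After two backups with totals 10 and 0, `Qba` holds their mean, 5.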
a_1
a_2
o_1
o_2
o_1
o_2
When not widening observations
\text{i.e. } |C(ba)| > k_oN(ba)^{\alpha_o}
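The two branches of observation widening can be sketched as follows. The helper name `next_belief_child` and the `make_child` callback are hypothetical, and choosing uniformly among existing children when not widening is an assumption of this sketch.

```julia
using Random

# children holds the ids of the belief nodes already under action node ba.
# make_child() is a placeholder for "run the particle filter and insert b'".
function next_belief_child(rng::AbstractRNG, children::Vector{Int}, Nba::Int,
                           k_o::Float64, α_o::Float64, make_child)
    if length(children) <= k_o * Nba^α_o
        child = make_child()          # widening: generate and insert a new belief node
        push!(children, child)
        return child, true
    else
        # Not widening: revisit an existing child (assumed uniform choice).
        return rand(rng, children), false
    end
end
```

The Boolean flag tells the caller whether the returned node is new, in which case its value would typically come from a value estimate rather than a recursive simulation.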
- Type Stability / Limiting dynamic dispatch
- JET.jl, @code_warntype, flame graph profiling
- Type parameterization
- Eliding observation generation in fully observable rollouts
- Caching belief vectors (fewer temporary arrays)
- Mutable to immutable
- Sizehinting vectors of unknown size
- Undef initialization of vectors of known size
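A few of the items above can be illustrated with small, self-contained snippets. The type and function names (`LooseBelief`, `TypedBelief`, `collect_unknown`, `fill_known`) are made up for this sketch and do not come from the solver.

```julia
# Type parameterization: a concrete element type lets the compiler avoid
# dynamic dispatch when iterating over particles.
struct LooseBelief
    particles::Vector{Any}     # abstract element type, each access is boxed
end

struct TypedBelief{S}
    particles::Vector{S}       # S is known at compile time
end

# Size hinting: the final length is unknown, but an upper bound is available.
function collect_unknown(n_max::Int)
    v = Int[]
    sizehint!(v, n_max)        # reserve capacity once instead of growing repeatedly
    for i in 1:n_max
        isodd(i) && push!(v, i)
    end
    return v
end

# Undef initialization: the length is known and every slot is overwritten,
# so there is no need to zero the memory first.
function fill_known(n::Int)
    w = Vector{Float64}(undef, n)
    for i in 1:n
        w[i] = 1 / n
    end
    return w
end
```

Running `@code_warntype` on a function that sums `LooseBelief` particles versus `TypedBelief` particles is one way to see the difference in inferred types.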
Quick note on type mutability
(mutable types inside immutable types are still mutable)
mutable struct Mstruct{T}
i::T
v::Vector{T}
end
struct Istruct{T}
i::T
v::Vector{T}
end
julia> MS = Mstruct(1,[1,2,3])
Mstruct{Int64}(1, [1, 2, 3])
julia> MS.i += 1
2
julia> push!(MS.v,4)
4-element Vector{Int64}:
1
2
3
4
julia> MS.v[1] = 10
10
julia> MS.v
4-element Vector{Int64}:
10
2
3
4
julia> IS = Istruct(1,[1,2,3])
Istruct{Int64}(1, [1, 2, 3])
julia> IS.i += 1
ERROR: setfield! immutable struct of type Istruct cannot be changed
Stacktrace:
[1] setproperty!(x::Istruct{Int64}, f::Symbol, v::Int64)
@ Base ./Base.jl:34
[2] top-level scope
@ none:1
julia> push!(IS.v,4)
4-element Vector{Int64}:
1
2
3
4
julia> IS.v[1] = 10
10
julia> IS
Istruct{Int64}(1, [10, 2, 3, 4])
Mutable
Immutable
PFT-DPW (No DPW)
By Tyler Becker