Standard Belief Tree
\(\mathcal{O}(|A|^D|Z|^D)\)
DESPOT
\(\mathcal{O}(K|A|^D)\)
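To make the gap between the two growth rates concrete, here is a small sketch that evaluates \(|A|^D|Z|^D\) and \(K|A|^D\) for one set of made-up problem sizes. The function names and the numbers (3 actions, 10 observations, depth 5, 500 scenarios) are illustrative assumptions, and the results are order-of-magnitude counts rather than exact tree sizes.

```python
def belief_tree_nodes(num_actions: int, num_observations: int, depth: int) -> int:
    """Order-of-magnitude node count |A|^D * |Z|^D for a standard belief tree."""
    return (num_actions ** depth) * (num_observations ** depth)


def despot_nodes(num_actions: int, num_scenarios: int, depth: int) -> int:
    """Order-of-magnitude node count K * |A|^D for a DESPOT."""
    return num_scenarios * (num_actions ** depth)


# Illustrative sizes only (assumed, not taken from the slides).
full = belief_tree_nodes(num_actions=3, num_observations=10, depth=5)
sparse = despot_nodes(num_actions=3, num_scenarios=500, depth=5)
print(full, sparse, full // sparse)
```

With these assumed sizes the standard tree is 200 times larger, and the ratio \(|Z|^D / K\) grows quickly with depth.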
DEterminized Sparse Partially Observable Tree
Sparse: Create subset of possible state trajectories from belief tree via scenarios
Determinized: Scenarios are randomly sampled state trajectories generated with a deterministic simulation model
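A minimal sketch of the determinization idea, assuming a scenario is a start state drawn from the belief plus a pre-drawn stream of random numbers. The toy 1-D dynamics in `step`, the 0.8 success probability, and all names are hypothetical. The point is only that once the random numbers are fixed, the same action sequence always yields the same trajectory.

```python
import random
from typing import List, Tuple

# A scenario: (start state, one pre-drawn random number per time step).
Scenario = Tuple[int, List[float]]


def sample_scenarios(particles: List[int], k: int, depth: int, seed: int = 0) -> List[Scenario]:
    """Draw k scenarios: a start state from the belief particles and a random stream."""
    rng = random.Random(seed)
    return [
        (rng.choice(particles), [rng.random() for _ in range(depth)])
        for _ in range(k)
    ]


def step(state: int, action: int, u: float) -> Tuple[int, int]:
    """Hypothetical deterministic model g(s, a, u) -> (next state, observation).

    The move succeeds when u < 0.8; the observation is the parity of the next state.
    """
    moved = action if u < 0.8 else 0
    next_state = state + moved
    observation = next_state % 2
    return next_state, observation


def rollout(scenario: Scenario, actions: List[int]) -> List[int]:
    """Replay an action sequence under one scenario; no new randomness is drawn."""
    state, stream = scenario
    trajectory = [state]
    for action, u in zip(actions, stream):
        state, _ = step(state, action, u)
        trajectory.append(state)
    return trajectory


scenarios = sample_scenarios(particles=[0, 1, 2], k=4, depth=3)
```

Because all noise lives in the scenario, two policies evaluated on the same scenarios see identical "luck", so differences in return come from the policies themselves.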
Determinization
Purpose: Avoid attributing reward gains or losses caused by process noise to the inherent value of a belief state.
Consequence: Possible overfitting to the sampled scenarios
Empirical policy belief value
Generalizing to any node in the constructed tree, define regularized, weighted, discounted utility (RWDU) as:
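Assuming the usual DESPOT formulation, the RWDU of a policy \(\pi\) at node \(b\) is \(\nu_\pi(b) = \frac{|\Phi_b|}{K}\gamma^{\Delta(b)}\hat{V}_{\pi_b}(b) - \lambda|\pi_b|\), where \(\Phi_b\) is the set of scenarios passing through \(b\), \(\Delta(b)\) is the depth of \(b\), and \(|\pi_b|\) is the size of the policy subtree below \(b\). The sketch below evaluates that expression; the function name, argument names, and default values of \(\gamma\) and \(\lambda\) are illustrative assumptions.

```python
from typing import Sequence


def rwdu(
    scenario_values: Sequence[float],
    total_scenarios: int,
    depth: int,
    policy_size: int,
    gamma: float = 0.95,
    lam: float = 0.1,
) -> float:
    """Regularized, weighted, discounted utility of a policy at one belief node.

    scenario_values: return of the policy under each scenario passing through the node.
    total_scenarios: K, the number of scenarios sampled at the root.
    depth:           Delta(b), the depth of the node in the tree.
    policy_size:     |pi_b|, the number of nodes in the policy subtree.
    """
    if not scenario_values:
        raise ValueError("at least one scenario must pass through the node")
    weight = len(scenario_values) / total_scenarios                 # |Phi_b| / K
    empirical_value = sum(scenario_values) / len(scenario_values)   # V-hat
    return weight * (gamma ** depth) * empirical_value - lam * policy_size
```

For example, `rwdu([1.0, 3.0], total_scenarios=4, depth=0, policy_size=2, gamma=1.0, lam=0.5)` weights the empirical value 2.0 by 0.5 and subtracts a penalty of 1.0.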
Regularize Reward
Belief Node Backup
Start at leaf belief nodes
Move through internal belief nodes
Evaluate best action at root belief node
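The three steps above can be sketched as a recursive bottom-up pass. This is a simplified illustration, not the authors' implementation: it assumes each node already stores the weighted, discounted value of a default policy (`default_value`) and a per-action step reward with the regularization term already subtracted (`step_reward`). The class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BeliefNode:
    # Value obtained by stopping here and following a default policy (assumed given).
    default_value: float
    # Per-action regularized step reward at this node (assumed given).
    step_reward: Dict[str, float] = field(default_factory=dict)
    # For each action, the child belief nodes (one per observation branch).
    children: Dict[str, List["BeliefNode"]] = field(default_factory=dict)
    value: float = 0.0
    best_action: Optional[str] = None


def backup(node: BeliefNode) -> float:
    """Back up values from the leaves to this node and record the best action.

    A leaf keeps its default value. An internal node takes the maximum of its
    default value and, over actions, the step reward plus the summed child values.
    """
    node.value = node.default_value
    node.best_action = None  # None means "follow the default policy"
    for action, kids in node.children.items():
        q = node.step_reward[action] + sum(backup(child) for child in kids)
        if q > node.value:
            node.value, node.best_action = q, action
    return node.value


root = BeliefNode(
    default_value=0.5,
    step_reward={"left": -0.2, "right": 0.3},
    children={
        "left": [BeliefNode(default_value=1.0), BeliefNode(default_value=2.0)],
        "right": [BeliefNode(default_value=0.1)],
    },
)
backup(root)
```

After the call, `root.best_action` holds the action to execute at the root belief.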
AR-DESPOT (Anytime Regularized DESPOT)
Construct the tree incrementally, guided by heuristic bounds on belief values
\(l(b), \mu(b)\) : lower and upper bounds of optimal RWDU at belief node \(b\) \(\rightarrow l(b) \le \nu^*(b) \le \mu(b)\)
\( L_0(b), U(b)\) : lower and upper bounds of non-regularized belief value \(\rightarrow L_0(b) \le \hat{V}^*(b) \le U(b)\)
\(\xi\) : Tuning parameter - Desired RWDU bounds contraction rate at root node \(b_0\)
Cease exploration along a path once the excess uncertainty satisfies \(E(b') \le 0 \). Exploring beyond this point contributes little to contracting the RWDU bounds at the root.
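A sketch of the stopping test, assuming the usual DESPOT definition of excess uncertainty: with gap \(\epsilon(b) = \mu(b) - l(b)\), \(E(b) = \epsilon(b) - \frac{|\Phi_b|}{K}\,\xi\,\epsilon(b_0)\). The function and argument names are illustrative, and the numbers in the example are made up.

```python
def excess_uncertainty(
    lower: float,
    upper: float,
    scenarios_at_node: int,
    total_scenarios: int,
    root_gap: float,
    xi: float,
) -> float:
    """E(b) = (mu(b) - l(b)) - (|Phi_b| / K) * xi * (mu(b0) - l(b0))."""
    gap = upper - lower
    return gap - (scenarios_at_node / total_scenarios) * xi * root_gap


def should_explore(excess: float) -> bool:
    """Keep descending only while the excess uncertainty is positive."""
    return excess > 0.0


# Made-up example: gap of 2.0 at a node holding 100 of 500 scenarios,
# root gap of 8.0, and xi = 0.95.
e = excess_uncertainty(lower=1.0, upper=3.0, scenarios_at_node=100,
                       total_scenarios=500, root_gap=8.0, xi=0.95)
```

The node's gap is compared against its scenario-weighted share of the target root gap, so nodes carrying few scenarios are allowed a proportionally smaller gap before exploration stops.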
Pruning
Blocked if
Backup
DESPOT with \(\alpha\)-vector update
Belief in state determined by weight of representative particle
Upper bounds maintained with sawtooth approximation
Interpolate each belief using one arbitrary grid point \(b' \in G\) together with \(|S|-1\) extreme points of the belief simplex
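A minimal sketch of a sawtooth upper bound, assuming the standard form: start from the linear interpolation of the corner (extreme point) values, then for each stored grid point \(b'\) with value \(v'\) lower the bound by the largest fraction of \(b'\) that fits inside \(b\). The data layout and names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def sawtooth_upper_bound(
    belief: Sequence[float],
    corner_values: Sequence[float],
    grid_points: List[Tuple[Sequence[float], float]],
) -> float:
    """Sawtooth upper bound at `belief`.

    corner_values: upper bound at each extreme point of the simplex (one per state).
    grid_points:   (grid belief b', upper bound v' at b') pairs.
    """
    # Linear interpolation of the corner values.
    base = sum(p * v for p, v in zip(belief, corner_values))
    best = base
    for grid_belief, grid_value in grid_points:
        grid_base = sum(p * v for p, v in zip(grid_belief, corner_values))
        gap = grid_value - grid_base  # negative when the grid point tightens the bound
        # Largest c such that c * b' fits inside b, over states where b' has mass.
        ratio = min(b / g for b, g in zip(belief, grid_belief) if g > 0.0)
        best = min(best, base + ratio * gap)
    return best


# Two states, corner bounds of 10, one interior grid point with bound 6.
corners = [10.0, 10.0]
grid = [([0.5, 0.5], 6.0)]
```

Each grid point is considered on its own and the tightest result is kept, which is why only one interior point enters the interpolation for any given belief.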
Leaf Node
Internal Nodes