DESPOT-α: Online POMDP Planning With Large State And Observation Spaces
Neha P Garg, David Hsu and Wee Sun Lee
Presented by Tyler Becker
Small \(|\mathcal{O}|\)
Large \(|\mathcal{O}|\)
DESPOT + Weighted PF + \(\alpha\) vectors
Determinized Scenarios:
Overarching "anytime" algorithmic goal:
Propagate
Reweight
DESPOT utilizes determinized scenarios
Assuming the \(\alpha\) vectors are similar among sibling nodes, we can estimate the value of each sibling node without recomputing \(\alpha\)
\(\xi\) : hyperparameter controlling desired leaf node uncertainty
Expand all \(a \in \mathcal{A}\)
Propagate particles / collect observations
Roll out for lower bound \(\alpha(s)\)
Create \(C \le K\) new beliefs from a size-\(C\) subset of the sampled observations
For each action
Reweight particles
Estimate Belief Value: \(V(\tau(b,a,z)) = w_{\tau(b,a,z)}^T\alpha\)
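The reweight and value-estimate steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `reweight` and `belief_value`, the likelihood lists, and the numbers are all hypothetical. It assumes each particle carries a weight and that the observation likelihood \(p(z \mid s', a)\) has already been evaluated for each propagated particle.

```python
def reweight(weights, likelihoods):
    """Weighted particle filter update: w'(i) is proportional to w(i) * p(z | s'_i, a)."""
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every particle")
    return [u / total for u in unnormalized]


def belief_value(weights, alpha):
    """Estimate V(b) as the dot product w_b^T alpha over the particle set."""
    return sum(w * a for w, a in zip(weights, alpha))


# Hypothetical example: four particles, uniform prior weights.
prior = [0.25, 0.25, 0.25, 0.25]
alpha = [10.0, 0.0, -5.0, 2.0]  # assumed per-particle value estimates

# Two sibling beliefs differ only in their observation likelihoods,
# so the same alpha is reused and only the weights change.
w_z1 = reweight(prior, [0.8, 0.1, 0.1, 0.0])
w_z2 = reweight(prior, [0.0, 0.1, 0.1, 0.8])
v_z1 = belief_value(w_z1, alpha)
v_z2 = belief_value(w_z2, alpha)
```

Reusing `alpha` for both siblings is the point of the sketch: the expensive quantity is computed once, and each sibling's value costs only a reweighting and a dot product.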
\(\text{depth}(b) > D\)
- maximum depth reached
OR
\(\text{WEU}(b) < 0\)
- exploration of belief node no longer heuristically promising
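The two stopping conditions above can be expressed as a single predicate. This is a sketch only: `should_stop_exploring` is a hypothetical name, and the WEU value is assumed to be computed elsewhere and passed in as a number.

```python
def should_stop_exploring(depth, weu, max_depth):
    """Stop when the maximum depth D is exceeded or WEU(b) drops below zero."""
    return depth > max_depth or weu < 0.0


# Hypothetical values for illustration.
stop_at_depth = should_stop_exploring(depth=11, weu=0.3, max_depth=10)
stop_at_weu = should_stop_exploring(depth=4, weu=-0.1, max_depth=10)
keep_going = should_stop_exploring(depth=4, weu=0.3, max_depth=10)
```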
Finally, traverse from each node back up to the root, performing Bellman backups along the way
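A generic belief-tree Bellman backup, walking from a leaf to the root, might look like the sketch below. It is an assumption-laden illustration rather than the paper's \(\alpha\)-vector backup: the `BeliefNode` class, its fields, and the observation weights stored on each edge are all hypothetical, and values are scalars per belief rather than vectors per particle.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BeliefNode:
    value: float = 0.0
    parent: Optional["BeliefNode"] = None
    # rewards[a]: expected immediate reward of action a at this belief
    rewards: dict = field(default_factory=dict)
    # children[a]: list of (observation_weight, child_node) pairs
    children: dict = field(default_factory=dict)


def backup(node, gamma):
    """V(b) = max_a [ R(b,a) + gamma * sum_z weight(z) * V(child_z) ]."""
    if not node.children:
        return  # leaf: keep its rollout or bound value
    node.value = max(
        node.rewards[a] + gamma * sum(w * child.value for w, child in kids)
        for a, kids in node.children.items()
    )


def backup_to_root(leaf, gamma):
    """Apply the backup to every ancestor of `leaf`, ending at the root."""
    node = leaf.parent
    while node is not None:
        backup(node, gamma)
        node = node.parent


# Hypothetical two-action tree.
root = BeliefNode()
c1 = BeliefNode(value=4.0, parent=root)
c2 = BeliefNode(value=2.0, parent=root)
c3 = BeliefNode(value=5.0, parent=root)
root.rewards = {"left": 1.0, "right": 0.0}
root.children = {"left": [(0.5, c1), (0.5, c2)], "right": [(1.0, c3)]}
backup_to_root(c3, gamma=0.9)
```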
\(C_{b,a}\) may not contain all possible observations
Claim:
\(\alpha\) is now dependent on \(\eta\), which itself depends on \(w_b(s) \implies\) the \(\alpha\)'s can't be shared between sibling nodes
Proposed Solution
DESPOT + Weighted PF + \(\alpha\) vectors