and how to solve them (approximately) using Julia
Types of Uncertainty
OUTCOME
MODEL
STATE
Markov Decision Process (MDP)
Policy: \(\pi : \mathcal{S} \to \mathcal{A}\)
Maps every state to an action.
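For concreteness, a minimal Julia sketch of a policy over a made-up state space (the states and actions here are illustrative, not from any particular problem):

# A policy for a tiny hypothetical MDP: map each state to an action.
const states  = [:low, :medium, :high]
const actions = [:wait, :act]

# Represent the policy as a lookup table from states to actions...
policy = Dict(:low => :act, :medium => :act, :high => :wait)

# ...or equivalently as a function from states to actions
pi_policy(s) = s == :high ? :wait : :act

@assert all(pi_policy(s) == policy[s] for s in states)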
Partially Observable Markov Decision Process (POMDP)
Belief
History: all previous actions and observations
\[h_t = (b_0, a_0, o_1, a_1, o_2, ..., a_{t-1}, o_t)\]
Belief: probability distribution over \(\mathcal{S}\) encoding everything learned about the state from the history
\[b_t(s) = P(s_t=s \mid h_t)\]
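As an illustrative sketch (not from the original slides), a discrete Bayes-filter belief update in Julia; T and Z below are assumed, hypothetical model functions:

# Discrete Bayes-filter belief update: b′(s′) ∝ Z(o | a, s′) Σ_s T(s′ | s, a) b(s)
# T(sp, s, a) and Z(o, a, sp) are assumed model functions; S is the list of states.
function update_belief(b::Dict, a, o, S, T, Z)
    bp = Dict(sp => Z(o, a, sp) * sum(T(sp, s, a) * b[s] for s in S) for sp in S)
    total = sum(values(bp))
    total > 0 || error("observation has zero probability under this belief")
    return Dict(sp => p / total for (sp, p) in bp)
end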
A POMDP is an MDP on the belief space
Policy: \(\pi : \mathcal{B} \to \mathcal{A}\)
Maps every belief/history to an action.
POMDPs are PSPACE-Complete
C. H. Papadimitriou and J. N. Tsitsiklis, "The complexity of Markov decision processes," Mathematics of Operations Research, vol. 12, no. 3, pp. 441–450, 1987.
Let \(Q_{MDP}\) be the state-action value function for the fully observable MDP
$$Q_{MDP}(s, a) = E \left[ \sum_{t=0}^{\infty} \gamma^t R\left(s_t, a_t\right) \Bigm| s_0 = s,\ a_0 = a,\ a_t = \pi^*(s_t) \text{ for } t \geq 1 \right]$$
\[Q_{MDP}(b, a) = \sum_{s \in \mathcal{S}} Q_{MDP}(s,a)\, b(s) \geq Q^*(b,a)\]
\(\pi(b) = \arg\max_a Q_{MDP}(b, a)\) is equivalent to assuming full observability on the next step
Will not take costly exploratory actions
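A sketch of QMDP action selection in plain Julia, assuming Q_mdp is an \(|\mathcal{S}| \times |\mathcal{A}|\) matrix already computed by value iteration on the underlying MDP (all names are illustrative):

# QMDP action selection: Q_MDP(b, a) = Σ_s b(s) Q_MDP(s, a), then act greedily.
# Q_mdp is assumed to be an |S|×|A| matrix from value iteration on the MDP.
function qmdp_action(b::Vector{Float64}, Q_mdp::Matrix{Float64})
    scores = Q_mdp' * b        # Q_MDP(b, a) for each action index
    return argmax(scores)      # index of the greedy action
end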
Value Function
Tree at current belief
[Diagram: search tree rooted at the current belief \(b\), alternating between action nodes and observation nodes]
Constructing the entire tree is intractable: its size grows exponentially with the planning horizon
Monte Carlo Tree Search
Image by Dicksonlaw583 (CC 4.0)
POMCP
D. Silver and J. Veness, "Monte-Carlo planning in large POMDPs," Advances in Neural Information Processing Systems, 2010.
S. Ross, J. Pineau, S. Paquet, and B. Chaib-draa, "Online planning algorithms for POMDPs," Journal of Artificial Intelligence Research, vol. 32, pp. 663–704, 2008.
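To make the POMCP idea concrete, here is a condensed sketch of the simulate/rollout recursion, not the reference implementation: gen(s, a) is an assumed generative model returning (sp, o, r), actions are indexed 1:nA, and histories are plain tuples.

struct Node
    N::Dict{Int,Int}          # visit count per action
    Q::Dict{Int,Float64}      # value estimate per action
end
Node(nA) = Node(Dict(a => 0 for a in 1:nA), Dict(a => 0.0 for a in 1:nA))

function simulate!(tree, s, h, depth; gen, nA, γ=0.95, c=1.0, max_depth=20)
    depth ≥ max_depth && return 0.0
    if !haskey(tree, h)
        tree[h] = Node(nA)
        return rollout(s, depth; gen, nA, γ, max_depth)    # new leaf: estimate by random rollout
    end
    node = tree[h]
    total = sum(values(node.N)) + 1
    ucb = [node.N[a] == 0 ? Inf : node.Q[a] + c * sqrt(log(total) / node.N[a]) for a in 1:nA]
    a = argmax(ucb)                                        # UCB1 action selection
    sp, o, r = gen(s, a)
    q = r + γ * simulate!(tree, sp, (h..., a, o), depth + 1; gen, nA, γ, c, max_depth)
    node.N[a] += 1
    node.Q[a] += (q - node.Q[a]) / node.N[a]               # incremental mean update
    return q
end

function rollout(s, depth; gen, nA, γ, max_depth)
    depth ≥ max_depth && return 0.0
    sp, _, r = gen(s, rand(1:nA))                          # random default policy
    return r + γ * rollout(sp, depth + 1; gen, nA, γ, max_depth)
end

Repeated calls of simulate! from the root belief's particles build the tree incrementally, concentrating simulations on promising action branches instead of expanding the exponential tree exhaustively.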
From the level set toolbox website:
"This Toolbox is designed to minimize the sum of coding, execution and analysis time for those who want to explore level set methods. Computationally, Matlab is not the fastest environment in which to solve PDEs, but as a researcher I have found that the environment has several important advantages over faster compiled implementations."
Compiled Languages
"Dynamic" Languages
The Celeste team achieved peak performance of 1.54 petaflops using 1.3 million threads on 9,300 Knights Landing (KNL) nodes of the Cori supercomputer at NERSC. This result was achieved through a combination of a sophisticated parallel scheduling algorithm and optimizations to the single core version which resulted in a 1,000x improvement on a single core compared to the previously published version.
process(x::YourAbstractType, y::MyAbstractType)
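The signature above relies on Julia's multiple dispatch: the method chosen depends on the runtime types of all arguments, so a solver can own one abstract type while the problem writer owns the other. A toy illustration (the concrete types are made up for this example):

abstract type YourAbstractType end
abstract type MyAbstractType end

struct YourThing <: YourAbstractType end
struct MyThing   <: MyAbstractType end

# Dispatch selects a method based on the types of *both* arguments.
process(x::YourAbstractType, y::MyAbstractType) = "generic handling"
process(x::YourThing,        y::MyThing)        = "specialized handling"

@assert process(YourThing(), MyThing()) == "specialized handling"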
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia
Explicit: the problem is defined through its probability distributions
Generative: the problem is defined through a simulator mapping \((s, a)\) to \((s', o, r)\)
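As a sketch of the generative style in POMDPs.jl (interface details vary across versions, and MyPOMDP with its dynamics is entirely made up), one implements a function that samples a next state, observation, and reward; the explicit style instead implements the transition, observation, and reward distributions directly:

import POMDPs

struct MyPOMDP <: POMDPs.POMDP{Int,Int,Int} end    # state, action, observation types

# Generative definition: sample a next state, observation, and reward from (s, a).
function POMDPs.gen(m::MyPOMDP, s::Int, a::Int, rng)
    sp = clamp(s + a, 1, 10)       # illustrative dynamics
    o  = sp + rand(rng, -1:1)      # noisy observation of the new state
    r  = -abs(sp - 5)              # reward for staying near state 5
    return (sp=sp, o=o, r=r)
end

POMDPs.discount(m::MyPOMDP) = 0.95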
Previous C++ framework: APPL
"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."