Zachary Sunberg
May 25, 2018
Introduction
UAV Collision Avoidance
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Controlling an Autonomous Vehicle is inherently a multi-objective problem
Minimize resource use
(especially time)
Minimize the risk of harm to oneself and others
Increasing safety can only decrease efficiency
Two extremes:
XXX Credit
“self-driving vehicles were not at fault in any crashes they were involved in.”
XXX Credit
PLANNING UNDER UNCERTAINTY
HARD SAFETY CONSTRAINTS
A systematic approach:
Objective: maximize future reward
$$\mathop{\text{maximize}} E\left[\sum_{t=0}^{\infty} \gamma^t r_t\right]$$
$$r_t = R(s_t, a_t, s_{t+1})$$
How do we optimize both safety and efficiency?
Objective Function
$$R(s_t, a_t, s_{t+1}) = R_\text{E}(s_t, a_t, s_{t+1}) + \lambda R_\text{S}(s_t, a_t, s_{t+1})$$
Safety
Weight
Efficiency
Safety
Better Performance
Model \(M_2\), Algorithm \(A_2\)
Model \(M_1\), Algorithm \(A_1\)
Efficiency
Dynamic Model: Types of Uncertainty
Outcome
Model
State
State Transitions
Observations
State Transition Distribution
Observation Distribution
Reward
Markov Model
Markov Decision Process (MDP)
Partially Observable Markov Decision Process (POMDP)
Belief
\[b'(s') = \frac{\int_{s\in\mathcal{S}} Z(o \mid s, a, s') T(s' \mid s, a) b(s) ds}{\int_{s'\in\mathcal{S}} \int_{s\in\mathcal{S}} Z(o \mid s, a, s') T(s' \mid s, a) b(s) ds ds'}\]
History: all previous actions and observations
\[h_t = (b_0, a_0, o_1, a_1, o_2, ..., a_{t-1}, o_t)\]
Belief: probability distribution over \(\mathcal{S}\) encoding everything learned about the state from the history
\[b_t(s) = P(s_t=s \mid h_t)\]
A POMDP is an MDP on the belief space.
Solving MDPs and POMDPs - The Value Function
$$\mathop{\text{maximize}} V_\pi(s) = E\left[\sum_{t=0}^{\infty} \gamma^t r_t \bigm| s_0 = s, a_t = \pi(s_t) \right]$$
$$V^*(s) = \max \left\{R(s, \pi(s)) + \gamma E\Big[V^*\left(s_{t+1}\right) \mid s_t=s, a_t=\pi(s)\Big]\right\}$$
Involves all future time
Involves only \(t\) and \(t+1\)
$$Q_\pi(s, a) = E\left[\sum_{t=0}^{\infty} \gamma^t r_t \bigm| s_0 = s, a_0 = a, a_t = \pi(s_t)\right]$$
$$V^*(s) = \max_{a \in \mathcal{A}} Q^*(s,a)$$
Solving MDPs and POMDPs - Offline vs Online
Offline
(Value Iteration)
Online
(Sequential Decision Tree)
QMDP
\[Q_{MDP}(b, a) = \sum_{s \in \mathcal{S}} Q_{MDP}(s,a) b(s) \geq Q^*(b,a)\]
Equivalent to assuming full observability on the next step.
Will not take costly exploratory actions.
Introduction
UAV Collision Avoidance
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Trusted and Optimized UAV Collision Avoidance
Challenge 1: Horizontal maneuvers
Traditional Solution: Altitude Separation
UAVs Require Horizontal Maneuvers
Challenge 2: Certification
Bach, R., Farrell, C., & Erzberger, H. (2009). An Algorithm for Level-Aircraft Conflict Resolution.
Additional buffer compensates for uncertainty
1.0
1.5
2.0
3.0
4.0
MDP Formulation
\(s=\left(x^{(o)}, y^{(o)}, \psi^{(o)}, x^{(i)}, y^{(i)}, \psi^{(i)}, \text{dev}\right)\)
\(a = \phi^{(o)}\), \(\mathcal{A} = \{-45^\circ, -22.5^\circ, 0^\circ, 22.5^\circ, 45^\circ\}\)
\(R(s, a) = -c_\text{step} + r_\text{goal} \, \text{in\_goal}(s) - c_\text{dev} \, \text{dev}(s,a) - \lambda \,\text{nmac}(s) \)
Transition: \(\quad \dot{\psi}^{(o)}_t = \frac{g \, \tan \phi^{(o)}_t}{v^{(o)}}\), \(\quad \dot{\psi}^{(i)}_t \sim \mathcal{N}(0, 10^\circ/s)\)
Ownship State
Intruder State
Bank Angle
Approximate Value Iteration
\[\tilde{V}(s) = \beta(s)^\top \theta\]
\[\tilde{V}_{k+1}(s) = \Pi \mathcal{B}[\tilde{V}_k](s)\]
At each iteration, sample \(N_\text{state}\) states and estimate the value with
\[v_{k+1}[n] \gets \max_{a \in \mathcal{A}} \left\{R(s^{[n]},a) + \frac{1}{N_{\text{EV}}}\sum_{m=1}^{N_{\text{EV}}} \beta(F(s^{[n]}, a, w_m))^\top \theta_k \right\} \text{,}\]
then, project onto the linear subspace with
\[\theta_{k+1} = \text{argmin} \sum_{n=1}^{N_\text{state}} \left( \beta\left(s^{[n]}\right)^\top \theta - v_{k+1}[n] \right) ^2 \text{.}\]
Features
Value Function and Policy
\(45^\circ\)
\(-45^\circ\)
\(\phi=0^\circ\)
Price of certifiability
Trusted Resolution Logic
Directly Optimized Turn Rate
Maintaining certifiability
\[\tilde{\mathcal{A}}(s) = \left\{a \in \mathcal{A} \mid \text{mindist}(s, a) \geq D_\text{NMAC} \right\}\]
Price of certifiability
Trusted Resolution Logic
Directly Optimized Turn Rate
Trusted Optimized Turn Rate
Introduction
UAV Collision Avoidance
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Tweet by Nitin Gupta
29 April 2018
https://twitter.com/nitguptaa/status/990683818825736192
Human Behavior Model: IDM and MOBIL
POMDP Formulation
\(s=\left(x, y, \dot{x}, \left\{(x_c,y_c,\dot{x}_c,l_c,\theta_c)\right\}_{c=1}^{n}\right)\)
\(o=\left\{(x_c,y_c,\dot{x}_c,l_c)\right\}_{c=1}^{n}\)
\(a = (\ddot{x}, \dot{y})\), \(\ddot{x} \in \{0, \pm 1m/s^2\}, \dot{y} \in \{0, \pm 0.67 m/s^2\}\)
Ego physical state
Physical states of other cars
Internal states of other cars
Physical states of other cars
Assume normal
Outcome only
Omniscient
Mean MPC
QMDP
POMCPOW
Simulation results
Assume normal
Outcome only
Omniscient
Mean MPC
QMDP
POMCPOW
Assume normal
Omniscient
Mean MPC
QMDP
POMCPOW
Introduction
UAV Collision Avoidance
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Monte Carlo Tree Search
POMCP
Light-Dark Problem
State
Timestep
Accurate Observations
Goal: \(a=0\) at \(s=0\)
Optimal Policy
Localize
\(a=0\)
Necessary Conditions for Consistency [1]
[ ] An infinite number of child nodes must be visited
[ ] Each node must be visited an infinite number of times
Solving continuous POMDPs - POMCP fails
[1] Adrien Coutoux, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, Nicolas Bonnard. Continuous Upper Confidence Trees. LION’11: Proceedings of the 5th International Conference on Learning and Intelligent OptimizatioN, Jan 2011, Italy. pp.TBA. <hal-00542673v2>
POMCP
Limit number of children to
\[k N^\alpha\]
POMCP
POMCP-DPW
POMCP-DPW converges to QMDP
Proof Outline:
Observation space is continuous → observations unique w.p. 1.
(1) → One state particle in each belief, so each belief is merely an alias for a that state
(2) → POMCP-DPW = MCTS-DPW applied to fully observable MDP + root belief state
Solving this MDP is equivalent to finding the QMDP solution → POMCP-DPW converges to QMDP
POMCP-DPW
Necessary Conditions for Consistency
[ ] An infinite number of child nodes must be visited
[ ] Each node must be visited an infinite number of times
[ ] An infinite number of particles must be added to each belief node
Use \(Z\) to insert weighted particles
POMCP
POMCP-DPW
POMCPOW
Introduction
UAV Collision Avoidance
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Previous C++ framework: APPL
"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia