Safety and Efficiency in Autonomous Vehicles through Online Learning
Zachary Sunberg
Postdoctoral Scholar
University of California
Autonomy is becoming possible and prevalent
How do we deploy with Confidence?
Waymo Image By Dllu - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=64517567
Shoettle and Sivak, "A Preliminary Analysis of Real-World Crashes Involving Self-Driving Vehicles" UMTRI-2015-34
Introduction
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Future
Introduction
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Future
Two Objectives for Autonomy
EFFICIENCY
SAFETY
Minimize resource use
(especially time)
Minimize the risk of harm to oneself and others
Hard Safety: Guaranteeing that there will be no harm
Soft Safety: Reducing the risk of harm
Two extremes:
"The two greatest risks in life are risking too much and risking too little" - Utah Avalanche Center Podcast
Objective Function
$$R(s_t, a_t) = R_\text{E}(s_t, a_t) + \lambda R_\text{S}(s_t, a_t)$$
Safety
Weight
Efficiency
\[\text{maximize} \quad E \left[ \sum_{t=0}^\infty \gamma^t R(s_t, a_t) \right]\]
Objective Function
Safety
Better Performance
Model \(M_2\), Algorithm \(A_2\)
Model \(M_1\), Algorithm \(A_1\)
Efficiency
$$R(s_t, a_t) = R_\text{E}(s_t, a_t) + \lambda R_\text{S}(s_t, a_t)$$
Safety
Weight
Efficiency
Uncertainty Expression
OUTCOME
MODEL
STATE
Markov Model
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times\mathcal{S} \to \mathbb{R}\) - Transition probability distributions
Markov Decision Process (MDP)
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Reward
Partially Observable Markov Decision Process (POMDP)
- \(\mathcal{S}\) - State space
- \(T:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Transition probability distribution
- \(\mathcal{A}\) - Action space
- \(R:\mathcal{S}\times \mathcal{A} \times\mathcal{S} \to \mathbb{R}\) - Reward
- \(\mathcal{O}\) - Observation space
- \(Z:\mathcal{S} \times \mathcal{A}\times \mathcal{S} \times \mathcal{O} \to \mathbb{R}\) - Observation probability distribution
Introduction
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Future
Sadigh, Dorsa, et al. "Information gathering actions over human internal state." Intelligent Robots and Systems (IROS), 2016 IEEE/RSJ International Conference on. IEEE, 2016.
Schmerling, Edward, et al. "Multimodal Probabilistic Model-Based Planning for Human-Robot Interaction." arXiv preprint arXiv:1710.09483 (2017).
Sadigh, Dorsa, et al. "Planning for Autonomous Cars that Leverage Effects on Human Actions." Robotics: Science and Systems. 2016.
Tweet by Nitin Gupta
29 April 2018
https://twitter.com/nitguptaa/status/990683818825736192
POMDP Formulation
\(s=\left(x, y, \dot{x}, \left\{(x_c,y_c,\dot{x}_c,l_c,\theta_c)\right\}_{c=1}^{n}\right)\)
\(o=\left\{(x_c,y_c,\dot{x}_c,l_c)\right\}_{c=1}^{n}\)
\(a = (\ddot{x}, \dot{y})\), \(\ddot{x} \in \{0, \pm 1 \text{ m/s}^2\}\), \(\dot{y} \in \{0, \pm 0.67 \text{ m/s}\}\)
Ego physical state
Physical states of other cars
Internal states of other cars
Physical states of other cars
- Actions filtered so they can never cause crashes
- Braking action always available
Efficiency
Safety
Human Behavior Model: IDM and MOBIL
M. Treiber, et al., “Congested traffic states in empirical observations and microscopic simulations,” Physical Review E, vol. 62, no. 2 (2000).
A. Kesting, et al., “General lane-changing model MOBIL for car-following models,” Transportation Research Record, vol. 1999 (2007).
A. Kesting, et al., "Agents for Traffic Simulation." Multi-Agent Systems: Simulation and Applications. CRC Press (2009).
All drivers normal
Outcome only
Omniscient
Mean MPC
QMDP
POMCPOW
Simulation results
Assume normal
Outcome only
Omniscient
Mean MPC
QMDP
POMCPOW
Correlation Trend
Robustness
Next Steps: Testing against Learned Models
Introduction
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Future
Information State
History: all previous actions and observations
\[h_t = (b_0, a_0, o_1, a_1, o_2, ..., a_{t-1}, o_t)\]
Belief: probability distribution over \(\mathcal{S}\) encoding everything learned about the state from the history
\[b_t(s) = P(s_t=s \mid h_t)\]
A POMDP is an MDP on the belief space
Belief Example: Laser Tag
Solving MDPs and POMDPs - The Value Function
$$\mathop{\text{maximize}} V_\pi(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, \pi(s_t)) \bigm| s_0 = s \right]$$
$$V^*(s) = \max \left\{R(s, a) + \gamma E\Big[V^*\left(s_{t+1}\right) \mid s_t=s, a_t=a\Big]\right\}$$
Involves all future time
Involves only \(t\) and \(t+1\)
\(a \in \mathcal{A}\)
C. H. Papadimitriou and J. N. Tsitsiklis, “The complexity of Markov decision processes,” Mathematics of Operations Research, vol. 12, no. 3, pp. 441–450, 1987
But POMDPs are still PSPACE-Complete
Online Decision Process Tree Approaches
State Node
Action Node
(Estimate value function here)
Monte Carlo Tree Search
Image by Dicksonlaw583 (CC 4.0)
POMCP
- Uses simulations of histories instead of full belief updates
- Each belief is implicitly represented by a collection of unweighted particles
Silver, David, and Joel Veness. "Monte-Carlo planning in large POMDPs." Advances in neural information processing systems. 2010.
Ross, Stéphane, et al. "Online planning algorithms for POMDPs." Journal of Artificial Intelligence Research 32 (2008): 663-704.
Light-Dark Problem
State
Timestep
Accurate Observations
Goal: \(a=0\) at \(s=0\)
Optimal Policy
Localize
\(a=0\)
QMDP
\[Q_{MDP}(b, a) = \sum_{s \in \mathcal{S}} Q_{MDP}(s,a) b(s) \geq Q^*(b,a)\]
Equivalent to assuming full observability on the next step
Will not take costly exploratory actions
$$Q_\pi(s,a) = R(s, a) + \gamma^t E\left[V_\pi (s')\right]$$
Belief Trajectories
QMDP Predicted Trajectories
[ ] An infinite number of child nodes must be visited
[ ] Each node must be visited an infinite number of times
Solving continuous POMDPs - POMCP fails
[1] Adrien Coutoux, Jean-Baptiste Hoock, Nataliya Sokolovska, Olivier Teytaud, Nicolas Bonnard. Continuous Upper Confidence Trees. LION’11: Proceedings of the 5th International Conference on Learning and Intelligent OptimizatioN, Jan 2011, Italy. pp.TBA. <hal-00542673v2>
POMCP
✔
✔
Limit number of children to
\[k N^\alpha\]
Necessary Conditions for Consistency [1]
POMCP
POMCP-DPW
POMCP-DPW converges to QMDP
Proof Outline:
-
Observation space is continuous → observations unique w.p. 1.
-
(1) → One state particle in each belief, so each belief is merely an alias for that state
-
(2) → POMCP-DPW = MCTS-DPW applied to fully observable MDP + root belief state
-
Solving this MDP is equivalent to finding the QMDP solution → POMCP-DPW converges to QMDP
Sunberg, Z. N. and Kochenderfer, M. J. "Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces", ICAPS (2018)
POMCP-DPW
[ ] An infinite number of child nodes must be visited
[ ] Each node must be visited an infinite number of times
[ ] An infinite number of particles must be added to each belief node
✔
✔
Necessary Conditions for Consistency
Use \(Z\) to insert weighted particles
✔
POMCP
POMCP-DPW
POMCPOW
Ye, Nan, et al. "DESPOT: Online POMDP planning with regularization." Journal of Artificial Intelligence Research 58 (2017): 231-266.
Next Step: Planning on Weighted Scenarios
Introduction
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Future
POMDPs.jl - An interface for defining and solving MDPs and POMDPs in Julia
Challenges for POMDP Software
- POMDPs are computationally difficult.
- There is a huge variety of
- Problems
- Continuous/Discrete
- Fully/Partially Observable
- Generative/Explicit
- Simple/Complex
- Solvers
- Online/Offline
- Alpha Vector/Graph/Tree
- Exact/Approximate
- Domain-specific heuristics
- Problems
Julia - Speed
Challenges for POMDP Software
- POMDPs are computationally difficult.
- There is a huge variety of
- Problems
- Continuous/Discrete
- Fully/Partially Observable
- Generative/Explicit
- Simple/Complex
- Solvers
- Online/Offline
- Alpha Vector/Graph/Tree
- Exact/Approximate
- Domain-specific heuristics
- Problems
Explicit
Generative
\(s,a\)
\(s', o, r\)
Previous C++ framework: APPL
"At the moment, the three packages are independent. Maybe one day they will be merged in a single coherent framework."
Introduction
Lane Changing with Internal States
Solving Continuous POMDPs
POMDPs.jl
Future
Future Research
Deploying autonomous agents with Confidence
Practical Safety Gaurantees
Trusting Visual Sensors
Algorithms for Physical Problems
Testing on Physical Vehicles
TODO: MAKE ICON
TODO: MAKE ICON
Trusting Information from Visual Sensors
Environment
Belief State
Convolutional Neural Network
Control System
Architecture for Safety Assurance
Algorithms for the Physical World
Weaknesses of the algorithms I have researched:
1. Data-driven models on modern parallel hardware
2. Multidimensional continuous action spaces
CPU Image By Eric Gaba, Wikimedia Commons user Sting, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=68125990
Practical Safety Guarantees
- Reachability-based guarantees are usually infeasible
- Probabilistic guarantees involve low-probability distribution tails
Testing with Physical Vehicles
Texas A&M (HUSL):
TREX 600 RC Helicopter
Stanford (VAIL)
Colorado (RECUV)
Autorotation
"Expert System" Autorotation Controller
Example Controller: Flare
\[\alpha \equiv \frac{KE_{\text{available}} - KE_{\text{flare exit}}}{KE_{\text{flare entry}} - KE_{\text{flare exit}}}\]
\[TTLE = TTLE_{max} \times \alpha\]
\[\ddot{h}_{des} = -\frac{2}{TTI_F^2}h - \frac{2}{TTI_F}\dot{h}\]
Bad because:
- No optimality justification
- Controller must be adjusted by hand to accommodate any changes
- Difficult to communicate about
- Parameters not intuitive
Autorotation Simulation Results
References
Acknowledgements
The content of my research reflects my opinions and conclusions, and is not necessarily endorsed by my funding organizations.
TODO: HSL PICTURE
Thank You!
Convex Optimization
Boyd, Stephen, and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
Convexity <=> Exact Solution Tractable
all \(f\) convex
Energy Kites
- Need safety guarantees
- Every bit of efficiency is extra energy
- Complex and partially observable dynamics
- Aircraft dynamics
- Tether state
- Wind
Solving MDPs and POMDPs - Offline vs Online
ONLINE
OFFLINE
Value Iteration
Sequential Decision Trees
Solving MDPs and POMDPs - The Value Function
$$\mathop{\text{maximize}} V_\pi(s) = E\left[\sum_{t=0}^{\infty} \gamma^t R(s_t, a_t) \bigm| s_0 = s, a_t = \pi(s_t) \right]$$
$$V^*(s) = \max \left\{R(s, a) + \gamma E\Big[V^*\left(s_{t+1}\right) \mid s_t=s, a_t=a\Big]\right\}$$
Involves all future time
Involves only \(t\) and \(t+1\)
\(a \in \mathcal{A}\)
Handle continuous state space via \(V(s) \approx \tilde{V}(s; \theta) \)
\(45^\circ\)
\(-45^\circ\)
\(\phi=0^\circ\)
Sunberg, Zachary N., Mykel J. Kochenderfer, and Marco Pavone. "Optimized and trusted collision avoidance for unmanned aerial vehicles using approximate dynamic programming." Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE, 2016.
Value Function
Policy
A better way: MPC Value Iteration
\(R\) quasi-convex (e.g. \(x_t \in \) safe landing)
Feasible sets convex
Job Talk
By Zachary Sunberg
Job Talk
- 448