Backup Slides

Approximate Value Iteration

\[\tilde{V}(s) = \beta(s)^\top \theta\]


\[\tilde{V}_{k+1}(s) = \Pi \mathcal{B}[\tilde{V}_k](s)\]


At each iteration, sample \(N_\text{state}\) states \(s^{[n]}\) and estimate the backed-up value at each one from \(N_\text{EV}\) samples \(w_m\) with

\[v_{k+1}[n] \gets \max_{a \in \mathcal{A}} \left\{R(s^{[n]},a) + \frac{1}{N_{\text{EV}}}\sum_{m=1}^{N_{\text{EV}}} \beta(F(s^{[n]}, a, w_m))^\top \theta_k \right\} \text{,}\]

then project onto the linear subspace spanned by the features \(\beta\) with

\[\theta_{k+1} = \underset{\theta}{\text{argmin}} \sum_{n=1}^{N_\text{state}} \left( \beta\left(s^{[n]}\right)^\top \theta - v_{k+1}[n] \right) ^2 \text{.}\]
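The two steps above (sampled backup, then least-squares projection) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation behind these slides: the polynomial feature map `features`, the toy 1-D `reward` and `transition` functions, and the discount factor `gamma` are all hypothetical. The backup equation above has no discount, which corresponds to `gamma=1.0`; the toy run uses `gamma=0.9` so the iteration stays bounded.

```python
import numpy as np

def features(s):
    """Hypothetical feature map beta(s): polynomial basis [1, s, s^2]."""
    s = np.atleast_1d(s)
    return np.stack([np.ones_like(s), s, s**2], axis=-1)

def avi_step(theta, states, actions, reward, transition, noise, gamma=1.0):
    """One iteration: sampled Bellman backup, then least-squares projection."""
    v = np.empty(len(states))
    for n, s in enumerate(states):
        q = []
        for a in actions:
            next_states = transition(s, a, noise)        # F(s, a, w_m) for every m
            ev = np.mean(features(next_states) @ theta)  # Monte Carlo average over w_m
            q.append(reward(s, a) + gamma * ev)
        v[n] = max(q)                                    # max over actions
    B = features(states)                                 # rows are beta(s^[n])^T
    theta_next, *_ = np.linalg.lstsq(B, v, rcond=None)   # argmin of squared error
    return theta_next, v

# Toy problem (assumed for illustration): stay near s = 0 on the interval [-1, 1].
def reward(s, a):
    return -s**2

def transition(s, a, w):
    return np.clip(s + a + 0.05 * w, -1.0, 1.0)

rng = np.random.default_rng(0)
actions = [-0.1, 0.0, 0.1]
theta = np.zeros(3)
for k in range(20):
    states = rng.uniform(-1.0, 1.0, size=50)   # N_state sampled states
    noise = rng.standard_normal(20)            # N_EV noise samples
    theta, _ = avi_step(theta, states, actions, reward, transition, noise, gamma=0.9)

print("fitted weights:", theta)
```

Starting from \(\theta_0 = 0\), the first backup returns \(v_1[n] = R(s^{[n]}, a)\), so with this toy reward the first projection recovers the weights \((0, 0, -1)\) exactly, which is a convenient sanity check.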

Lane changing with internal states

“merging into traffic during rush hour is an exercise in negotiation”

— Google Self-Driving Car Project Monthly Report, Sept. 2016

“self-driving vehicles were not at fault in any crashes they were involved in.”


Assume normal

Outcome only

Omniscient

Mean MPC

QMDP

POMCPOW

Defense Backup

By Zachary Sunberg
