Backup Slides

Approximate Value Iteration

\[\tilde{V}(s) = \beta(s)^\top \theta\]


\[\tilde{V}_{k+1}(s) = \Pi \mathcal{B}[\tilde{V}_k](s)\]
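
Read against the sampled update on this slide, \(\mathcal{B}\) is presumably the Bellman backup (written without a discount factor, as in the update) and \(\Pi\) the least-squares projection onto the span of \(\beta\). A sketch under that assumption, with the expectation taken over the disturbance \(w\):

\[\mathcal{B}[V](s) = \max_{a \in \mathcal{A}} \left\{ R(s,a) + \mathbb{E}_{w}\left[ V(F(s,a,w)) \right] \right\} \text{,} \qquad \Pi[V](s) = \beta(s)^\top \underset{\theta}{\text{argmin}} \sum_{n=1}^{N_\text{state}} \left( \beta\left(s^{[n]}\right)^\top \theta - V\left(s^{[n]}\right) \right)^2 \text{.}\]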


At each iteration, sample \(N_\text{state}\) states \(s^{[n]}\) and estimate the backed-up value at each one, approximating the expectation with \(N_\text{EV}\) samples \(w_m\):

\[v_{k+1}[n] \gets \max_{a \in \mathcal{A}} \left\{R(s^{[n]},a) + \frac{1}{N_{\text{EV}}}\sum_{m=1}^{N_{\text{EV}}} \beta(F(s^{[n]}, a, w_m))^\top \theta_k \right\} \text{,}\]

then project onto the linear subspace spanned by the features \(\beta\) with

\[\theta_{k+1} = \underset{\theta}{\text{argmin}} \sum_{n=1}^{N_\text{state}} \left( \beta\left(s^{[n]}\right)^\top \theta - v_{k+1}[n] \right) ^2 \text{.}\]
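
The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation behind these slides: the scalar state space, the quadratic feature map `beta`, the reward `R`, the transition `F`, the action set, and the Gaussian noise are all hypothetical stand-ins. Because the backup above has no discount factor, the sketch simply runs a fixed number of iterations.

```python
import numpy as np

ACTIONS = (-0.1, 0.0, 0.1)  # hypothetical finite action set


def beta(s):
    """Hypothetical feature map: [1, s, s^2] for a scalar state."""
    s = np.asarray(s, dtype=float)
    return np.stack([np.ones_like(s), s, s**2], axis=-1)


def R(s, a):
    """Hypothetical reward: stay near s = 0.5, with a small action cost."""
    return -(s - 0.5) ** 2 - 0.1 * abs(a) ** 2


def F(s, a, w):
    """Hypothetical transition: additive noise, clipped to [0, 1]."""
    return np.clip(s + a + w, 0.0, 1.0)


def avi_step(theta, states, noise):
    """One iteration: sampled backup v_{k+1}[n], then least-squares projection."""
    v = np.full(states.shape, -np.inf)
    for a in ACTIONS:
        # Next states for every (state n, noise sample m): shape (N_state, N_EV).
        s_next = F(states[:, None], a, noise[None, :])
        # R(s, a) plus the sample mean of beta(s')^T theta_k over the noise samples.
        q = R(states, a) + (beta(s_next) @ theta).mean(axis=1)
        v = np.maximum(v, q)  # running max over actions
    # theta_{k+1} = argmin_theta sum_n (beta(s_n)^T theta - v[n])^2
    theta_next, *_ = np.linalg.lstsq(beta(states), v, rcond=None)
    return theta_next, v


def approximate_value_iteration(n_iter=10, n_state=200, n_ev=50, seed=0):
    """Run a fixed number of iterations, resampling states and noise each time."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)
    for _ in range(n_iter):
        states = rng.uniform(0.0, 1.0, size=n_state)  # assumed state sampler
        noise = rng.normal(0.0, 0.05, size=n_ev)      # assumed noise model
        theta, _ = avi_step(theta, states, noise)
    return theta
```

Starting from \(\theta_0 = 0\), the first backup reduces to \(\max_a R(s,a)\), which these quadratic features fit exactly. That makes a convenient sanity check for the projection step.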

Lane changing with internal states

“merging into traffic during rush hour is an exercise in negotiation”

— Google Self-Driving Car Project Monthly Report, Sept. 2016

“self-driving vehicles were not at fault in any crashes they were involved in.”

XXX Credit

Assume normal

Outcome only

Omniscient

Mean MPC

QMDP

POMCPOW