Approximate Value Iteration
\[\tilde{V}(s) = \beta(s)^\top \theta\]
\[\tilde{V}_{k+1}(s) = \Pi \mathcal{B}[\tilde{V}_k](s)\]
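For reference, the sampled update used in this section is consistent with a Bellman backup of the form below. This is a reading of that update rather than a definition stated in the source: the expectation over the noise \(w\) is what the \(N_\text{EV}\)-sample average estimates, no discount factor appears because the update shows none, and \(\Pi\) is assumed to be the least-squares projection onto the span of the features \(\beta\).

\[\mathcal{B}[V](s) = \max_{a \in \mathcal{A}} \left\{ R(s,a) + \mathbb{E}_{w}\left[ V(F(s,a,w)) \right] \right\}\]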
At each iteration, sample \(N_\text{state}\) states and estimate the value with
\[v_{k+1}[n] \gets \max_{a \in \mathcal{A}} \left\{R(s^{[n]},a) + \frac{1}{N_{\text{EV}}}\sum_{m=1}^{N_{\text{EV}}} \beta(F(s^{[n]}, a, w_m))^\top \theta_k \right\} \text{,}\]
then, project onto the linear subspace with
\[\theta_{k+1} = \underset{\theta}{\text{argmin}} \sum_{n=1}^{N_\text{state}} \left( \beta\left(s^{[n]}\right)^\top \theta - v_{k+1}[n] \right) ^2 \text{.}\]
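The two steps above (sampled backup, then least-squares projection) can be sketched in a few lines of NumPy. Everything problem-specific here is a hypothetical stand-in and not from the source: the scalar state on \([-1, 1]\), the quadratic feature map `features`, the toy `reward` and `transition` functions, the Gaussian noise scale, and the default sample counts. As in the update above, there is no discount factor, so the sketch runs a fixed number of iterations and does not test for convergence.

```python
import numpy as np

def features(s):
    """Hypothetical feature map beta(s) = [1, s, s^2] for scalar states."""
    s = np.atleast_1d(s)
    return np.stack([np.ones_like(s), s, s**2], axis=-1)

def reward(s, a):
    """Toy reward R(s, a): penalize distance from the origin (assumption)."""
    return -s**2

def transition(s, a, w):
    """Toy dynamics F(s, a, w): noisy step, clipped to [-1, 1] (assumption)."""
    return np.clip(s + a + w, -1.0, 1.0)

def approximate_value_iteration(actions, n_iter=20, n_state=200, n_ev=20, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(3)  # theta_0 = 0, so the initial value estimate is zero
    for _ in range(n_iter):
        s = rng.uniform(-1.0, 1.0, size=n_state)   # sampled states s^[n]
        w = rng.normal(0.0, 0.05, size=n_ev)       # noise samples w_m
        q = np.empty((len(actions), n_state))
        for i, a in enumerate(actions):
            # Next states for every (state, noise) pair: shape (n_state, n_ev)
            s_next = transition(s[:, None], a, w[None, :])
            # Sample average of beta(F(s, a, w_m))^T theta_k over m
            ev = (features(s_next) @ theta).mean(axis=1)
            q[i] = reward(s, a) + ev
        v = q.max(axis=0)                          # v_{k+1}[n]: max over actions
        # Projection step: least-squares fit of beta(s^[n])^T theta to v_{k+1}[n]
        theta, *_ = np.linalg.lstsq(features(s), v, rcond=None)
    return theta

theta = approximate_value_iteration(actions=[-0.1, 0.0, 0.1])
```

The same noise samples \(w_m\) are reused across all states and actions within an iteration, which matches the indexing in the update (only \(m\) varies) and keeps the comparison between actions less noisy. Fresh states are drawn each iteration; holding them fixed would be an equally valid choice.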
Lane changing with internal states
“merging into traffic during rush hour is an exercise in negotiation”
— Google Self-Driving Car Project Monthly Report, Sept. 2016
“self-driving vehicles were not at fault in any crashes they were involved in.”
XXX Credit
Assume normal
Outcome only
Omniscient
Mean MPC
QMDP
POMCPOW