Backup Slides
Approximate Value Iteration
\[\tilde{V}(s) = \beta(s)^\top \theta\]
\[\tilde{V}_{k+1}(s) = \Pi \mathcal{B}[\tilde{V}_k](s)\]
At each iteration, sample \(N_\text{state}\) states and estimate the value with
\[v_{k+1}[n] \gets \max_{a \in \mathcal{A}} \left\{R(s^{[n]},a) + \frac{1}{N_{\text{EV}}}\sum_{m=1}^{N_{\text{EV}}} \beta(F(s^{[n]}, a, w_m))^\top \theta_k \right\} \text{,}\]
then project onto the linear subspace spanned by the features with
\[\theta_{k+1} = \underset{\theta}{\text{argmin}} \sum_{n=1}^{N_\text{state}} \left( \beta\left(s^{[n]}\right)^\top \theta - v_{k+1}[n] \right) ^2 \text{.}\]
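The two steps above (sampled backup, then least-squares projection) can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation behind these slides. The quadratic basis in `features`, the one-dimensional toy problem, and the parameter names (`n_iter`, `n_state`, `n_ev`, `seed`) are all hypothetical. The discount factor `gamma` is also an assumption, since the backup as written on the slide has no explicit discount (that case corresponds to `gamma=1.0`).

```python
import random

def features(s):
    """Hypothetical basis beta(s): constant, linear, and quadratic terms."""
    return [1.0, s, s * s]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def least_squares(Phi, v):
    """Projection step: argmin over theta of sum_n (Phi[n]^T theta - v[n])^2,
    computed from the normal equations (Phi^T Phi) theta = Phi^T v."""
    d = len(Phi[0])
    A = [[sum(p[i] * p[j] for p in Phi) for j in range(d)] for i in range(d)]
    b = [sum(p[i] * y for p, y in zip(Phi, v)) for i in range(d)]
    return solve(A, b)

def approximate_value_iteration(R, F, actions, sample_state, sample_noise,
                                n_iter=30, n_state=200, n_ev=20,
                                gamma=0.95, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(features(sample_state(rng)))
    for _ in range(n_iter):
        # Sample N_state states and N_EV noise values for this iteration.
        states = [sample_state(rng) for _ in range(n_state)]
        noises = [sample_noise(rng) for _ in range(n_ev)]
        # Backup step: v[n] = max_a { R(s, a) + gamma * mean_m beta(F(s, a, w_m))^T theta }.
        v = [max(R(s, a) + gamma * sum(dot(features(F(s, a, w)), theta)
                                       for w in noises) / n_ev
                 for a in actions)
             for s in states]
        # Projection step: refit theta to the backed-up values.
        theta = least_squares([features(s) for s in states], v)
    return theta

# Hypothetical toy problem: keep a scalar state in [0, 1] close to 0.5.
def R(s, a):
    return -(s - 0.5) ** 2

def F(s, a, w):
    return min(1.0, max(0.0, s + a + w))

theta = approximate_value_iteration(
    R, F, actions=[-0.1, 0.0, 0.1],
    sample_state=lambda rng: rng.random(),
    sample_noise=lambda rng: rng.gauss(0.0, 0.02),
)

def V(s):
    """Approximate value function V~(s) = beta(s)^T theta."""
    return dot(features(s), theta)
```

In this toy problem the fitted value function should peak near the rewarded state, so `V(0.5)` exceeds `V(0.0)` and `V(1.0)`. Solving the normal equations directly is adequate for three features. With a larger or poorly conditioned basis, a QR-based or regularized least-squares solver would be a safer choice.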
Lane changing with internal states
“merging into traffic during rush hour is an exercise in negotiation”
— Google Self-Driving Car Project Monthly Report, Sept. 2016
“self-driving vehicles were not at fault in any crashes they were involved in.”
XXX Credit
Figure legend: Assume normal, Outcome only, Omniscient, Mean MPC, QMDP, POMCPOW
Defense Backup
By Zachary Sunberg