Fall 2025, Prof Sarah Dean
"What we do"
"Why we do it"
PO-LQ Optimal Control Problem
$$ \min_{\pi_{0:T}} ~~\mathbb E_{w,v}\Big[\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t.}\quad s_{k+1} = F s_k+ Ga_k+w_k $$
$$a_k=\pi_k(a_{0:k-1}, y_{0:k}) $$
$$y_k=Hs_k+v_k $$
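To make the information constraint concrete, here is a minimal simulation sketch in Python: the policy only ever sees past actions and observations, never the latent state \(s_k\). The matrices, noise covariances, and the helper name `rollout_cost` are illustrative placeholders, not from the lecture.

```python
import numpy as np

# Placeholder matrices (illustrative only): double-integrator state,
# scalar input, noisy position measurement.
rng = np.random.default_rng(0)
F = np.array([[1.0, 0.1], [0.0, 1.0]]); G = np.array([[0.0], [0.1]])
H = np.array([[1.0, 0.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)          # cost weights
W, V = 0.01 * np.eye(2), 0.1 * np.eye(1)   # noise covariances (assumed Gaussian)
T = 50

def rollout_cost(policy):
    """Simulate s_{k+1} = F s_k + G a_k + w_k, y_k = H s_k + v_k and return
    the realized quadratic cost. `policy` sees only (a_{0:k-1}, y_{0:k})."""
    s = rng.multivariate_normal(np.zeros(2), np.eye(2))   # hidden state
    actions, observations, cost = [], [], 0.0
    for k in range(T + 1):
        y = H @ s + rng.multivariate_normal(np.zeros(1), V)
        observations.append(y)
        a = policy(actions, observations)    # a_k = pi_k(a_{0:k-1}, y_{0:k})
        actions.append(a)
        cost += s @ Q @ s + a @ R @ a
        s = F @ s + G @ a + rng.multivariate_normal(np.zeros(2), W)
    return cost

# Example: the policy that ignores its information and always plays zero.
print(rollout_cost(lambda past_actions, observations: np.zeros(1)))
```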
Dynamic Programming Algorithm
Reference: Ch 4 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas
By the principle of optimality, the policy computed by the dynamic programming recursion below is optimal.
Information state \(\mathcal I_k\) acts as our state
The information state evolves depending on \(a_k, w_k, v_{k+1}\)
$$\{ y_0, a_0, ..., a_{k-1}, y_k\} \to \{ y_0, a_0, ..., a_{k}, y_{k+1}\}$$
DP: \(J_k^\star (\mathcal I_{k}) = \min_{a\in\mathcal A} \mathbb E[c(s_k, a)+ J_{k+1}^\star (\mathcal I_{k+1})| \mathcal I_{k}, a ]\)
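Written out as a backward recursion (a sketch, using the convention that the cost runs through \(k=T\), so the value beyond the horizon is zero):
$$J_{T+1}^\star(\mathcal I_{T+1}) = 0, \qquad J_k^\star(\mathcal I_{k}) = \min_{a\in\mathcal A}\ \mathbb E\big[c(s_k, a)+ J_{k+1}^\star (\mathcal I_{k+1})\,\big|\, \mathcal I_{k}, a \big], \quad k = T, T-1, \ldots, 0,$$
with \(\pi_k^\star(\mathcal I_k)\) any minimizing action; in the PO-LQ problem, \(c(s,a) = s^\top Q s + a^\top R a\).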
Theorem: The optimal policy is linear in the estimated state:
\(\pi_t^\star(\mathcal I_t) = K_t \mathbb E[s_t\mid\mathcal I_t]\), where the gain \(K_t\) coincides with the
optimal state feedback (LQR) gain.
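Concretely (a sketch, using the sign convention \(a_t = K_t \hat s_t\) and terminal condition \(P_{T+1}=0\) matching the cost above), the gain comes from the same backward Riccati recursion as in the fully observed LQR problem:
$$K_t = -\big(R + G^\top P_{t+1} G\big)^{-1} G^\top P_{t+1} F, \qquad P_t = Q + F^\top P_{t+1} F - F^\top P_{t+1} G \big(R + G^\top P_{t+1} G\big)^{-1} G^\top P_{t+1} F.$$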
Fact from Lecture 8: if \(w_k\) and \(v_k\) are Gaussian random variables, then the Kalman filter computes the posterior distribution of the latent state $$P(s_t \mid y_0,\ldots,y_t, a_0,\ldots,a_{t-1}) = \mathcal N(\hat s_{t|t}, P_{t|t})$$
Kalman filter at \(t\)
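For reference, the predict/update recursion (a sketch; \(W\) and \(V\) denote the process and observation noise covariances, and \(L_t\) the Kalman gain, written this way to avoid clashing with the control gain \(K_t\)):
$$\hat s_{t|t-1} = F \hat s_{t-1|t-1} + G a_{t-1}, \qquad P_{t|t-1} = F P_{t-1|t-1} F^\top + W,$$
$$L_t = P_{t|t-1} H^\top \big(H P_{t|t-1} H^\top + V\big)^{-1}, \qquad \hat s_{t|t} = \hat s_{t|t-1} + L_t\big(y_t - H \hat s_{t|t-1}\big), \qquad P_{t|t} = (I - L_t H)\, P_{t|t-1}.$$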
Theorem: The optimal policy is linear in the KF state estimate:
\(\pi_t^\star(\mathcal I_t) = K_t \hat s_{t|t}\), where the gain \(K_t\) coincides with the
optimal state feedback (LQR) gain.
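Putting the two pieces together gives the certainty-equivalent (LQG) controller: run the Kalman filter forward, apply the LQR gain to the estimate. A minimal Python sketch, with placeholder matrices matching the earlier rollout example; the helper names `lqr_gains`, `kf_update`, `kf_predict` are illustrative, not from the lecture.

```python
import numpy as np

# Placeholder problem data (illustrative only, not from the lecture).
rng = np.random.default_rng(0)
F = np.array([[1.0, 0.1], [0.0, 1.0]]); G = np.array([[0.0], [0.1]])
H = np.array([[1.0, 0.0]])
Q, R = np.eye(2), 0.1 * np.eye(1)          # cost weights
W, V = 0.01 * np.eye(2), 0.1 * np.eye(1)   # process / observation noise covariances
T = 50

def lqr_gains(F, G, Q, R, T):
    """Backward Riccati recursion; returns K_0,...,K_T with a_t = K_t s_t
    and terminal condition P_{T+1} = 0."""
    P = np.zeros_like(Q)
    gains = [None] * (T + 1)
    for t in range(T, -1, -1):
        S = R + G.T @ P @ G
        K = -np.linalg.solve(S, G.T @ P @ F)
        gains[t] = K
        P = Q + F.T @ P @ F + F.T @ P @ G @ K
    return gains

def kf_update(s_pred, P_pred, y):
    """Measurement update: fold y_t into the prior (s_pred, P_pred)."""
    L = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + V)
    return s_pred + L @ (y - H @ s_pred), (np.eye(len(s_pred)) - L @ H) @ P_pred

def kf_predict(s_post, P_post, a):
    """Time update: propagate the posterior through the dynamics."""
    return F @ s_post + G @ a, F @ P_post @ F.T + W

# Closed loop: the controller only ever touches y_t and its own estimate.
gains = lqr_gains(F, G, Q, R, T)
s = rng.multivariate_normal(np.zeros(2), np.eye(2))   # true state, hidden
s_hat, P = np.zeros(2), np.eye(2)                      # prior on s_0
cost = 0.0
for t in range(T + 1):
    y = H @ s + rng.multivariate_normal(np.zeros(1), V)
    s_hat, P = kf_update(s_hat, P, y)
    a = gains[t] @ s_hat                               # a_t = K_t * shat_{t|t}
    cost += s @ Q @ s + a @ R @ a
    s = F @ s + G @ a + rng.multivariate_normal(np.zeros(2), W)
    s_hat, P = kf_predict(s_hat, P, a)
print("realized cost:", cost)
```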
For the infinite-horizon (steady-state) policy, this requires existence of a fixed point of the Riccati equation: $$P_\star = \mathrm{Ricc}(P_{\star}, F, G, Q, R)$$
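One way to look for such a fixed point numerically is to iterate the Riccati map until it stops changing (a sketch with placeholder matrices; when the iterates converge, the limit is a fixed point \(P_\star\), and the corresponding steady-state gain is \(K_\star\)):

```python
import numpy as np

def riccati_map(P, F, G, Q, R):
    """One application of P -> Ricc(P, F, G, Q, R)."""
    S = R + G.T @ P @ G
    return Q + F.T @ P @ F - F.T @ P @ G @ np.linalg.solve(S, G.T @ P @ F)

def riccati_fixed_point(F, G, Q, R, tol=1e-9, max_iter=10_000):
    """Iterate the Riccati map starting from Q; if the iterates converge,
    the limit satisfies P_star = Ricc(P_star, F, G, Q, R)."""
    P = Q.copy()
    for _ in range(max_iter):
        P_next = riccati_map(P, F, G, Q, R)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")

# Placeholder matrices; with a fixed point in hand, the steady-state gain is
# K_star = -(R + G^T P_star G)^{-1} G^T P_star F.
F = np.array([[1.0, 0.1], [0.0, 1.0]]); G = np.array([[0.0], [0.1]])
Q, R = np.eye(2), 0.1 * np.eye(1)
P_star = riccati_fixed_point(F, G, Q, R)
K_star = -np.linalg.solve(R + G.T @ P_star @ G, G.T @ P_star @ F)
```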
Stochastic Optimal Control Problem
$$ \min_{\pi_{0:T}}~~ \mathbb E_{w,v}\Big[\sum_{k=0}^{T} c(s_k, a_k) \Big ]\quad \text{s.t.}\quad s_0~~\text{given},~~ s_{k+1} = F(s_k, a_k,w_k) $$
$$a_k=\pi_k(a_{0:k-1}, y_{0:k}) $$
$$y_k=H(s_k,v_k) $$
Tse & Bar-Shalom, Information patterns and classes of stochastic control laws, 1973.
Next time: adaptive control