Control with Partial Observation

ML in Feedback Sys #14

Fall 2025, Prof Sarah Dean

Control by Separation Principle

"What we do"

  • Given a state feedback policy \(\pi:\mathcal S\to\mathcal A\) (e.g. MPC) and a state estimation filtering algorithm
  • For \(t=1,2,...\)
    • Observe the output \(y_t\)
    • Update the state estimate to \(\hat s_{t|t}\) using \(y_t\) according to the filtering algorithm
    • Select an action \(a_t = \pi(\hat s_{t|t})\)
    • Update the state prediction \(\hat s_{t+1|t}\) using \(a_t\) according to the filtering algorithm (see the sketch below)
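A minimal Python sketch of this loop, assuming hypothetical `policy`, `filter_update`, and `filter_predict` functions and an environment exposing `observe` and `step` (none of these interfaces are specified in the lecture):

```python
def control_with_partial_observation(env, policy, filter_update, filter_predict,
                                     s_pred, T):
    """Separation-principle loop: filter the output, then act on the estimate.

    `env`, `policy`, `filter_update`, and `filter_predict` are assumed
    interfaces; any state-feedback policy (e.g. MPC) and any filtering
    algorithm (e.g. a Kalman filter) can be plugged in.
    """
    for t in range(T):
        y = env.observe()                   # observe the output y_t
        s_est = filter_update(s_pred, y)    # update estimate \hat s_{t|t}
        a = policy(s_est)                   # select a_t = pi(\hat s_{t|t})
        env.step(a)                         # apply the action
        s_pred = filter_predict(s_est, a)   # predict \hat s_{t+1|t} using a_t
```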

Control by Separation Principle

"Why we do it"

  • Fact 1: For partially observed linear-quadratic (LQ) control, the separation principle policy is optimal, i.e. $$\pi_t^\star(a_{0:t-1},y_{0:t}) = K_t^\star \mathbb E[s_t|a_{0:t-1},y_{0:t}] = \pi^{LQ}_t(\hat s_{t|t})$$
  • Fact 2: For LQ control with Gaussian noise (LQG), the Kalman filter is the optimal filtering algorithm
  • Fact 3: For LQG, the steady state optimal policy is described by control gains \(K_\star\) and filter gains \(L_\star\), resulting in linear time-invariant dynamics with   $$a_t = K_\star \hat s_{t},\quad  \hat s_{t+1} = F\hat s_t + G a_t + L_\star (y_{t+1}-H(F\hat s_t + Ga_t))  $$
  • Fact 4: The separation principle is convenient, but in general it is not optimal

Partially observed LQ control

  • Linear dynamics, linear observations, quadratic costs
  • Stochastic and independent noise $$\mathbb E[w_k] = 0,~~\mathbb E[w_kw_k^\top] =\sigma_w^2 I,~~\mathbb E[v_k] = 0,~~\mathbb E[v_kv_k^\top] = \sigma_v^2 I$$
  • Information structure: policy at time \(t\) can depend on $$\mathcal I_t =\{ y_0, a_0, y_1, a_1, ..., a_{t-1}, y_t\}$$

PO-LQ Optimal Control Problem

$$ \min_{\pi_{0:T}} ~~\mathbb E_{w,v}\Big[\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t.}\quad s_{k+1} = F s_k+ Ga_k+w_k $$

$$a_k=\pi_k(a_{0:k-1}, y_{0:k}) $$

$$y_k=Hs_k+v_k $$

Partially observed optimal control

Dynamic Programming Algorithm

  • Initialize \(J_{T+1}^\star (\mathcal I_{T+1}) = 0\)
  • For \(k=T,T-1,\dots,0\):
    • Compute \(J_k^\star (\mathcal I_{k}) = \min_{a\in\mathcal A} \mathbb E[c(s_k, a)+ J_{k+1}^\star (\mathcal I_{k+1})| \mathcal I_{k}, a ]\)
    • Record minimizing argument as \(\pi_k^\star(\mathcal I_{k})\)

Reference: Ch 4 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas

By the principle of optimality, the resulting policy is optimal.

Information state \(\mathcal I_k\) acts as our state

The information state evolves depending on \(a_k, w_k, v_{k+1}\)

$$\{ y_0, a_0, ..., a_{k-1}, y_k\} \to \{ y_0, a_0, ..., a_{k}, y_{k+1}\}$$

Dynamic Programming for PO-LQ

  • The cost-to-go has the form $$J^\star_k(\mathcal I_k) = \mathbb E\Big[s_k^\top P_k s_k + (s_k-\mu_k)^\top P_k(s_k-\mu_k) + w_k^\top Q w_k\Big|\mathcal I_k\Big]$$ where \(\mu_k = \mathbb E[s_k|\mathcal I_k]\)
  • Lemma 4.2.1: the difference \(s_k - \mu_k\) depends on \(s_0,w_0,w_1,...,w_{k-1},v_0,v_1,...,v_k\) and is entirely independent of actions
    • this is a separation between estimation and control
  • As a result, only the first term in the cost-to-go will depend on the action, so the DP iterations proceed exactly as in the state observed case

DP: \(J_k^\star (\mathcal I_{k}) = \min_{a\in\mathcal A} \mathbb E[c(s_k, a)+ J_{k+1}^\star (\mathcal I_{k+1})| \mathcal I_{k}, a ]\)

Reference: Ch 4 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas

Separation in PO-LQ control

Theorem:  The optimal policy is linear in the estimated state
\(\pi_t^\star(\mathcal I_t) = K_t \mathbb E[s_t|\mathcal I_t]\) and coincides with the
optimal state feedback policy

  • \(K_t = -(R+G^\top P_{t+1}G)^{-1}G^\top P_{t+1}F\)
  • \(P_t = Q+F^\top P_{t+1}F - F^\top P_{t+1}G(R+G^\top P_{t+1}G)^{-1}G^\top P_{t+1}F\)
  • Define \(P_t = Ricc(P_{t+1}, F, G, Q, R)\) as shorthand for this recursion (see the sketch below)

Reference: Ch 4 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas
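As a concrete illustration, here is a minimal numpy sketch of the backward Riccati recursion and the resulting time-varying gains; the terminal condition (taken to be \(Q_T\)) and any matrices a user passes in are illustrative assumptions.

```python
import numpy as np

def riccati_step(P_next, F, G, Q, R):
    """One step of the backward recursion, P_t = Ricc(P_{t+1}, F, G, Q, R)."""
    GPF = G.T @ P_next @ F
    return Q + F.T @ P_next @ F - GPF.T @ np.linalg.solve(R + G.T @ P_next @ G, GPF)

def finite_horizon_gains(F, G, Q, R, Q_T, T):
    """Backward pass producing K_0, ..., K_{T-1} (boundary condition Q_T assumed)."""
    P = Q_T
    gains = []
    for _ in range(T):
        # K_t uses P_{t+1}, which is the current value of P in the backward pass
        gains.append(-np.linalg.solve(R + G.T @ P @ G, G.T @ P @ F))
        P = riccati_step(P, F, G, Q, R)
    return gains[::-1]
```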

Linear Quadratic Gaussian

Fact from Lecture 8: if \(w_k\) and \(v_k\) are Gaussian random variables, then the Kalman filter computes the posterior distribution of the latent state $$P(s_t | y_0,\dots,y_t, a_0,\dots,a_{t-1}) =\mathcal N(\hat s_{t|t}, P_{t|t})$$

Kalman filter at \(t\)

  • Extrapolate \(\hat s_{t\mid t-1} =F\hat s_{t-1\mid t-1} + Ga_{t-1}\) $$P_{t\mid t-1} = FP_{t-1\mid t-1} F^\top + \Sigma_w$$
  • Compute gain \(L_{t} = P_{t\mid t-1}H^\top ( HP_{t\mid t-1} H^\top+\Sigma_v)^{-1}\)
  • Update \(\hat s_{t\mid t} = \hat s_{t\mid t-1}+ L_{t}(y_{t}-H\hat s_{t\mid t-1})\) $$P_{t\mid t} = (I - L_{t}H)P_{t\mid t-1}$$
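For concreteness, a numpy sketch of one filter step matching the extrapolate/gain/update equations above (function and variable names are our own):

```python
import numpy as np

def kalman_step(s_est, P_est, a_prev, y, F, G, H, Sigma_w, Sigma_v):
    """One Kalman filter step: extrapolate with a_{t-1}, then correct with y_t."""
    # Extrapolate
    s_pred = F @ s_est + G @ a_prev                       # \hat s_{t|t-1}
    P_pred = F @ P_est @ F.T + Sigma_w                    # P_{t|t-1}
    # Compute gain
    L = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + Sigma_v)
    # Update
    s_new = s_pred + L @ (y - H @ s_pred)                 # \hat s_{t|t}
    P_new = (np.eye(len(s_new)) - L @ H) @ P_pred         # P_{t|t}
    return s_new, P_new
```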

Linear Quadratic Gaussian

Kalman filter at \(t\)

  • State estimate $$\hat s_{t\mid t} = (F\hat s_{t-1\mid t-1} +{Ga_{t-1}})+ L_{t}(y_{t}-H(F\hat s_{t-1\mid t-1} +{Ga_{t-1}}))$$
  • With gain \(L_{t} = P_{t\mid t-1}H^\top ( HP_{t\mid t-1} H^\top+\Sigma_v)^{-1}\)
  • where \(P_{t\mid t-1} = FP_{t-1\mid t-1} F^\top + \Sigma_w\) and \(P_{t\mid t} = (I - L_{t}H)P_{t\mid t-1}\)

Theorem:  The optimal policy is linear in the KF state
\(\pi_t^\star(\mathcal I_t) = K_t \hat s_{t|t}\) and coincides with the
optimal state feedback policy

  • \(K_t = -(R+G^\top P_{t+1}G)^{-1}G^\top P_{t+1}F\)
  • \(P_t = Ricc(P_{t+1}, F, G, Q, R)\)

Steady state LQG

  • Assume \((F,G)\) controllable, \((F,H)\) observable, \(Q,\Sigma_w\succ 0\)
  • Feedback policy steady state
    • A fixed point exists \(P_\star = Ricc(P_{\star}, F, G, Q, R)\)
    • If \(Q_T = P_\star\) or \(T\to\infty\) then \(K_\star = -(R+G^\top P_{\star}G)^{-1}G^\top P_{\star}F\)
  • Kalman filter steady state
    • Combining predict and update equations, $$P_{t+1|t} = Ricc(P_{t|t-1}, F^\top, H^\top, \Sigma_w, \Sigma_v)$$
    • Similarly, a fixed point exists \(\Sigma_{\star} = Ricc(\Sigma_{\star}, F^\top, H^\top, \Sigma_w, \Sigma_v)\)
    • Fact: if \(P_{0|-1} = \Sigma_\star\) or \(t\to\infty\) then \(L_{\star} = \Sigma_{\star}H^\top ( H\Sigma_{\star} H^\top+\Sigma_v)^{-1}\)
  • Fact 3: For LQG, the steady state optimal policy is described by control gains \(K_\star\) and filter gains \(L_\star\), resulting in LTI dynamics  $$a_t = K_\star \hat s_{t},\quad  \hat s_{t+1} = F\hat s_t + G a_t + L_\star (y_{t+1}-H(F\hat s_t + Ga_t))  $$

For a fixed point of the Riccati equation $$P_\star = Ricc(P_{\star}, F, G, Q, R)$$ to exist, it is sufficient that

  • \((F,G)\) is stabilizable
    • guaranteed if controllable
  • \((F,Q^{1/2})\) is detectable
    • guaranteed if observable

Steady state LQG

  • Assume \((F,G)\) controllable, \((F,H)\) observable, \(Q,\Sigma_w\succ 0\)
  • Feedback policy
    • \(P_\star = Ricc(P_{\star}, F, G, Q, R)\)
    • \(K_\star = -(R+G^\top P_{\star}G)^{-1}G^\top P_{\star}F\)
  • Kalman filter
    • \(\Sigma_{\star} = Ricc(\Sigma_{\star}, F^\top, H^\top, \Sigma_w, \Sigma_v)\)
    • \(L_{\star} = \Sigma_{\star}H^\top ( H\Sigma_{\star} H^\top+\Sigma_v)^{-1}\)
  • Fact 3: For LQG, the steady state optimal policy is described by control gains \(K_\star\) and filter gains \(L_\star\), resulting in LTI dynamics  $$a_t = K_\star \hat s_{t},\quad  \hat s_{t+1} = F\hat s_t + G a_t + L_\star (y_{t+1}-H(F\hat s_t + Ga_t))  $$
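A numpy sketch that approximates the steady-state gains by iterating both Riccati equations and then simulates the resulting LTI closed loop; the iteration count, initializations, and random seed are illustrative assumptions.

```python
import numpy as np

def riccati(P, F, G, Q, R):
    """Generic Ricc(P, F, G, Q, R), reused for the control and filter recursions."""
    GPF = G.T @ P @ F
    return Q + F.T @ P @ F - GPF.T @ np.linalg.solve(R + G.T @ P @ G, GPF)

def steady_state_lqg(F, G, H, Q, R, Sigma_w, Sigma_v, iters=1000):
    """Iterate to approximate fixed points P_*, Sigma_*, then form K_* and L_*."""
    P, Sigma = Q.copy(), Sigma_w.copy()
    for _ in range(iters):
        P = riccati(P, F, G, Q, R)                          # control Riccati
        Sigma = riccati(Sigma, F.T, H.T, Sigma_w, Sigma_v)  # filter (dual) Riccati
    K = -np.linalg.solve(R + G.T @ P @ G, G.T @ P @ F)
    L = Sigma @ H.T @ np.linalg.inv(H @ Sigma @ H.T + Sigma_v)
    return K, L

def simulate_closed_loop(F, G, H, K, L, Sigma_w, Sigma_v, s0, T, seed=0):
    """Run a_t = K * s_hat_t with the steady-state filter recursion from Fact 3."""
    rng = np.random.default_rng(seed)
    s, s_hat, traj = s0.copy(), np.zeros_like(s0), []
    for _ in range(T):
        a = K @ s_hat
        s = F @ s + G @ a + rng.multivariate_normal(np.zeros(len(s)), Sigma_w)
        y = H @ s + rng.multivariate_normal(np.zeros(H.shape[0]), Sigma_v)
        s_hat = F @ s_hat + G @ a + L @ (y - H @ (F @ s_hat + G @ a))
        traj.append(s)
    return np.array(traj)
```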

Failure of Separation Principle

  • In general, optimal policies do not exhibit the separation principle
  • Even in the simple case of linear dynamics, Gaussian noise, and a general cost
    • KF still computes the state posterior \(\mathcal N(\hat s_{t|t}, P_{t|t})\)
  • Example: scalar \(s_{t+1} = s_t+a_t+w_t\) with quartic cost $$\min_a \mathbb E[s_{t+1}^4] + 2a^2$$
    • For \(s\sim\mathcal N(\mu, \sigma^2)\), we have \(\mathbb E[s^4] =\mu^4+6\mu^2\sigma^2 + 3\sigma^4\)
    • Then, with \(\sigma^2\) the conditional variance of \(s_{t+1}\) given \(\mathcal I_t\), the minimization becomes $$\min_a (\hat s+a)^4+6(\hat s+a)^2\sigma^2 + 3\sigma^4+2a^2$$
    • So the optimal action satisfies $$a^3+3\hat s a^2+(3\hat s^2+3\sigma^2+1)a+(\hat s^3+3\hat s\sigma^2)=0$$
    • Numerically solving, we see the effect of \(\sigma\) on the optimal action (see the sketch below)
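A quick numerical check of this effect, using the cubic stationarity condition above; the values of \(\hat s\) and \(\sigma\) below are illustrative assumptions.

```python
import numpy as np

def optimal_action(s_hat, sigma):
    """Unique real root of a^3 + 3*s*a^2 + (3*s^2 + 3*sig^2 + 1)*a + (s^3 + 3*s*sig^2) = 0."""
    coeffs = [1.0, 3 * s_hat, 3 * s_hat**2 + 3 * sigma**2 + 1, s_hat**3 + 3 * s_hat * sigma**2]
    roots = np.roots(coeffs)
    # The objective is strictly convex in a, so exactly one stationary point is real
    return roots[np.argmin(np.abs(roots.imag))].real

for sigma in [0.0, 0.5, 1.0, 2.0]:   # illustrative noise levels
    print(f"sigma={sigma}: a* = {optimal_action(s_hat=1.0, sigma=sigma):.3f}")
```

For \(\hat s = 1\), the magnitude of the optimal action grows with \(\sigma\) (toward \(-\hat s\) in the large-noise limit), so the optimal action depends on the posterior variance and not only on the mean \(\hat s\).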

Failure of Separation Principle

  • In general, optimal policies do not exhibit the separation principle
  • Even in the simple case of linear dynamics, Gaussian noise, and a general cost
    • KF still computes the state posterior \(\mathcal N(\hat s_{t|t}, P_{t|t})\)
  • Example: scalar \(s_{t+1} = s_t+a_t(1+w_t)\) with multiplicative noise for \(\sigma_w=1\) and quartic cost $$\min_a \mathbb E[s_{t+1}^4] + 2a^2$$
    • For \(s\sim\mathcal N(\mu, \sigma^2)\), we have \(\mathbb E[s^4] =\mu^4+6\mu^2\sigma^2 + 3\sigma^4\)
    • Then, since \(s_{t+1}\mid\mathcal I_t\sim\mathcal N(\hat s + a,\ p^2+a^2)\) with \(p^2\) the posterior variance of \(s_t\), the minimization becomes $$\min_a (\hat s+a)^4+6(\hat s+a)^2(p^2+a^2) + 3(p^2+a^2)^2+2a^2$$
    • Numerically solving, we see even more complex effects (see the sketch below)
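A grid-search sketch for the multiplicative-noise objective; the values of \(\hat s\) and \(p\) are illustrative assumptions.

```python
import numpy as np

def objective(a, s_hat, p):
    """E[s_{t+1}^4] + 2a^2 with s_{t+1} ~ N(s_hat + a, p^2 + a^2), i.e. sigma_w = 1."""
    mu, var = s_hat + a, p**2 + a**2
    return mu**4 + 6 * mu**2 * var + 3 * var**2 + 2 * a**2

a_grid = np.linspace(-5, 5, 200001)
for p in [0.0, 0.5, 1.0, 2.0]:   # illustrative posterior standard deviations
    a_star = a_grid[np.argmin(objective(a_grid, s_hat=1.0, p=p))]
    print(f"p={p}: a* = {a_star:.3f}")
```

Here the action itself injects noise through the \(a^2\) term in the variance, so the optimal action trades off driving the mean toward zero against amplifying uncertainty.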

Classes of Policies

  1. Closed Loop: a policy choosing actions with knowledge of \(F,H,c\) and noise characteristics over the full horizon \(0,..., T\)
  2. Feedback: a policy choosing actions at \(t\) with knowledge of
    1. \(F,c\) and process noise characteristics over the full horizon
    2. \(H\) and measurement noise characteristics over the horizon \(0,...,t\)
    • Certainty Equivalent: a policy depending only on the estimated state
    • contrast with Uncertainty Aware: a policy depending on posterior state distribution
  3. Open Loop: a policy choosing actions with knowledge of \(F,c\) over the full horizon but no knowledge about \(H\) or measurement noise

Stochastic Optimal Control Problem

$$ \min_{\pi_{0:T}}~~ \mathbb E_{w,v}\Big[\sum_{k=0}^{T} c(s_k, a_k) \Big ]\quad \text{s.t.}\quad s_0~~\text{given},~~ s_{k+1} = F(s_k, a_k,w_k) $$

$$a_k=\pi_k(a_{0:k-1}, y_{0:k}) $$

$$y_k=H(s_k,v_k) $$

Classes of Policies

  1. Closed Loop: gold standard for optimality, may exhibit "probing" or "active learning" behavior to reduce uncertainties
  2. Feedback: common in practice (especially certainty equivalent), may be overly cautious (especially uncertainty aware) because it cannot plan for future uncertainty reduction
  3. Open Loop: "eyes closed" policy does not respond to new observations

Reference: Ch 4 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas

Recap

  • Separation principle
  • Steady state LQG

Next time: adaptive control

Announcements

  • Project proposal due tomorrow, no assignment over Fall break