Fall 2025, Prof Sarah Dean
The interaction loop: a policy \(\pi_t:\mathcal S\to\mathcal A\) maps the observation \(s_t\) to an action \(a_t\); along the way we accumulate data \(\{(s_t, a_t, c_t)\}\).
Goal: select actions \(a_t\) to bring the environment to low-cost states
"What we do"
\( \underset{a_0,\dots,a_H }{\min}\) \(\displaystyle\sum_{k=0}^H c(s_k, a_k)\)
\(\text{s.t.}~~s_0=s_t,~~s_{k+1} = F(s_k, a_k)\)
\([a_0^\star,\dots, a_{H}^\star](s_t) = \arg\)
"Why we do it": the model predicts the (action-dependent) trajectory.
Figure from slides by Borrelli, Jones, Morari
Plan:
$$ \min_{a_{0:T}} ~~\sum_{k=0}^{T} c(s_k,a_k) \quad \text{s.t.}\quad s_{k+1} = F s_k+ Ga_k $$
import cvxpy as cvx
import numpy as np

# Stack states s_0..s_T and actions a_0..a_T into single vectors
s_vec = cvx.Variable(((T + 1) * n, 1), name="s")
a_vec = cvx.Variable(((T + 1) * p, 1), name="a")

# Linear dynamics constraints: s_0 fixed, s_{k+1} = F s_k + G a_k
constr = [s_vec[:n, :] == s_0]
for k in range(T):
    constr.append(
        s_vec[n * (k + 1):n * (k + 2), :]
        == F @ s_vec[n * k:n * (k + 1), :] + G @ a_vec[p * k:p * (k + 1), :]
    )

# Convex cost
objective = cost(s_vec, a_vec)
prob = cvx.Problem(cvx.Minimize(objective), constr)
prob.solve()
actions = np.array(a_vec.value)
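To make the snippet above runnable, the names `n, p, T, F, G, s_0, cost` must be defined first; one toy instance (a double integrator with identity-weighted quadratic cost, my choice rather than the slides'):

import cvxpy as cvx
import numpy as np

n, p, T = 2, 1, 20                          # state dim, action dim, horizon
F = np.array([[1.0, 0.1], [0.0, 1.0]])      # example dynamics: discrete double integrator
G = np.array([[0.005], [0.1]])
s_0 = np.array([[1.0], [0.0]])

def cost(s_vec, a_vec):
    # convex quadratic cost: sum_k ||s_k||^2 + ||a_k||^2, i.e. Q = I, R = I
    return cvx.sum_squares(s_vec) + cvx.sum_squares(a_vec)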
The model \((F, G)\) is a map of how actions affect states
Figures from slides by Goulart, Borrelli
$$ \min_{a_{0:T}} ~~\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \quad \text{s.t.}\quad s_{k+1} = F s_k+ Ga_k $$
Fact 2: When costs are quadratic and dynamics are linear, MPC selects an action which depends linearly on the state. $$a_t^{MPC}=K_{MPC}s_t$$
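Fact 2 can be checked numerically: if \(a_0^\star(s) = K_{MPC}s\), then \(a_0^\star\) must be additive and homogeneous in \(s\). A sketch, assuming the program above is wrapped in a hypothetical `solve_mpc(s)` returning \(a_0^\star(s)\):

import numpy as np

def check_linearity(solve_mpc, n, trials=5, tol=1e-5):
    rng = np.random.default_rng(0)
    for _ in range(trials):
        s1 = rng.standard_normal((n, 1))
        s2 = rng.standard_normal((n, 1))
        alpha = rng.standard_normal()
        # linearity: a_0*(alpha*s1 + s2) == alpha*a_0*(s1) + a_0*(s2)
        lhs = solve_mpc(alpha * s1 + s2)
        rhs = alpha * solve_mpc(s1) + solve_mpc(s2)
        assert np.allclose(lhs, rhs, atol=tol)

With only the equality (dynamics) constraints, the program is a convex QP whose solution is exactly linear in the initial state, so the check passes up to solver tolerance.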
MPC policy: at state \(s_t\), apply the first planned action \(a_t = a_0^\star(s_t)\), where
$$[a_0^\star,\dots, a_{H}^\star](s_t) = \arg\min_{a_0,\dots,a_H}\; \sum_{k=0}^{H} c(s_k, a_k) \quad \text{s.t.}~~s_0=s_t,~~s_{k+1} = F(s_k, a_k)$$
Claim: MPC policy is linear \(\pi_t^\star(s) = \gamma^\mathsf{pos} \mathsf{pos}_t + \gamma^\mathsf{vel} \mathsf{vel}_t\)
Optimal Control Problem
$$ \min_{a_{0:T}} \sum_{k=0}^{T} c(s_k, a_k) \quad \text{s.t.}\quad s_0~~\text{given},~~ s_{k+1} = F(s_k, a_k,w_k) $$
Stochastic Optimal Control Problem
$$ \min_{\pi_{0:T}}~~ \mathbb E_w\Big[\sum_{k=0}^{T} c(s_k, a_k) \Big ]\quad \text{s.t.}\quad s_0~~\text{given},~~ s_{k+1} = F(s_k, a_k,w_k) $$
$$a_k=\pi_k(s_k) $$
Denote the objective value as \(J^\pi(s_0)\)
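Since the objective is an expectation over the noise, \(J^\pi(s_0)\) can be estimated by averaging simulated rollouts; a sketch, where `policy`, `F`, `c`, and `sample_w` are placeholder callables:

def estimate_value(s0, policy, F, c, sample_w, T, num_rollouts=1000):
    """Monte Carlo estimate of J^pi(s0) = E_w[ sum_{k=0}^T c(s_k, a_k) ]."""
    total = 0.0
    for _ in range(num_rollouts):
        s, rollout_cost = s0, 0.0
        for k in range(T + 1):
            a = policy[k](s)                 # a_k = pi_k(s_k)
            rollout_cost += c(s, a)
            s = F(s, a, sample_w())          # s_{k+1} = F(s_k, a_k, w_k)
        total += rollout_cost
    return total / num_rollouts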
Suppose \(\pi^\star = (\pi^\star_0,\dots, \pi^\star_{T})\) minimizes the stochastic optimal control problem
Then the cost-to-go $$ J^\pi_t(s) = \mathbb E_w\Big[\sum_{k=t}^{T} c(s_k, \pi_k(s_k)) \Big]\quad \text{s.t.}\quad s_t=s,~~s_{k+1} = F(s_k, \pi_k(s_k),w_k) $$
is minimized for all \(s\) by the truncated policy \((\pi_t^\star,\dots\pi_T^\star)\)
(i.e. \(J_t^\pi(s)\geq J_t^{\pi^\star}(s)\) for all \(\pi, s, t\))
Algorithm (dynamic programming): initialize \(J^\star_{T+1}(s) = 0\) and, for \(k = T, T-1, \dots, 0\), set
$$J_k^\star (s) = \min_{a\in\mathcal A}\; c(s, a)+\mathbb E_w[J_{k+1}^\star (F(s,a,w))], \qquad \pi_k^\star (s) = \arg\min_{a\in\mathcal A}\; c(s, a)+\mathbb E_w[J_{k+1}^\star (F(s,a,w))]$$
Reference: Ch 1 in Dynamic Programming & Optimal Control, Vol. I by Bertsekas
By the principle of optimality, the resulting policy is optimal.
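A tabular sketch of this backwards recursion for finite \(\mathcal S\) and \(\mathcal A\); the array encoding (stage costs `c[s, a]` and transition probabilities `P[s, a, s']`) is my assumption:

import numpy as np

def dynamic_programming(c, P, T):
    """Backwards DP: c has shape (S, A); P has shape (S, A, S).
    Returns cost-to-go J[k, s] and greedy policy pi[k, s] for k = 0..T."""
    S, A = c.shape
    J = np.zeros((T + 2, S))                 # J[T+1] = 0 starts the recursion
    pi = np.zeros((T + 1, S), dtype=int)
    for k in range(T, -1, -1):
        Q = c + P @ J[k + 1]                 # Q[s, a] = c(s, a) + E[J_{k+1}(s')]
        J[k] = Q.min(axis=1)
        pi[k] = Q.argmin(axis=1)
    return J[: T + 1], pi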
Claim: optimal policy is linear \(\pi_t^\star(s) = \gamma^\mathsf{pos}_t \mathsf{pos}_t + \gamma_t^\mathsf{vel} \mathsf{vel}_t\)
[Plot: the gains \(\gamma^\mathsf{pos}_t\) and \(\gamma^\mathsf{vel}_t\) plotted against \(t\) over the horizon \(H\).]
LQR Problem
$$ \min_{\pi_{0:T}} ~~\mathbb E_w\Big[\sum_{k=0}^{T} s_k^\top Qs_k + a_k^\top Ra_k \Big]\quad \text{s.t.}\quad s_{k+1} = F s_k+ Ga_k+w_k $$
$$a_k=\pi_k(s_k) $$
DP: \(J_k^\star (s) = \min_{a\in\mathcal A} c(s, a)+\mathbb E_w[J_{k+1}^\star (F(s,a,w))]\)
Theorem: For \(t=0,\dots,T\), the optimal cost-to-go function is quadratic, \(J_t^\star(s) = s^\top P_t s + \text{const}\), and the optimal policy is linear, \(\pi_t^\star(s) = K_t s\).
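The matrices \(P_t\) and gains \(K_t\) come from the standard backwards Riccati recursion; a sketch (for zero-mean noise, \(w_t\) only adds a constant to the cost-to-go, so the gains are unchanged):

import numpy as np

def riccati(F, G, Q, R, T):
    """Backwards Riccati recursion for finite-horizon LQR.
    Returns P_0..P_T (quadratic cost-to-go) and K_0..K_T (linear gains)."""
    n = F.shape[0]
    P = [None] * (T + 2)
    K = [None] * (T + 1)
    P[T + 1] = np.zeros((n, n))              # no cost beyond the horizon
    for t in range(T, -1, -1):
        # K_t = -(R + G' P_{t+1} G)^{-1} G' P_{t+1} F
        K[t] = -np.linalg.solve(R + G.T @ P[t + 1] @ G, G.T @ P[t + 1] @ F)
        # P_t = Q + F' P_{t+1} (F + G K_t)
        P[t] = Q + F.T @ P[t + 1] @ (F + G @ K[t])
    return P[: T + 1], K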
LQR Problem
$$ \min_{\pi_{0:T}} ~~\mathbb E\Big[\sum_{t=0}^{T} s_t^\top Qs_t+ a_t^\top Ra_t\Big]\quad \text{s.t.}\quad s_{t+1} = F s_t+ Ga_t+w_t $$
We know that \(a^\star_t = \pi_t^\star(s_t)\) where \(\pi_t^\star(s) = K_t s\) and \(K_t = -(R + G^\top P_{t+1} G)^{-1} G^\top P_{t+1} F\), with \(P_t\) given by the backwards Riccati recursion from \(P_{T+1}=0\).
MPC Problem
$$ \min_{a_{0:H}} ~~\sum_{k=0}^{H} s_k^\top Qs_k + a_k^\top Ra_k \quad \text{s.t.}\quad s_0=s,\quad s_{k+1} = F s_k+ Ga_k $$
MPC Policy \(a_t = a^\star_0(s_t)\) where
\(a^\star_0(s) = K_0 s\), where \(K_0\) is the first gain of the same Riccati recursion run over the horizon \(H\)
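This makes the side-by-side comparison computational: the MPC gain \(K_0\) is the first gain of the Riccati recursion run over horizon \(H\). A sketch checking this against the convex-program solution, assuming the `riccati` and `solve_mpc` helpers sketched above:

import numpy as np

# First gain of the H-step Riccati recursion...
_, K = riccati(F, G, Q, R, H)
# ...matches the first action of the MPC program from any state s
s = np.random.default_rng(0).standard_normal((F.shape[0], 1))
assert np.allclose(solve_mpc(s), K[0] @ s, atol=1e-4)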
Reference: Dynamic Programming & Optimal Control, Vol. I by Bertsekas
Next time: safety constraints