Model Predictive Control
ML in Feedback Sys #21
Prof Sarah Dean
Reminders/etc
 Project midterm update due Friday!
 Scribing feedback to come by next week

Upcoming paper presentations starting next week
 [RB17] Frank, WeiHan, Minjae
 [DSA+20] Jerry, Wendy, Yueying
 Participation includes attending presentations
 Relevant AI Seminar tomorrow 12:15-1:15 in Gates 122: Brandon Amos, Learning with differentiable and amortized optimization
Safe action in a dynamic world
Goal: select actions \(a_t\) to bring the environment to low-cost states while avoiding unsafe states
(diagram: the policy \(\pi_t:\mathcal S\to\mathcal A\) observes state \(s_t\), applies action \(a_t\) to the environment \(F\), and accumulates data \(\{(s_t, a_t, c_t)\}\))
Recap: Invariant Sets
 A set \(\mathcal S_\mathrm{inv}\) is invariant under dynamics \(s_{t+1} = F(s_t)\) if for all \( s\in\mathcal S_\mathrm{inv}\), \( F(s)\in\mathcal S_\mathrm{inv}\)
 If \(\mathcal S_\mathrm{inv}\) is invariant for dynamics \(F\), then \(s_0\in \mathcal S_\mathrm{inv} \implies s_t\in\mathcal S_\mathrm{inv}\) for all \(t\).
 Example: sublevel set of Lyapunov function
 \(\{s\mid V(s)\leq c\}\)
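This sublevel-set example can be checked numerically. The sketch below (not from the slides; the stable dynamics matrix is chosen for illustration) solves the discrete Lyapunov equation for \(V(s)=s^\top P s\) and verifies on random samples that \(V\) never increases along trajectories, so every sublevel set \(\{s\mid V(s)\leq c\}\) is invariant.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative stable linear dynamics s_{t+1} = F s (eigenvalues 0.9 and 0.8)
F = np.array([[0.9, 0.1],
              [0.0, 0.8]])

# Solve F' P F - P = -I, so V(s) = s' P s is a Lyapunov function for F
P = solve_discrete_lyapunov(F.T, np.eye(2))

V = lambda s: s @ P @ s

# V never increases along trajectories, so each sublevel set {V(s) <= c}
# is invariant: states inside stay inside
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
invariant = all(V(F @ s) <= V(s) for s in samples)
print(invariant)
```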
Recap: Receding Horizon
(timeline: receding horizon alternates Plan and Do steps over time)
 For \(t=0,1,\dots\)
 Observe state
 Optimize plan
 Apply first planned action
\(\pi(s_t) = u_0^\star(s_t)\)
$$[u_0^\star,\dots, u_{H-1}^\star](s_t) = \arg\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})$$
\(\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})\)
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe}\)
Notation: distinguish real states and actions \(s_t\) and \(a_t\) from the planned optimization variables \(x_k\) and \(u_k\).
Recap: The MPC Policy
(diagram: environment \(F\) in state \(s_t\); MPC applies \(a_t = u_0^\star(s_t)\))
Example: infeasibility
The state is position & velocity \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 1 & 0.1\\ 0 & 1 \end{bmatrix}s_t + \begin{bmatrix} 0\\ 1 \end{bmatrix}a_t\)
Goal: stay near origin and be energy efficient
 Safety constraint \(|\theta|\leq 1\) and actuation limit \(|a|\leq 0.5\)
 Infeasibility = inability to guarantee safety
 it also leads to loss of stability
 Distinguish states that are initially feasible from states that remain feasible
 these coincide only when the plan covers the full horizon (\(H=T\))
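This failure mode can be checked numerically. The illustrative sketch below uses the dynamics above and a brute-force feasibility search to exhibit a state for which a 1-step plan exists but no 2-step plan can keep \(\theta\le 1\): the state looks feasible under a short horizon but cannot remain feasible.

```python
import itertools
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])

def feasible(s, H, actions=np.linspace(-0.5, 0.5, 11)):
    """Is there ANY admissible length-H action sequence keeping theta <= 1?"""
    for seq in itertools.product(actions, repeat=H):
        x, ok = s, True
        for a in seq:
            x = A @ x + B.flatten() * a
            if x[0] > 1.0 + 1e-9:
                ok = False
                break
        if ok:
            return True
    return False

# Moving fast toward the boundary: theta = 0.9, omega = 1.0
s = np.array([0.9, 1.0])
ok_short = feasible(s, H=1)   # a 1-step plan exists (theta just reaches 1.0)
ok_long = feasible(s, H=2)    # but even max braking gives theta = 1.05 > 1
print(ok_short, ok_long)
```

A short-horizon MPC at this state would happily return a plan and then discover one step later that no safe action exists.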
Infeasibility Problem
(figure: regions of states that are infeasible, initially feasible, or remain feasible)

Definition: We call \(\mathcal S_\mathrm{inv}\) a control invariant set for dynamics
\(s_{t+1} = F(s_t, a_t)\) if for all \( s\in\mathcal S_\mathrm{inv}\), there exists an \(a\in\mathcal A_\mathrm{safe}\) such that \( F(s, a)\in\mathcal S_\mathrm{inv}\) 
Definition: The region of attraction for dynamics \(\tilde F(s)\) is the set of initial states \(s_0\) whose trajectories converge to the origin.
 We consider the closed loop dynamics \(\tilde F(s) = F(s,\pi(s))\)
Infeasibility Problem
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) +\textcolor{cyan}{ c_H(x_H)}$$
\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H\in\mathcal S_H}\)
 Let \(J(s; u_0,\dots, u_{H-1})\) be the value of the objective for actions \(u_0,\dots, u_{H-1}\)
 Let \(J^\star(s)=J(s; u^\star_0,\dots, u^\star_{H-1})\) be the optimal value.
 Assume that the stage cost \(c(s,a)\) is positive definite, i.e. \(c(s,a)>0\) for all \((s,a)\neq (0,0)\) and \(c(0,0)=0\).
 Assume that \(0\in\mathcal S_\mathrm{safe}\) and \(0\in\mathcal A_\mathrm{safe}\)
Terminal cost and constraints
 Receding horizon control is short-sighted
 Additional terms more closely approximate the infinite horizon problem
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) $$
\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H=0}\)
 Assume that the origin is a fixed point of the uncontrolled dynamics
 i.e. \(F(0,0)=0\)
 We will prove that
 \(\pi_\mathrm{MPC}\) is recursively feasible
 \(F(s, \pi_\mathrm{MPC}(s))\) is stable within region of attraction
Terminal cost and constraints
 Warm up: consider \(\mathcal S_H = \{0\}\) and \(c_H(0)=0\)
Recursive feasibility: feasible at \(s_t\implies\) feasible at \(s_{t+1}\)
Proof:
 \(s_t\) feasible and solution to optimization problem is \(u^\star_{0}, \dots, u^\star_{H-1}\) with corresponding states \(x^\star_{0}, \dots, x^\star_{H}\)
 After applying \(a_t=u^\star_{0}\), state moves to \(s_{t+1} = F(s_t,a_t)\)
 Notice that \(s_{t+1} = x^\star_1\)
 Claim: \(u^\star_{1}, \dots, u^\star_{H-1}, 0\) is now a feasible solution.
 because \(x_{H}^\star=0\) and \(F(0,0)=0\)
 thus corresponding states \(x^\star_{1}, \dots, x^\star_{H}, 0\) satisfy constraints
Recursive Feasibility
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})\qquad\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})$$
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H=0}\)
Review: Lyapunov Stability
Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) is continuous and
 (positive definite) \(V(0)=0\) and \(V(s)>0\) for all \(s\in\mathcal S \setminus \{0\}\)
 (decreasing) \(V(F(s)) - V(s) \leq 0\) for all \(s\in\mathcal S\)
 Optionally, (strict) \(V(F(s)) - V(s) < 0\) for all \(s\in\mathcal S\setminus\{0\}\)
Theorem (1.2, 1.4): Suppose that \(F\) is locally Lipschitz, \(s_{eq}=0\) is a fixed point, and \(V\) is a Lyapunov function for \(F,s_{eq}\). Then, \(s_{eq}=0\) is
 asymptotically stable if \(V\) is strictly decreasing
Proof:
\(J^\star(s)\) is positive definite and strictly decreasing. Therefore, the closed loop dynamics \(F(\cdot, \pi_\mathrm{MPC}(\cdot))\) are asymptotically stable.
 Positive definite:
 if \(s=0\), then optimal actions are \(0\) since \(F(0,0)=0\) and stage cost is positive definite
 if \(s\neq 0\), \(J^\star(s)>0\) since stage cost is positive definite
Stability
 Strictly decreasing: recall \(J^\star (s_t) =\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k}) +c_H(x_H^\star)\)
 \(J^\star(s_{t+1}) \leq\) cost of feasible solution starting at \(x_0=F(s_t, u^\star_{0})\)
 \(=J(s_{t+1}; u^\star_1,\dots, u^\star_{H-1}, 0)=\sum_{k=1}^{H-1} c(x^\star_{k}, u^\star_{k}) + c(x^\star_{H}, 0)+c_H(0)\)
 \(=\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k}) + \cancel{c(x^\star_{H}, 0)}+\cancel{c_H(0)} - c(x^\star_{0}, u^\star_{0}) \)
 \(= J^\star (s_t) - c(x^\star_{0}, u^\star_{0}) < J^\star (s_t)\)
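The decrease \(J^\star(s_{t+1}) \le J^\star(s_t) - c(x_0^\star, u_0^\star)\) can be verified numerically. The sketch below (illustrative: double-integrator matrices, state/action inequality constraints omitted for simplicity) solves the warm-up problem with terminal constraint \(x_H=0\) as an equality-constrained QP via its KKT system.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[0.1]])
n, m, H = 2, 1, 5

def solve_mpc(s):
    """min sum_{k=0}^{H-1} x_k'Q x_k + u_k'R u_k  s.t.  x_0 = s,
    x_{k+1} = A x_k + B u_k, x_H = 0.
    Stacked states: [x_1; ...; x_H] = Phi s + Gamma u."""
    Phi = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, H + 1)])
    Gamma = np.zeros((n * H, m * H))
    for i in range(H):
        for j in range(i + 1):
            Gamma[n*i:n*(i+1), m*j:m*(j+1)] = np.linalg.matrix_power(A, i - j) @ B
    Qbar = np.kron(np.eye(H), Q)
    Qbar[-n:, -n:] = 0                      # x_H has no stage cost (it is pinned to 0)
    Rbar = np.kron(np.eye(H), R)
    Hmat = Gamma.T @ Qbar @ Gamma + Rbar    # quadratic term in u
    f = Gamma.T @ Qbar @ Phi @ s            # linear term in u
    Eu, du = Gamma[-n:], -Phi[-n:] @ s      # terminal constraint x_H = 0
    # KKT system of the equality-constrained QP
    KKT = np.block([[2 * Hmat, Eu.T], [Eu, np.zeros((n, n))]])
    sol = np.linalg.solve(KKT, np.concatenate([-2 * f, du]))
    u = sol[:m * H]
    X = Phi @ s + Gamma @ u
    xs = [s] + [X[n*k:n*(k+1)] for k in range(H)]
    J = sum(xs[k] @ Q @ xs[k] + u[m*k:m*(k+1)] @ R @ u[m*k:m*(k+1)] for k in range(H))
    return u, J

s0 = np.array([0.5, 0.3])
u, J0 = solve_mpc(s0)
s1 = A @ s0 + B.flatten() * u[0]
_, J1 = solve_mpc(s1)
stage = s0 @ Q @ s0 + u[:m] @ R @ u[:m]
print(J1 <= J0 - stage + 1e-8)
```

The shifted candidate \((u_1^\star,\dots,u_{H-1}^\star, 0)\) from the proof achieves exactly \(J^\star(s_t)-c(x_0^\star,u_0^\star)\), so the re-optimized \(J^\star(s_{t+1})\) can only be lower.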
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) +\textcolor{cyan}{ c_H(x_H)}$$
\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H\in\mathcal S_H}\)
General terminal cost and constraints
Assumptions:
 The origin is an uncontrolled fixed point: \(F(0,0)=0\)
 The costs are positive definite and constraints contain \(0\)
 The terminal set is contained in safe set and is control invariant
 The terminal cost satisfies $$ c_H(s_{t+1}) - c_H(s_t) \leq -c(s_t, u) $$ for some \(u\) such that \(s_{t+1} = F(s_{t}, u)\in\mathcal S_H\)
Recursive feasibility: feasible at \(s_t\implies\) feasible at \(s_{t+1}\)
Proof:
 \(s_t\) feasible and solution to optimization problem is \(u^\star_{0}, \dots, u^\star_{H-1}\) with corresponding states \(x^\star_{0}, \dots, x^\star_{H}\)
 After applying \(a_t=u^\star_{0}\), state moves to \(s_{t+1} = F(s_t,a_t)\)
 Notice that \(s_{t+1} = x^\star_1\)
 Claim: there exists a \(u\) such that \(u^\star_{1}, \dots, u^\star_{H-1}, u\) is a feasible solution.
 because \(x_{H}^\star\in\mathcal S_H\) and \(\mathcal S_H\) is control invariant
 thus corresponding states \(x^\star_{1}, \dots, x^\star_{H}, F(x^\star_{H}, u)\) satisfy constraints
Recursive Feasibility
Proof:
\(J^\star(s)\) is positive definite and strictly decreasing. Therefore, the closed loop dynamics \(F(\cdot, \pi_\mathrm{MPC}(\cdot))\) are asymptotically stable.
Stability
 Positive definite: same argument as before
 Strictly decreasing: recall \(J^\star (s_t) =\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k}) +c_H(x_H^\star)\)
 \(J^\star(s_{t+1}) \leq\) cost of feasible solution starting at \(x_0=F(s_t, u^\star_{0})\)
 \(=J(s_{t+1}; u^\star_1,\dots, u^\star_{H-1}, u)\)
 \(=\sum_{k=1}^{H-1} c(x^\star_{k}, u^\star_{k}) + c(x^\star_{H}, u)+c_H(F(x^\star_{H}, u))\)
 \(=\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k})+c_H(x^\star_H)+ c(x^\star_{H}, u)+c_H(F(x^\star_{H}, u))-c_H(x^\star_H) - c(x^\star_{0}, u^\star_{0}) \)
 \(\leq J^\star (s_t) +c(x^\star_{H}, u) - c(x^\star_{H}, u) - c(x^\star_{0}, u^\star_{0}) < J^\star (s_t)\)
Terminal cost and constraints for LQR
Based on the unconstrained LQR policy, where \(P=\mathrm{DARE}(A,B,Q,R)\) and $$ K=-(B^\top PB+R)^{-1}B^\top P A$$
 Terminal cost as \(c_H(s) = s^\top P s\)
 Terminal set is any invariant set for closed loop $$s\in\mathcal S_H\implies (A+BK)s\in\mathcal S_H$$ which also guarantees safety: $$\mathcal S_H\subseteq \mathcal S_\mathrm{safe},\quad Ks\in\mathcal A_\mathrm{safe}\quad\forall~~s\in\mathcal S_H$$
 ex: sublevel set of \(s^\top P s\)
Constrained LQR Problem
$$ \min ~~\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} s_t^\top Qs_t+ a_t^\top Ra_t\quad\\ \text{s.t}\quad s_{t+1} = A s_t+ Ba_t \\ G_s s_t\leq b_s,\quad G_a a_t\leq b_a$$
MPC Policy
$$ \min ~~\sum_{k=0}^{H-1} x_k^\top Qx_k + u_k^\top Ru_k +x_H^\top Px_H\quad \\ \text{s.t}\quad x_0=s,~~ x_{k+1} = A x_k+ Bu_k\\G_s x_k\leq b_s,\quad G_a u_k\leq b_a,\quad x_H\in\mathcal S_H$$
Terminal cost and constraints for LQR
This satisfies the assumptions:
 The origin is an uncontrolled fixed point: \(F(0,0)=0\)
 ✓ \(A\cdot 0+B\cdot 0=0\)
 The costs are positive definite and constraints contain \(0\)
 ✓ \(Q,R,P\) are psd, assume \(b_s,b_a\geq 0\)
 The terminal set is contained in safe set and is control invariant
 ✓ by construction; \(u=Ks\) guarantees invariance
 The terminal cost satisfies \(c_H(s_{t+1}) - c_H(s_t) \leq -c(s_t, u) \) for some \(u\) such that \(s_{t+1} = F(s_{t}, u)\in\mathcal S_H\)
 Exercise: use the form of \(K\) and the DARE to show that $$((A+BK)s)^\top P(A+BK)s - s^\top Ps = -s^\top Q s - (Ks)^\top R (Ks)$$
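The exercise can be checked numerically with SciPy's DARE solver (the system matrices and weights below are illustrative, not from the lecture):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Illustrative system and weights
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[0.1]])

P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # LQR gain, a = K s

# Lyapunov decrease identity: (A+BK)' P (A+BK) - P = -(Q + K' R K),
# i.e. c_H(s_{t+1}) - c_H(s_t) = -c(s_t, K s_t) under the terminal controller
lhs = (A + B @ K).T @ P @ (A + B @ K) - P
rhs = -(Q + K.T @ R @ K)
print(np.allclose(lhs, rhs))
```

This identity shows the terminal cost assumption holds with equality for \(u=Ks\), since \(c(s,Ks)=s^\top Qs+(Ks)^\top R(Ks)\).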
 Recall the Bellman Optimality Equation:
 \( \pi^\star(s) = \arg\min_{a\in\mathcal A} c(s, a)+J^\star (F(s,a))\)
 For LQR, this means that
 \(\pi^\star(s) = \arg\min_a s^\top Q s + a^\top R a + (As+Ba)^\top P (As+Ba)\)
 \(\pi^\star(s) = \arg\min_u x_0^\top Q x_0 + u^\top R u + x_1^\top P x_1~~\text{s.t.} ~~x_0=s,~~x_1=Ax_0+Bu\)

This is MPC with \(H=1\) and correct terminal cost!
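A quick numerical check of this equivalence (illustrative system; `scipy.optimize.minimize` stands in for the one-step MPC solver):

```python
import numpy as np
from scipy.linalg import solve_discrete_are
from scipy.optimize import minimize

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[0.1]])

P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # infinite-horizon LQR gain

s = np.array([0.7, -0.4])

# H = 1 MPC objective with terminal cost x_1' P x_1
def objective(u):
    x1 = A @ s + B @ u
    return float(s @ Q @ s + u @ R @ u + x1 @ P @ x1)

u_mpc = minimize(objective, x0=np.zeros(1)).x[0]
print(abs(u_mpc - (K @ s)[0]))   # agreement up to solver tolerance
```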
Equivalence for unconstrained
Constrained LQR Problem
$$ \min ~~\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} s_t^\top Qs_t+ a_t^\top Ra_t\quad\\ \text{s.t}\quad s_{t+1} = A s_t+ Ba_t $$
MPC Policy
$$ \min ~~\sum_{k=0}^{H-1} x_k^\top Qx_k + u_k^\top Ru_k +x_H^\top Px_H\quad \\ \text{s.t}\quad x_0=s,~~ x_{k+1} = A x_k+ Bu_k$$
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) + c_H(x_H)$$
\(\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})\)
\(x_k\in\mathcal S_\mathrm{safe},~ u_k\in\mathcal A_\mathrm{safe},~x_H\in\mathcal S_H\)
 Terminal constraint not often used in practice (instead: a long horizon)
 Soft constraints
 \(x_k+\delta \in\mathcal S_\mathrm{safe}\) and add penalty \(C\|\delta\|_2^2\) to the cost
 Accuracy of costs/dynamics vs. ease of optimization
 Sampling based optimization (cross entropy method)
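A soft constraint can be implemented as a penalty on the smallest slack \(\delta\) that returns a state to the safe set. A minimal sketch for the interval constraint \(|\theta|\le 1\) (the weight \(C\) is an assumed tuning parameter):

```python
def soft_penalty(theta, C=100.0):
    """Penalty C * ||delta||^2, where delta is the smallest shift that
    puts theta back inside the safe interval [-1, 1]."""
    delta = max(0.0, abs(theta) - 1.0)
    return C * delta**2

print(soft_penalty(0.5))   # inside the safe set: no penalty
print(soft_penalty(1.5))   # 0.5 outside: penalty 100 * 0.25 = 25
```

Adding this term to the planner's cost keeps the optimization feasible even when the hard constraint cannot be met, trading safety for solvability.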
MPC in practice
(diagram: environment \(F\) in state \(s_t\); MPC applies \(a_t = u_0^\star(s_t)\))
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) + c_H(x_H)$$
\(\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})\)
\(x_k\in\mathcal S_\mathrm{safe},~ u_k\in\mathcal A_\mathrm{safe},~x_H\in\mathcal S_H\)
 Disturbances:
 optimize expectation, high probability, or worstcase
 Unknown dynamics/costs
 robust to uncertainty (worst case)
 learn from data
MPC extensions
(diagram: environment \(F\) in state \(s_t\); MPC applies \(a_t = u_0^\star(s_t)\))
Recap
 Recap: MPC
 Feasibility problems
 Terminal sets and costs
 Proof of feasibility and stability
References: Predictive Control for Linear and Hybrid Systems by Borrelli, Bemporad, and Morari
Reminders
 Project update due Friday
 Upcoming paper presentations:
 [RB17] Learning model predictive control for iterative tasks
 [DSA+20] Fairness is not static
 [FLD21] Algorithmic fairness and the situated dynamics of justice
