Model Predictive Control

ML in Feedback Sys #21

Prof Sarah Dean

Reminders/etc

  • Project midterm update due Friday!
  • Scribing feedback to come by next week
  • Upcoming paper presentations starting next week
    • RB17 Frank, Wei-Han, Minjae
    • DSA+20 Jerry, Wendy, Yueying
  • Participation includes attending presentations
  • Relevant AI Seminar tomorrow 12:15-1:15 in Gates 122: Brandon Amos, Learning with differentiable and amortized optimization

[Feedback-loop diagram: policy \(\pi_t:\mathcal S\to\mathcal A\); observation \(s_t\); accumulated data \(\{(s_t, a_t, c_t)\}\)]

Safe action in a dynamic world

Goal: select actions \(a_t\) to bring environment to low-cost states
while avoiding unsafe states

[Diagram: action \(a_t\) enters dynamics \(F\), producing state \(s\)]

Recap: Invariant Sets

  • A set \(\mathcal S_\mathrm{inv}\) is invariant under dynamics \(s_{t+1} = F(s_t)\) if for all \( s\in\mathcal S_\mathrm{inv}\), \( F(s)\in\mathcal S_\mathrm{inv}\)
  • If \(\mathcal S_\mathrm{inv}\) is invariant for dynamics \(F\), then \(s_0\in \mathcal S_\mathrm{inv} \implies s_t\in\mathcal S_\mathrm{inv}\) for all \(t\).
  • Example: sublevel set of Lyapunov function
    • \(\{s\mid V(s)\leq c\}\)
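The sublevel-set example can be checked numerically. Below is a minimal sketch (the dynamics matrix, \(P\), and level \(c\) are illustrative choices, not from the lecture): sample points inside \(\{s \mid V(s)\leq c\}\) for a quadratic \(V(s)=s^\top P s\) and verify each stays inside after one step of the dynamics.

```python
import numpy as np

F = np.array([[0.9, 0.1], [0.0, 0.8]])   # illustrative stable dynamics
P = np.eye(2)                             # Lyapunov candidate V(s) = s^T P s
c = 1.0                                   # sublevel set {s : V(s) <= c}

def V(s):
    return s @ P @ s

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
inside = samples[[V(s) <= c for s in samples]]        # points in the set
invariant = all(V(F @ s) <= c for s in inside)        # do they stay in it?
print(invariant)  # True: the sublevel set is invariant for these choices
```

This is a sampling check, not a proof; for linear dynamics and quadratic \(V\), invariance of every sublevel set is equivalent to \(F^\top P F \preceq P\).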

Recap: Receding Horizon

[Timeline diagram: alternating Plan / Do steps over time]

  • For \(t=0,1,\dots\)
    1. Observe state
    2. Optimize plan
    3. Apply first planned action

\(\pi(s_t) = u_0^\star(s_t)\)

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})$$

\(\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})\)

\(x_k\in\mathcal S_\mathrm{safe},\quad  u_k\in\mathcal A_\mathrm{safe}\quad~~~\)

Notation: distinguish real states and actions \(s_t\) and \(a_t\) from the planned optimization variables \(x_k\) and \(u_k\).

\([u_0^\star,\dots, u_{H-1}^\star](s_t) = \arg\min\) of the optimization problem above

Recap: The MPC Policy
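The observe/plan/act loop can be sketched in code. To stay self-contained, this sketch drops the safety constraints, so each plan is a finite-horizon LQR problem solvable by a backward Riccati pass; the dynamics and costs are the double-integrator example used later in the lecture, and the terminal cost choice is illustrative.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])    # double-integrator dynamics
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])       # quadratic stage cost c(x, u)
H = 20                                    # planning horizon

def plan_first_action(s):
    """Backward Riccati pass for min sum_k c(x_k, u_k); returns u_0^*(s)."""
    P = Q.copy()                          # terminal cost x_H^T Q x_H (a choice)
    for _ in range(H):
        K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
        P = Q + K.T @ R @ K + (A + B @ K).T @ P @ (A + B @ K)
    return K @ s                          # gain from the last pass acts at k = 0

s = np.array([1.0, 0.0])
for t in range(50):                       # observe -> plan -> apply first action
    s = A @ s + B @ plan_first_action(s)
print(np.linalg.norm(s))                  # shrinks toward 0
```

Replanning at every step is what distinguishes this from executing the initial open-loop plan.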


[Closed-loop diagram: dynamics \(F\) map state \(s_t\) and action \(a_t = u_0^\star(s_t)\) to the next state]

Example: infeasibility

The state is position & velocity \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 1 & 0.1\\ 0 & 1 \end{bmatrix}s_t + \begin{bmatrix} 0\\  1 \end{bmatrix}a_t\)

Goal: stay near origin and be energy efficient

  • Safety constraint \(|\theta|\leq 1\) and actuation limit \(|a|\leq 0.5\)
  • Infeasibility = inability to guarantee safety
    • also leads to loss of stability
  • States that are initially feasible vs. states that remain feasible
    • these coincide when the plan covers the full horizon (\(H=T\))
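The loss of feasibility can be verified by brute force. From \(s=[0.9, 1.0]\) the safety constraint holds now, but two steps ahead \(\theta \geq 1.0 + 0.1(1.0 - 0.5) = 1.05\) for any admissible action, so no plan keeps the state safe. A sketch (checking only extreme and zero actions suffices here because the violated coordinate is monotone in the actions):

```python
import numpy as np
from itertools import product

A = np.array([[1.0, 0.1], [0.0, 1.0]])    # dynamics from the example
B = np.array([[0.0], [1.0]])

def stays_safe(s0, actions):
    s = s0
    for a in actions:
        s = A @ s + B.flatten() * a
        if abs(s[0]) > 1.0 + 1e-9:         # safety constraint |theta| <= 1
            return False
    return True

s0 = np.array([0.9, 1.0])                  # safe now: |theta| = 0.9 <= 1
grid = [-0.5, 0.0, 0.5]                    # actuation limit |a| <= 0.5
feasible = any(stays_safe(s0, seq) for seq in product(grid, repeat=3))
print(feasible)  # False: no admissible plan keeps the state safe
```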

Infeasibility Problem

  • [Plot: infeasible vs. initially feasible vs. remain-feasible regions of state space]
  • Definition: We call \(\mathcal S_\mathrm{inv}\) a control invariant set for dynamics
    \(s_{t+1} = F(s_t, a_t)\) if for all \( s\in\mathcal S_\mathrm{inv}\), there exists an \(a\in\mathcal A_\mathrm{safe}\) such that \( F(s, a)\in\mathcal S_\mathrm{inv}\)
  • Definition: The region of attraction for dynamics \(\tilde F(s)\) is the set of initial states \(s_0\) that converge to the origin.
    • We consider the closed loop dynamics \(\tilde F(s) = F(s,\pi(s))\)

Infeasibility Problem

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) +\textcolor{cyan}{ c_H(x_H)}$$

\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)

\(x_k\in\mathcal S_\mathrm{safe},\quad  u_k\in\mathcal A_\mathrm{safe},\quad  \textcolor{cyan}{x_H\in\mathcal S_H}\)

  • Let \(J(s; u_0,\dots, u_{H-1})\) be the value of the objective for actions \(u_0,\dots, u_{H-1}\)
  • Let \(J^\star(s)=J(s; u^\star_0,\dots, u^\star_{H-1})\) be the optimal value.
  • Assume that stage cost \(c(s,a)\) is positive definite, i.e. \(c(s,a)>0\) for all \((s,a)\neq (0,0)\) and \(c(0,0)=0\).
  • Assume that \(0\in\mathcal S_\mathrm{safe}\) and \(0\in\mathcal A_\mathrm{safe}\)

Terminal cost and constraints

  • Receding horizon control is short sighted
  • Additional terms more closely approximate infinite horizon problem

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) $$

\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)

\(x_k\in\mathcal S_\mathrm{safe},\quad  u_k\in\mathcal A_\mathrm{safe},\quad  \textcolor{cyan}{x_H=0}\)

  • Assume that the origin is a fixed point of the uncontrolled dynamics
    • i.e. \(F(0,0)=0\)
  • We will prove that
    • \(\pi_\mathrm{MPC}\) is recursively feasible
    • \(F(s, \pi_\mathrm{MPC}(s))\) is stable within region of attraction

Terminal cost and constraints

  • Warm up: consider \(\mathcal S_H = \{0\}\) and \(c_H(0)=0\)

Recursive feasibility: feasible at \(s_t\implies\) feasible  at \(s_{t+1}\)

Proof:

  1. \(s_t\) feasible and solution to optimization problem is \(u^\star_{0}, \dots, u^\star_{H-1}\) with corresponding states \(x^\star_{0}, \dots, x^\star_{H}\)
  2. After applying \(a_t=u^\star_{0}\), state moves to \(s_{t+1} = F(s_t,a_t)\)
    • Notice that \(s_{t+1} = x^\star_1\)
  3. Claim: \(u^\star_{1}, \dots, u^\star_{H-1}, 0\) is now a feasible solution.
    • because \(x_{H}^\star=0\) and \(F(0,0)=0\)
    • thus corresponding states \(x^\star_{1}, \dots, x^\star_{H}, 0\) satisfy constraints
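The shift argument can be checked numerically: take any plan that is feasible from \(s_t\) and ends at \(x_H^\star = 0\), and verify that the shifted plan padded with \(u=0\) is feasible from \(s_{t+1}=x_1^\star\). The plan below is hand-built for the double-integrator example (it need not be optimal for the argument to apply):

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])

def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(A @ xs[-1] + B.flatten() * u)
    return xs

def feasible(x0, us):
    xs = rollout(x0, us)
    return (all(abs(x[0]) <= 1.0 for x in xs)        # state constraint
            and all(abs(u) <= 0.5 for u in us)       # action constraint
            and np.allclose(xs[-1], 0.0))            # terminal x_H = 0

s_t = np.array([-0.05, 0.5])
us = [-0.5, 0.0]                   # hand-built plan ending exactly at 0
assert feasible(s_t, us)

s_next = rollout(s_t, us)[1]       # s_{t+1} after applying u_0
shifted = us[1:] + [0.0]           # shifted plan padded with u = 0
print(feasible(s_next, shifted))   # True: feasibility is preserved
```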

Recursive Feasibility

 

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})\qquad\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})$$

\(x_k\in\mathcal S_\mathrm{safe},\quad  u_k\in\mathcal A_\mathrm{safe},\quad  \textcolor{cyan}{x_H=0}\)

Review: Lyapunov Stability

Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) is continuous and

  • (positive definite) \(V(0)=0\) and \(V(s)>0\) for all \(s\in\mathcal S - \{0\}\)
  • (decreasing) \(V(F(s)) - V(s) \leq 0\) for all \(s\in\mathcal S\)
  • Optionally, (strict) \(V(F(s)) - V(s) < 0\) for all \(s\in\mathcal S-\{0\}\)

Theorem (1.2, 1.4): Suppose that \(F\) is locally Lipschitz, \(s_{eq}=0\) is a fixed point, and \(V\) is a Lyapunov function for \(F,s_{eq}\). Then, \(s_{eq}=0\)  is

  • stable if \(V\) is decreasing
  • asymptotically stable if \(V\) is strictly decreasing

Proof:

\(J^\star(s)\) is positive definite and strictly decreasing. Therefore, the closed loop dynamics \(F(\cdot, \pi_\mathrm{MPC}(\cdot))\) are asymptotically stable.

  • Positive definite:
    • if \(s=0\), then optimal actions are \(0\) since \(F(0,0)=0\) and stage cost is positive definite
    • if \(s\neq 0\), \(J^\star(s)>0\) since stage cost is positive definite

Stability

  • Strictly decreasing: recall \(J^\star (s_t) =\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k}) +c_H(x_H^\star)\)
    • \(J^\star(s_{t+1}) \leq\) cost of feasible solution starting at \(x_0=F(s_t, u^\star_{0})\)
      • \(=J(s_{t+1}; u^\star_1,\dots, u^\star_{H-1}, 0)=\sum_{k=1}^{H-1} c(x^\star_{k}, u^\star_{k})  + c(x^\star_{H}, 0)+c_H(0)\)
      • \(=\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k})  + \cancel{c(x^\star_{H}, 0)}+\cancel{c_H(0)}   -c(x^\star_{0}, u^\star_{0}) \)
      • \(= J^\star (s_t) -c(x^\star_{0}, u^\star_{0}) < J^\star (s_t)\)

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) +\textcolor{cyan}{ c_H(x_H)}$$

\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)

\(x_k\in\mathcal S_\mathrm{safe},\quad  u_k\in\mathcal A_\mathrm{safe},\quad  \textcolor{cyan}{x_H\in\mathcal S_H}\)

General terminal cost and constraints

Assumptions:

  • The origin is an uncontrolled fixed point: \(F(0,0)=0\)
  • The costs are positive definite and constraints contain \(0\)
  • The terminal set is contained in safe set and is control invariant
  • The terminal cost satisfies $$ c_H(s_{t+1}) - c_H(s_t) \leq -c(s_t, u) $$ for some \(u\) such that \(s_{t+1} = F(s_{t}, u)\in\mathcal S_H\)

Recursive feasibility: feasible at \(s_t\implies\) feasible at \(s_{t+1}\)

Proof:

  1. \(s_t\) feasible and solution to optimization problem is \(u^\star_{0}, \dots, u^\star_{H-1}\) with corresponding states \(x^\star_{0}, \dots, x^\star_{H}\)
  2. After applying \(a_t=u^\star_{0}\), state moves to \(s_{t+1} = F(s_t,a_t)\)
    • Notice that \(s_{t+1} = x^\star_1\)
  3. Claim: there exists a \(u\) such that \(u^\star_{1}, \dots, u^\star_{H-1}, u\) is a feasible solution.
    • because \(x_{H}^\star\in\mathcal S_H\) and \(\mathcal S_H\) is control invariant
    • thus corresponding states \(x^\star_{1}, \dots, x^\star_{H}, F(x^\star_{H}, u)\) satisfy constraints

Recursive Feasibility

Proof:

\(J^\star(s)\) is positive definite and strictly decreasing. Therefore, the closed loop dynamics \(F(\cdot, \pi_\mathrm{MPC}(\cdot))\) are asymptotically stable.

Stability

  • Positive definite: same argument as before
  • Strictly decreasing: recall \(J^\star (s_t) =\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k}) +c_H(x_H^\star)\)
    • \(J^\star(s_{t+1}) \leq\) cost of feasible solution starting at \(x_0=F(s_t, u^\star_{0})\)
      • \(=J(s_{t+1}; u^\star_1,\dots, u^\star_{H-1}, u)\)
      • \(=\sum_{k=1}^{H-1} c(x^\star_{k}, u^\star_{k})  + c(x^\star_{H}, u)+c_H(F(x^\star_{H}, u))\)
      • \(=\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k})+c_H(x^\star_H)+ c(x^\star_{H}, u)+c_H(F(x^\star_{H}, u))-c_H(x^\star_H) -c(x^\star_{0}, u^\star_{0}) \)
      • \(\leq  J^\star (s_t) +c(x^\star_{H}, u) - c(x^\star_{H}, u) -c(x^\star_{0}, u^\star_{0}) < J^\star (s_t)\)

Terminal cost and constraints for LQR

Based on unconstrained LQR policy where \(P=\mathrm{DARE}(A,B,Q,R)\) $$ K=-(B^\top PB+R)^{-1}B^\top PA$$

  • Terminal cost as \(c_H(s) = s^\top P s\)
  • Terminal set is any invariant set for closed loop $$s\in\mathcal S_H\implies (A+BK)s\in\mathcal S_H$$ which also guarantees safety: $$\mathcal S_H\subseteq \mathcal S_\mathrm{safe},\quad Ks\in\mathcal A_\mathrm{safe}\quad\forall~~s\in\mathcal S_H$$
  • ex: sublevel set of \(s^\top P s\)
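A numerical sanity check of this construction (using the double-integrator dynamics from the earlier example and illustrative \(Q, R\)): compute \(P\) by iterating the Riccati recursion to a fixed point, form \(K\), and verify the terminal-cost decrease identity \((A+BK)^\top P(A+BK) - P = -(Q+K^\top RK)\).

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

# Solve the DARE by iterating the Riccati recursion to a fixed point.
P = Q.copy()
for _ in range(10_000):
    K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    P_next = Q + K.T @ R @ K + (A + B @ K).T @ P @ (A + B @ K)
    if np.max(np.abs(P_next - P)) < 1e-12:
        P = P_next
        break
    P = P_next

K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
Acl = A + B @ K                    # closed-loop dynamics under u = K s

# Terminal-cost decrease: (A+BK)^T P (A+BK) - P = -(Q + K^T R K)
print(np.allclose(Acl.T @ P @ Acl - P, -(Q + K.T @ R @ K)))  # True
```

The right-hand side is exactly \(-c(s, Ks)\) per unit quadratic form, which is the decrease condition the assumptions require.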

Constrained LQR Problem

$$ \min ~~\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} s_t^\top Qs_t+ a_t^\top Ra_t\quad\\ \text{s.t.}\quad s_{t+1} = A s_t+ Ba_t \\ G_s s_t\leq b_s,\quad G_a a_t\leq b_a$$

MPC Policy

$$ \min ~~\sum_{k=0}^{H-1} x_k^\top Qx_k + u_k^\top Ru_k +x_H^\top Px_H\quad \\ \text{s.t.}\quad x_0=s,~~ x_{k+1} = A x_k+ Bu_k\\G_s x_k\leq b_s,\quad G_a u_k\leq b_a,\quad x_H\in\mathcal S_H$$

Terminal cost and constraints for LQR

This satisfies the assumptions:

  • The origin is an uncontrolled fixed point \(F(0,0)=0\)
    • ✓ \(A\cdot 0+B\cdot 0=0\)
  • The costs are positive definite and constraints contain \(0\)
    • ✓ \(Q,R\) are positive definite (hence \(P\) is too), assume \(b_s,b_a\geq 0\)
  • The terminal set is contained in safe set and is control invariant
    • ✓ by construction, have \(u=Ks\) guarantees invariance
  • The terminal cost satisfies \(c_H(s_{t+1}) - c_H(s_t) \leq -c(s_t, u) \) for some \(u\) such that \(s_{t+1} = F(s_{t}, u)\in\mathcal S_H\)
    • Exercise: use the form of \(K\) and the DARE to show that $$((A+BK)s)^\top P(A+BK)s - s^\top Ps = -s^\top (Q+K^\top RK) s$$
  • Recall the Bellman Optimality Equation:
    • \( \pi^\star(s) = \arg\min_{a\in\mathcal A} c(s, a)+J^\star (F(s,a))\)
  • For LQR, this means that
    • \(\pi^\star(s) = \arg\min_a s^\top Q s + a^\top R a + (As+Ba)^\top P (As+Ba)\)
    • \(\pi^\star(s) = \arg\min_u x_0^\top Q x_0 + u^\top R u + x_1^\top P x_1~~\text{s.t.} ~~x_0=s,~~x_1=Ax_0+Bu\)
  • This is MPC with \(H=1\) and correct terminal cost!
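This equivalence can be checked numerically: brute-force the one-step objective over a grid of actions and compare the minimizer to \(Ks\). The dynamics and costs are the illustrative ones from the running example, with \(P\) again computed by fixed-point iteration.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

P = Q.copy()
for _ in range(5000):                         # Riccati fixed-point iteration
    K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    P = Q + K.T @ R @ K + (A + B @ K).T @ P @ (A + B @ K)

s = np.array([0.7, -0.3])
grid = np.linspace(-2.0, 2.0, 400_001)        # brute-force the 1-D argmin
x1 = (A @ s)[:, None] + B * grid              # planned next state per action
costs = R[0, 0] * grid**2 + np.einsum('in,ij,jn->n', x1, P, x1)
u_brute = grid[np.argmin(costs)]
print(np.isclose(u_brute, (K @ s)[0], atol=1e-4))  # True: matches u = K s
```

The \(s^\top Q s\) term is omitted from `costs` since it does not depend on \(u\).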

Equivalence for unconstrained

Unconstrained LQR Problem

$$ \min ~~\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} s_t^\top Qs_t+ a_t^\top Ra_t\quad\\ \text{s.t.}\quad s_{t+1} = A s_t+ Ba_t $$

MPC Policy

$$ \min ~~\sum_{k=0}^{H-1} x_k^\top Qx_k + u_k^\top Ru_k +x_H^\top Px_H\quad \\ \text{s.t.}\quad x_0=s,~~ x_{k+1} = A x_k+ Bu_k$$

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) + c_H(x_H)$$

\(\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})\)

\(x_k\in\mathcal S_\mathrm{safe},~  u_k\in\mathcal A_\mathrm{safe},~x_H\in\mathcal S_H\)

  • Terminal constraint not often used (instead: long horizon)
  • Soft constraints
    • \(x_k+\delta \in\mathcal S_\mathrm{safe}\) and add penalty \(C\|\delta\|_2^2\) to cost
  • Accuracy of costs/dynamics vs. ease of optimization
  • Sampling based optimization (cross entropy method)
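A minimal sketch combining the last two points: sampling-based planning with the cross-entropy method, with the safety constraint \(|\theta|\leq 1\) softened into a quadratic penalty. All hyperparameters (population size, elite count, iteration count, penalty weight) are illustrative choices.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])     # dynamics from the example
B = np.array([[0.0], [1.0]])
H, n_samples, n_elite, n_iters = 10, 200, 20, 5
penalty = 100.0                            # soft-constraint weight (a choice)

def plan_cost(s0, us):
    s, total = s0, 0.0
    for u in us:
        total += s @ s + u * u             # quadratic stage cost
        total += penalty * max(0.0, abs(s[0]) - 1.0) ** 2  # softened |theta|<=1
        s = A @ s + B.flatten() * u
    return total

def cem_plan(s0, rng):
    mu, sigma = np.zeros(H), np.ones(H)
    for _ in range(n_iters):               # sample, rank, refit the Gaussian
        us = np.clip(rng.normal(mu, sigma, size=(n_samples, H)), -0.5, 0.5)
        elites = us[np.argsort([plan_cost(s0, u) for u in us])[:n_elite]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu[0]                           # apply only the first planned action

rng = np.random.default_rng(0)
s = np.array([0.5, 0.0])
for t in range(30):                        # receding-horizon loop
    s = A @ s + B.flatten() * cem_plan(s, rng)
print(np.linalg.norm(s))
```

Unlike the exact QP formulation, this gives no feasibility or stability guarantee; it trades accuracy for ease of optimization with arbitrary costs and dynamics.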

MPC in practice


  • Disturbances:
    • optimize expectation, high probability, or worst-case
  • Unknown dynamics/costs
    • robust to uncertainty (worst case)
    • learn from data

MPC extensions


Recap

  • Recap: MPC
  • Feasibility problems
  • Terminal sets and costs
  • Proof of feasibility and stability

References: Predictive Control for Linear and Hybrid Systems by Borrelli, Bemporad, and Morari

Reminders

  • Project update due Friday
  • Upcoming paper presentations:
    • [RB17] Learning model predictive control for iterative tasks
    • [DSA+20] Fairness is not static
    • [FLD21] Algorithmic fairness and the situated dynamics of justice

22 - Model Predictive Control - ML in Feedback Sys

By Sarah Dean
