22 - Model Predictive Control - ML in Feedback Sys

Model Predictive Control

ML in Feedback Sys #21

Prof Sarah Dean

Reminders/etc

Project midterm update due Friday!
Scribing feedback to come by next week
Upcoming paper presentations starting next week
- RB17 Frank, Wei-Han, Minjae
- DSA+20 Jerry, Wendy, Yueying
Participation includes attending presentations
Relevant AI Seminar tomorrow 12:15-1:15 in Gates 122: Brandon Amos, Learning with differentiable and amortized optimization

policy

$\pi_t:\mathcal S\to\mathcal A$

observation

$s_t$

accumulate

$\{(s_t, a_t, c_t)\}$

Safe action in a dynamic world

Goal: select actions $a_t$ to bring environment to low-cost states
while avoiding unsafe states

action

$a_{t}$

$F$

$s$

Recap: Invariant Sets

A set $\mathcal S_\mathrm{inv}$ is invariant under dynamics $s_{t+1} = F(s_t)$ if for all $ s\in\mathcal S_\mathrm{inv}$, $ F(s)\in\mathcal S_\mathrm{inv}$
If $\mathcal S_\mathrm{inv}$ is invariant for dynamics $F$, then $s_0\in \mathcal S_\mathrm{inv} \implies s_t\in\mathcal S_\mathrm{inv}$ for all $t$.
Example: sublevel set of Lyapunov function
- $\{s\mid V(s)\leq c\}$

Recap: Receding Horizon

time

Plan

For $t=0,1,\dots$
1. Observe state
2. Optimize plan
3. Apply first planned action

$\pi(s_t) = u_0^\star(s_t)$

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})$$

$\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})$

$x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe}\quad~~~$

Notation: distinguish real states and actions $s_t$ and $a_t$ from the planned optimization variables $x_k$ and $u_k$.

$[u_0^\star,\dots, u_{H-1}^\star](s_t) = \arg$

Recap: The MPC Policy

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})$$

$\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})$

$x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe}\quad~~~$

Notation: distinguish real states and actions $s_t$ and $a_t$ from the planned optimization variables $x_k$ and $u_k$.

Recap: The MPC Policy

$F$

$s$

$s_t$

$a_t = u_0^\star(s_t)$

Example: infeasibility

The state is position & velocity $s=[\theta,\omega]$ with $ s_{t+1} = \begin{bmatrix} 1 & 0.1\\ & 1 \end{bmatrix}s_t + \begin{bmatrix} 0\\ 1 \end{bmatrix}a_t$

Goal: stay near origin and be energy efficient

Safety constraint $|\theta|\leq 1$ and actuation limit $|a|\leq 0.5$

Infeasibility = inability to guarantee safety
- also leads to loss of stability
States that are initially feasible vs. states that remain feasible
- not different when plan is over $H=T$

Infeasibility Problem

infeasible
initially feasible
remain feasible

Infeasibility = inability to guarantee safety
- also leads to loss of stability
States that are initially feasible vs. states that remain feasible
- not different when plan is over ($H=T$)
Definition: We call $\mathcal S_\mathrm{inv}$ a control invariant set for dynamics
$s_{t+1} = F(s_t, a_t)$ if for all $ s\in\mathcal S_\mathrm{inv}$, there exists an $a\in\mathcal A_\mathrm{safe}$ such that $ F(s, a)\in\mathcal S_\mathrm{inv}$
Definition: The region of attraction for dynamics $\tilde F(s)$ is the set of initial states $s_0$ that converge to the origin.
- We consider the closed loop dynamics $\tilde F(s) = F(s,\pi(s))$

Infeasibility Problem

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) +\textcolor{cyan}{ c_H(x_H)}$$

$\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad$

$x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H\in\mathcal S_H}$

Let $J(s; u_0,\dots, u_{H-1})$ be the value of the objective for actions $u_0,\dots, u_{H-1}$
Let $J^\star(s)=J(s; u^\star_0,\dots, u^\star_{H-1})$ be the optimal value.
Assume that stage cost $c(s,a)$ is positive definite, i.e. $c(s,a)>0$ for all $s,a\neq 0$ and $c(0,0)=0$.
Assume that $0\in\mathcal S_\mathrm{safe}$ and $0\in\mathcal A_\mathrm{safe}$

Terminal cost and constraints

Receding horizon control is short sighted
Additional terms more closely approximate infinite horizon problem

$$\min_{u_0,\dots, u_{H}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) $$

$\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad$

$x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H=0}$

Assume that the origin is a fixed point of the uncontrolled dynamics
- i.e. $F(0,0)=0$
We will prove that
- $\pi_\mathrm{MPC}$ is recursively feasible
- $F(s, \pi_\mathrm{MPC}(s))$ is stable within region of attraction

Terminal cost and constraints

Warm up: consider $\mathcal S_H = \{0\}$ and $c_H(0)=0$

Recursive feasibility: feasible at $s_t\implies$ feasible at $s_{t+1}$

Proof:

$s_t$ feasible and solution to optimization problem is $u^\star_{0}, \dots, u^\star_{H-1}$ with corresponding states $x^\star_{0}, \dots, x^\star_{H}$
After applying $a_t=u^\star_{0}$, state moves to $s_{t+1} = F(s_t,a_t)$
- Notice that $s_{t+1} = x^\star_1$
Claim: $u^\star_{1}, \dots, u^\star_{H-1}, 0$ is now a feasible solution.
- because $x_{H}^\star=0$ and $F(0,0)=0$
- thus corresponding states $x^\star_{1}, \dots, x^\star_{H}, 0$ satisfy constraints

Recursive Feasibility

$$\min_{u_0,\dots, u_{H}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})\qquad\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})$$

$x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H=0}$

Review: Lyapunov Stability

Definition: A Lyapunov function $V:\mathcal S\to \mathbb R$ for $F$ is continuous and

(positive definite) $V(0)=0$ and $V(0)>0$ for all $s\in\mathcal S - \{0\}$
(decreasing) $V(F(s)) - V(s) \leq 0$ for all $s\in\mathcal S$
Optionally, (strict) $V(F(s)) - V(s) < 0$ for all $s\in\mathcal S-\{0\}$

Theorem (1.2, 1.4): Suppose that $F$ is locally Lipschitz, $s_{eq}=0$ is a fixed point, and $V$ is a Lyapunov function for $F,s_{eq}$. Then, $s_{eq}=0$ is

asymptotically stable if $V$ is strictly decreasing

Proof:

$J^\star(s)$ is positive definite and strictly decreasing. Therefore, the closed loop dynamics $F(\cdot, \pi_\mathrm{MPC}(\cdot))$ are asymptotically stable.

Positive definite:
- if $s=0$, then optimal actions are $0$ since $F(0,0)=0$ and stage cost is positive definite
- if $s\neq 0$, $J^\star(s)>0$ since stage cost is positive definite

Stability

Strictly decreasing: recall $J^\star (s_t) =\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k}) +c_H(x_H^\star)$
- $J^\star(s_{t+1}) \leq$ cost of feasible solution starting at $x_0=F(s_t, u^\star_{0})$
  - $=J(s_{t+1}; u^\star_1,\dots, u^\star_{H-1}, 0)=\sum_{k=1}^{H-1} c(x^\star_{k}, u^\star_{k}) + c(x^\star_{H}, 0)+c_H(0)$
  - $=\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k}) + \cancel{c(x^\star_{H}, 0)}+\cancel{c_H(0)} -c(x^\star_{0}, u^\star_{0}) $
  - $= J^\star (s) -c(x^\star_{0}, u^\star_{0}) < J^\star (s)$

$$\min_{u_0,\dots, u_{H}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) +\textcolor{cyan}{ c_H(x_H)}$$

$\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad$

$x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H\in\mathcal S_H}$

General terminal cost and constraints

Assumptions:

The origin is uncontrolled fixed pointed $F(0,0)=0$
The costs are positive definite and constraints contain $0$
The terminal set is contained in safe set and is control invariant
The terminal cost satisfies $$ c_H(s_{t+1}) - c_H(s_t) \leq -c(s_t, u) $$ for some $u$ such that $s_{t+1} = F(s_{t}, u)\in\mathcal S_H$

Recursive feasibility: feasible at $s_t\implies$ feasible at $s_{t+1}$

Proof:

$s_t$ feasible and solution to optimization problem is $u^\star_{0}, \dots, u^\star_{H-1}$ with corresponding states $x^\star_{0}, \dots, x^\star_{H}$
After applying $a_t=u^\star_{0}$, state moves to $s_{t+1} = F(s_t,a_t)$
- Notice that $s_{t+1} = x^\star_1$
Claim: there exists a $u$ such that $u^\star_{1}, \dots, u^\star_{H-1}, u$ is a feasible solution.
- because $x_{H}^\star\in\mathcal S_H$ and $\mathcal S_H$ is control invariant
- thus corresponding states $x^\star_{1}, \dots, x^\star_{H}, F(x^\star_{H}, u)$ satisfy constraints

Recursive Feasibility

Proof:

$J^\star(s)$ is positive definite and strictly decreasing. Therefore, the closed loop dynamics $F(\cdot, \pi_\mathrm{MPC}(\cdot))$ are asymptotically stable.

Stability

Positive definite: same argument as before
Strictly decreasing: recall $J^\star (s_t) =\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k}) +c_H(x_H^\star)$
- $J^\star(s_{t+1}) \leq$ cost of feasible solution starting at $x_0=F(s_t, u^\star_{0})$
  - $=J(s_{t+1}; u^\star_1,\dots, u^\star_{H-1}, u)$
  - $=\sum_{k=1}^{H-1} c(x^\star_{k}, u^\star_{k}) + c(x^\star_{H}, u)+c_H(F(x^\star_{H}, u))$
  - $=\sum_{k=0}^{H-1} c(x^\star_{k}, u^\star_{k})+c_H(x^\star_H)+ c(x^\star_{H}, u)+c_H(F(x^\star_{H}, u))-c_H(x^\star_H) -c(x^\star_{0}, u^\star_{0}) $
  - $\leq J^\star (s) +c(x^\star_{H}, u) - c(x^\star_{H}, u) -c(x^\star_{0}, u^\star_{0}) < J^\star (s)$

Terminal cost and constraints for LQR

Based on unconstrained LQR policy where $P=\mathrm{DARE}(A,B,Q,R)$ $$ K=-(B^\top PB+R)^{-1}B^\top P$$

Terminal cost as $c_H(s) = s^\top P s$
Terminal set is any invariant set for closed loop $$s\in\mathcal S_H\implies (A+BK)s\in\mathcal S_H$$ which also guarantees safety: $$\mathcal S_H\subseteq \mathcal S_\mathrm{safe},\quad Ks\in\mathcal A_\mathrm{safe}\quad\forall~~s\in\mathcal S_H$$
ex: sublevel set of $s^\top P s$

Constrained LQR Problem

$$ \min ~~\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} s_t^\top Qs_t+ a_t^\top Ra_t\quad\\ \text{s.t}\quad s_{t+1} = A s_t+ Ba_t \\ G_s s_t\leq b_s,G_a a_t\leq b_a$$

MPC Policy

$$ \min ~~\sum_{k=0}^{H-1} x_k^\top Qx_k + u_k^\top Ru_k +x_H^\top Px_H\quad \\ \text{s.t}\quad x_0=s,~~ x_{k+1} = A x_k+ Bu_k\\G_s x_k\leq b_s,G_a u_k\leq b_a ,x_H\in\mathcal S_H$$

Terminal cost and constraints for LQR

This satisfies the assumptions:

The origin is uncontrolled fixed pointed $F(0,0)=0$
- ✓ $A\cdot 0+B\cdot 0=0$
The costs are positive definite and constraints contain $0$
- ✓ $Q,R,P$ are psd, assume $b_s,b_a\geq 0$
The terminal set is contained in safe set and is control invariant
- ✓ by construction, have $u=Ks$ guarantees invariance
The terminal cost satisfies $c_H(s_{t+1}) - c_H(s_t) \leq -c(s_t, u) $ for some $u$ such that $s_{t+1} = F(s_{t}, u)\in\mathcal S_H$
- Exercise: use the form of $K$ and the DARE to show that $$((A+BK)s)^\top P(A+BK)s - s^\top Ps = -s^\top Q s$$

Recall the Bellman Optimality Equation:
- $ \pi^\star(s) = \arg\min_{a\in\mathcal A} c(s, a)+J^\star (F(s,a))$
For LQR, this means that
- $\pi^\star(s) = \arg\min_a s^\top Q s + a^\top R a + (As+Ba)^\top P (As+Ba)$
- $\pi^\star(s) = \arg\min_u x_0^\top Q x_0 + u^\top R u + x_1^\top P x_1~~\text{s.t.} ~~x_0=s,~~x_1=Ax_0+Bu$
This is MPC with $H=1$ and correct terminal cost!

Equivalence for unconstrained

Constrained LQR Problem

$$ \min ~~\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} s_t^\top Qs_t+ a_t^\top Ra_t\quad\\ \text{s.t}\quad s_{t+1} = A s_t+ Ba_t $$

MPC Policy

$$ \min ~~\sum_{k=0}^{H-1} x_k^\top Qx_k + u_k^\top Ru_k +x_H^\top Px_H\quad \\ \text{s.t}\quad x_0=s,~~ x_{k+1} = A x_k+ Bu_k$$

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) + c_H(x_H)$$

$\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})$

$x_k\in\mathcal S_\mathrm{safe},~ u_k\in\mathcal A_\mathrm{safe},~x_H\in\mathcal S_H$

Terminal constraint not often used (instead: long horizon)
Soft constraints
- $x_k+\delta \in\mathcal S_\mathrm{safe}$ and add penalty $C\|\delta\|_2^2$ to cost
Accuracy of costs/dynamics vs. ease of optimization
Sampling based optimization (cross entropy method)

MPC in practice

$F$

$s$

$s_t$

$a_t = u_0^\star(s_t)$

$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) + c_H(x_H)$$

$\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})$

$x_k\in\mathcal S_\mathrm{safe},~ u_k\in\mathcal A_\mathrm{safe},~x_H\in\mathcal S_H$

Disturbances:
- optimize expectation, high probability, or worst-case
Unknown dynamics/costs
- robust to uncertainty (worst case)
- learn from data

MPC extensions

$F$

$s$

$s_t$

$a_t = u_0^\star(s_t)$

Recap

Recap: MPC
Feasibility problems
Terminal sets and costs
Proof of feasibility and stability

References: Predictive Control by Borrelli, Bemporad, Morari

Reminders

Project update due Friday
Upcoming paper presentations:
- [RB17] Learning model predictive control for iterative tasks
- [DSA+20] Fairness is not static
- [FLD21] Algorithmic fairness and the situated dynamics of justice

Model Predictive Control

ML in Feedback Sys #21

Reminders/etc

Safe action in a dynamic world

\(F\)

Recap: Invariant Sets

Recap: Receding Horizon

Recap: The MPC Policy

Recap: The MPC Policy

\(F\)

Example: infeasibility

Infeasibility Problem

Infeasibility Problem

Terminal cost and constraints

Terminal cost and constraints

Recursive Feasibility

Review: Lyapunov Stability

Stability

General terminal cost and constraints

Recursive Feasibility

Stability

Terminal cost and constraints for LQR

Terminal cost and constraints for LQR

Equivalence for unconstrained

MPC in practice

\(F\)

MPC extensions

\(F\)

Recap

Reminders