Model Predictive Control

ML in Feedback Sys #21

Prof Sarah Dean

Reminders/etc

  • Project midterm update due 11/11
  • Scribing feedback to come by next week!
  • Upcoming paper presentations starting next week
  • Participation includes attending presentations

[Diagram: interaction loop — policy \(\pi_t:\mathcal S\to\mathcal A\), observation \(s_t\), accumulated data \(\{(s_t, a_t, c_t)\}\).]

Safe action in a dynamic world

Goal: select actions \(a_t\) to bring environment to low-cost states
while avoiding unsafe states

[Diagram: the action \(a_t\) is applied to the environment with dynamics \(F\) and state \(s\).]

Motivation for constraints

Recap: Invariant Sets

  • A set \(\mathcal S_\mathrm{inv}\) is invariant under dynamics \(s_{t+1} = F(s_t)\) if for all \( s\in\mathcal S_\mathrm{inv}\), \( F(s)\in\mathcal S_\mathrm{inv}\)
  • If \(\mathcal S_\mathrm{inv}\) is invariant for dynamics \(F\), then \(s_0\in \mathcal S_\mathrm{inv} \implies s_t\in\mathcal S_\mathrm{inv}\) for all \(t\).
  • Example: sublevel set of Lyapunov function
    • \(\{s\mid V(s)\leq c\}\)
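A minimal numerical sketch of this example (assuming Python with NumPy/SciPy and an illustrative stable \(A\)): solving a discrete Lyapunov equation gives \(P\) such that \(V(s)=s^\top P s\) never increases along the dynamics, so every sublevel set \(\{s\mid V(s)\leq c\}\) is invariant.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative stable linear dynamics s_{t+1} = F(s_t) = A s_t (an assumption).
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])

# Solve A^T P A - P = -I, so V(s) = s^T P s is a Lyapunov function for F.
P = solve_discrete_lyapunov(A.T, np.eye(2))

# Since V(F(s)) - V(s) = s^T (A^T P A - P) s <= 0 for every s, each sublevel
# set {s : V(s) <= c} maps into itself, i.e. it is invariant.
V = lambda s: s @ P @ s
s = np.array([1.0, -0.5])           # any starting state
for _ in range(5):
    assert V(A @ s) <= V(s) + 1e-9  # V never increases along the trajectory
    s = A @ s
```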

Recap: Safety

  • We define safety in terms of the "safe set" \(\mathcal S_\mathrm{safe}\subseteq \mathcal S\).
  • A state \(s\) is safe if \(s\in\mathcal S_\mathrm{safe}\).

  • A trajectory of states \((s_0,\dots,s_t)\) is safe if \(s_k\in\mathcal S_\mathrm{safe}\) for all \(0\leq k\leq t\).

  • A system \(s_{t+1}=F(s_t)\) is safe if some \(\mathcal S_\mathrm{inv}\subseteq \mathcal S_{\mathrm{safe}}\) is invariant and \(s_0\in \mathcal S_{\mathrm{inv}}\).

Recap: Constrained Control

       \(a_t = {\color{Goldenrod} K_t }s_{t}\)

\( \underset{\mathbf a }{\min}\)   \(\displaystyle\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\)

\(\text{s.t.}~~s_{t+1} = As_t + Ba_t \)

       \(s_t \in\mathcal S_\mathrm{safe},~~ a_t \in\mathcal A_\mathrm{safe}\)

\(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \)

\(\mathbf w = \begin{bmatrix}s_0\\ 0\\ \vdots \\0 \end{bmatrix}\)

  • open-loop problem is convex if the costs are convex, the dynamics are linear, and \(\mathcal S_\mathrm{safe}\) and \(\mathcal A_\mathrm{safe}\) are convex (see the sketch below)
  • closed-loop linear problem can be reformulated as convex using System Level Synthesis

\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a\end{bmatrix} \mathbf w\right\|_{2}^2\)

\(\text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a\end{bmatrix}= I \)

       \(\mathbf \Phi_s\mathbf w \in\mathcal S_\mathrm{safe}^T,~~\mathbf \Phi_a\mathbf w\in\mathcal A_\mathrm{safe}^T\)
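A minimal CVXPY sketch of the first bullet (the system matrices, horizon, and box constraints below are illustrative assumptions, not from the slides): with a convex cost, linear dynamics, and convex safe sets, the open-loop problem passes CVXPY's convexity check and solves as a QP.

```python
import cvxpy as cp
import numpy as np

# Illustrative open-loop problem data (assumptions): linear dynamics, convex
# quadratic cost, and box-shaped (hence convex) safe sets for states/actions.
A = np.array([[0.9, 0.2, 0.0], [0.0, 0.95, 0.1], [0.0, 0.0, 0.9]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
Q, R, T = np.eye(3), np.eye(2), 15
s0 = np.array([1.0, -1.0, 0.5])

s = cp.Variable((3, T + 1))   # planned states s_0, ..., s_T
a = cp.Variable((2, T))       # planned actions a_0, ..., a_{T-1}

cost = cp.quad_form(s[:, T], Q)   # terminal state cost
cost += sum(cp.quad_form(s[:, t], Q) + cp.quad_form(a[:, t], R) for t in range(T))
constraints = [s[:, 0] == s0]
constraints += [s[:, t + 1] == A @ s[:, t] + B @ a[:, t] for t in range(T)]
constraints += [cp.abs(s) <= 5.0, cp.abs(a) <= 1.0]   # convex (box) safe sets

prob = cp.Problem(cp.Minimize(cost), constraints)
print(prob.is_dcp())   # True: with these ingredients the problem is convex
prob.solve()
print(prob.status, prob.value)
```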

Recap: Control Barrier Function

Claim: Suppose that for all \(t\), the policy satisfies

  • \(C(F(s_t, \pi(s_t)))\leq \gamma C(s_t)\) for some \(0\leq \gamma\leq 1\).
  • Then \(\{s\mid C(s)\leq 0\}\) is an invariant set.

$$\pi(s_t) = \arg\min_a \|a - \pi^\star_\mathrm{unc}(s_t)\|_2^2 \quad\text{s.t.}\quad  C(F(s_t, a)) \leq   \gamma C(s_t) $$

[Figure: safety constraint boundary \(C(s)=0\), illustrated in terms of the size of \(s\) and the size of \(a\).]
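A minimal sketch of the safety filter above, assuming linear dynamics \(F(s,a)=As+Ba\) and the quadratic barrier \(C(s)=\|s\|_2^2-1\) (both illustrative assumptions); under these choices the projection is a small convex program.

```python
import cvxpy as cp
import numpy as np

# Illustrative dynamics F(s, a) = A s + B a and barrier C(s) = ||s||^2 - 1
# (assumptions chosen so the filter below is a small convex program).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([0.0, 1.0])
gamma = 0.9
C = lambda s: float(s @ s) - 1.0    # safe set {s : C(s) <= 0} is the unit ball

def safety_filter(s, a_unc):
    """Project the unconstrained action onto the CBF decrease constraint."""
    a = cp.Variable()
    s_next = A @ s + B * a
    constraints = [cp.sum_squares(s_next) - 1.0 <= gamma * C(s)]
    prob = cp.Problem(cp.Minimize(cp.square(a - a_unc)), constraints)
    prob.solve()
    return a.value

# Filter a nominal (unconstrained) action proposed from state s_t = [0.5, 0.3].
print(safety_filter(np.array([0.5, 0.3]), a_unc=1.0))
```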

Today: Receding Horizon Control

Instead of optimizing for open loop control...

\( \underset{a_0,\dots,a_T }{\min}\)   \(\displaystyle\sum_{t=0}^T c(s_t, a_t)\)

\(\text{s.t.}~~s_0~\text{given},~~s_{t+1} = F(s_t, a_t)\)

       \(s_t \in\mathcal S_\mathrm{safe},~~ a_t \in\mathcal A_\mathrm{safe}\)

Open loop:

  • Observe state
  • Optimize plan
  • For \(t=0,1,\dots\)
    1. Apply planned action

Closed loop, receding horizon (see the toy sketch below):

  • For \(t=0,1,\dots\)
    1. Observe state
    2. Optimize plan
    3. Apply first planned action

...re-optimize to close the loop

model predicts the trajectory during planning

Also called Model Predictive Control
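A toy sketch of the receding-horizon loop (the scalar system, stage cost, and brute-force planner are all illustrative stand-ins for the optimization problem above): at every step the plan is recomputed from the current state and only its first action is applied.

```python
import numpy as np
from itertools import product

# Toy scalar system, stage cost, and discretized action set (all illustrative
# assumptions, just to show the structure of the loop).
F = lambda s, a: 0.9 * s + a            # dynamics s_{t+1} = F(s_t, a_t)
c = lambda s, a: s ** 2 + 0.1 * a ** 2  # stage cost c(s, a)
ACTIONS = np.linspace(-1.0, 1.0, 11)

def optimize_plan(s, horizon):
    """Brute-force planner: return the lowest-cost H-step action sequence."""
    best_cost, best_plan = np.inf, None
    for plan in product(ACTIONS, repeat=horizon):
        x, total = s, 0.0
        for a in plan:
            total += c(x, a)
            x = F(x, a)
        if total < best_cost:
            best_cost, best_plan = total, plan
    return best_plan

# Receding horizon: observe, optimize, apply only the first planned action.
s = 5.0
for t in range(10):
    a = optimize_plan(s, horizon=3)[0]
    s = F(s, a)
    print(f"t={t}  a={a:+.2f}  s={s:+.3f}")
```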

Figure from slides by Borrelli, Jones, Morari

Plan:

  •  minimize lap time
  • subject to
    • car dynamics
    • staying within lane
    • bounded acceleration

Example: racing

  • For \(t=0,1,\dots\)
    1. Observe state
    2. Optimize plan
    3. Apply first planned action

Receding Horizon Leads to Feedback

[Figure: timeline alternating Plan and Do phases.]

  • For \(t=0,1,\dots\)
    1. Observe state
    2. Optimize plan
    3. Apply first planned action

[Diagram: closed loop — the state \(s_t\) is observed and the action \(a_t\) is applied to the dynamics \(F\).]

\( \underset{a_0,\dots,a_H }{\min}\)   \(\displaystyle\sum_{k=0}^H c(s_k, a_k)\)

\(\text{s.t.}~~s_0~\text{given},~~s_{k+1} = F(s_k, a_k)\)

       \(s_k \in\mathcal S_\mathrm{safe},~~ a_k \in\mathcal A_\mathrm{safe}\)

We can:

  • replan if state \(s_{t+1}\) does not evolve as expected (e.g. due to disturbances)
  • easily incorporate time-varying dynamics, constraints (e.g. obstacles), costs (e.g. new incentives)
  • plan over a shorter horizon than we operate (e.g. \(H\leq T\))
    • important for realtime computation!

Benefits of Replanning

\(\pi(s_t) = u_0^\star(s_t)\)

$$[u_0^\star,\dots, u_{H}^\star](s_t) = \arg\min_{u_0,\dots, u_{H}} \quad\sum_{k=0}^{H} c(x_{k}, u_{k})$$

\(\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})\)

\(x_k\in\mathcal S_\mathrm{safe},\quad  u_k\in\mathcal A_\mathrm{safe}\)

Notation: distinguish real states and actions \(s_t\) and \(a_t\) from the planned optimization variables \(x_k\) and \(u_k\).

The MPC Policy


[Diagram: the MPC policy in closed loop — observe \(s_t\), solve the planning problem, apply \(a_t = u_0^\star(s_t)\) to the dynamics \(F\).]

Example: receding horizon LQR

Infinite Horizon LQR Problem

$$ \min ~~\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} s_t^\top Qs_t+ a_t^\top Ra_t\quad\\ \text{s.t}\quad s_{t+1} = A s_t+ Ba_t$$

We know that \(a^\star_t = \pi^\star(s_t)\) where \(\pi^\star(s) = K s\) and

  • \(K = -(R+B^\top PB)^{-1}B^\top PA\)
  • \(P=\mathrm{DARE}(A,B,Q,R)\)

Finite LQR Problem

$$ \min ~~\sum_{k=0}^{H} x_k^\top Qx_k + u_k^\top Ru_k \quad \\ \text{s.t}\quad x_0=s,\quad x_{k+1} = A x_k+ Bu_k $$

MPC Policy \(a_t = u^\star_0(s_t)\) where
\(u^\star_0(s) = K_0s\) and

  • \(K_0 = -(R+B^\top P_{1}B)^{-1}B^\top P_{1}A\)
  • \(P_1\) obtained from \(H\) steps of the backwards DP (Riccati) recursion
  • \(P = Q+A^\top PA - A^\top PB(R+B^\top PB)^{-1}B^\top PA\)
  • \(P_t = Q+A^\top P_{t+1}A - A^\top P_{t+1}B(R+B^\top P_{t+1}B)^{-1}B^\top P_{t+1}A\)
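A sketch comparing the two gains (assuming NumPy/SciPy; the matrices are those of the double-integrator example on the next slide): \(K\) from the DARE solution and \(K_0\) from \(H\) steps of the backwards Riccati recursion, which approaches \(K\) as \(H\) grows.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Double-integrator data from the example below.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.5])
R = np.array([[1.0]])

# Infinite horizon: P = DARE(A, B, Q, R), then K = -(R + B'PB)^{-1} B'PA.
P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def mpc_gain(H):
    """Backwards Riccati recursion for H steps, returning K_0."""
    Pk = np.zeros_like(Q)   # start from P_{H+1} = 0 (no cost beyond the horizon)
    for _ in range(H):      # after H backwards steps, Pk holds P_1
        S = R + B.T @ Pk @ B
        Pk = Q + A.T @ Pk @ A - A.T @ Pk @ B @ np.linalg.solve(S, B.T @ Pk @ A)
    return -np.linalg.solve(R + B.T @ Pk @ B, B.T @ Pk @ A)

for H in [1, 5, 20, 100]:
    print(H, mpc_gain(H), "vs infinite-horizon", K)
```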

Example

The state is position & velocity \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 1 & 0.1\\ & 1 \end{bmatrix}s_t + \begin{bmatrix} 0\\  1 \end{bmatrix}a_t\)

Goal: stay near origin and be energy efficient

  • \(c(s,a) = s^\top \begin{bmatrix} 1 & \\ & 0.5 \end{bmatrix}s + a^2 \)
  • Safety constraint \(|\theta|\leq 1\) and actuation limit \(|a|\leq 0.5\)
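A minimal receding-horizon sketch of this example (assuming CVXPY; the horizon \(H\) and initial state are illustrative choices): each step solves the planning QP with the dynamics, cost, and constraints above, then applies only the first planned action.

```python
import cvxpy as cp
import numpy as np

# Dynamics, cost, and constraints from the example above.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.5])
H = 10                      # planning horizon (illustrative choice)

def mpc_action(s0):
    """Solve the H-step planning QP from state s0; return the first action."""
    x = cp.Variable((2, H + 1))
    u = cp.Variable((1, H))
    cost, constraints = 0, [x[:, 0] == s0]
    for k in range(H):
        cost += cp.quad_form(x[:, k], Q) + cp.sum_squares(u[:, k])  # s'Qs + a^2
        constraints += [
            x[:, k + 1] == A @ x[:, k] + B @ u[:, k],  # model predicts trajectory
            cp.abs(x[0, k + 1]) <= 1.0,                # |theta| <= 1
            cp.abs(u[:, k]) <= 0.5,                    # |a| <= 0.5
        ]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u[:, 0].value

# Receding horizon loop: observe, plan, apply the first planned action.
s = np.array([0.8, 0.0])    # illustrative initial state
for t in range(20):
    a = mpc_action(s)
    s = A @ s + B @ a
    print(t, np.round(s, 3))
```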


  • Linear Program (LP): Linear costs and constraints, feasible set is polyhedron. $$\min_z c^\top z~~ \text{s.t.} ~~Gz\leq h,~~Az=b$$
  • Quadratic Program (QP): Quadratic cost and linear constraints, feasible set is polyhedron. Convex if \(P\succeq 0\). $$\min_z z^\top P z+q^\top z~~ \text{s.t.} ~~Gz\leq h,~~Az=b$$
    • Nonconvex if \(P\nsucceq 0\).
  • Mixed Integer Linear Program (MILP): LP with discrete constraints. $$\min_z c^\top z~~ \text{s.t.} ~~Gz\leq h,~~Az=b,~~z\in\{0,1\}^n$$

Some types of optimization problems

Figures from slides by Goulart, Borrelli

  • Pros
    • Flexibility: any model and any objective
    • Guarantees safety by definition*
      • *over planning horizon
  • Cons
    • Computationally demanding
    • How to guarantee long term stability/feasibility?

Pros and Cons of MPC

Example: infeasibility

The state is position & velocity \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 1 & 0.1\\ & 1 \end{bmatrix}s_t + \begin{bmatrix} 0\\  1 \end{bmatrix}a_t\)

Goal: stay near origin and be energy efficient

  • Safety constraint \(|\theta|\leq 1\) and actuation limit \(|a|\leq 0.5\)
  • If the velocity \(\omega\) is large enough, no admissible action can keep \(|\theta|\leq 1\) at the next step, so the planning problem becomes infeasible (see the sketch below).
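A sketch of how this shows up in code (assuming CVXPY): from \(s_t=[0.9,\ 2.0]\), the next position is \(\theta+0.1\omega=1.1\) no matter the action, so the planning problem has no feasible point and the solver reports it infeasible.

```python
import cvxpy as cp
import numpy as np

# Dynamics and constraints from the example; check whether any safe plan exists.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
H = 10

def plan_status(s0):
    """Feasibility problem: is there any H-step plan satisfying the constraints?"""
    x = cp.Variable((2, H + 1))
    u = cp.Variable((1, H))
    constraints = [x[:, 0] == s0]
    for k in range(H):
        constraints += [
            x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
            cp.abs(x[0, k + 1]) <= 1.0,   # |theta| <= 1
            cp.abs(u[:, k]) <= 0.5,       # |a| <= 0.5
        ]
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    return prob.status

print(plan_status(np.array([0.8, 0.0])))  # 'optimal': some safe plan exists
print(plan_status(np.array([0.9, 2.0])))  # 'infeasible': theta_1 = 1.1 > 1 for any action
```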

Recap

  • Recap of safety
    • invariance, constraints, barrier functions
  • Receding Horizon Control
  • MPC Policy and Optimization
  • Feasibility Problem

References: Predictive Control for Linear and Hybrid Systems by Borrelli, Bemporad, and Morari (Cambridge University Press)
