Prof. Sarah Dean, assistant professor in CS at Cornell
Figure: the control loop. A policy \(\pi_t:\mathcal S\to\mathcal A\) maps the observation \(s_t\) to an action \(a_t\), and the data \(\{(s_t, a_t, c_t)\}\) accumulate.
Goal: select actions \(a_t\) to bring the environment to low-cost states while avoiding unsafe states.
Figure: state \(s\) over time, alternating between Plan and Do steps.
\(\pi(s_t) = u_0^\star(s_t)\), where
$$[u_0^\star,\dots, u_{H-1}^\star](s_t) = \arg\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})$$
\(\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})\)
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe}\)
Notation: the real states and actions \(s_t\) and \(a_t\) are distinct from the planned optimization variables \(x_k\) and \(u_k\).
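The plan-then-do loop can be sketched in a few lines. This is a minimal illustration rather than a practical solver: it assumes a small finite action set so that the planning problem can be solved by brute-force enumeration, and the toy system (`F`, `c`, `actions`, `is_safe`) is hypothetical and not taken from the slides.

```python
import itertools

def plan(s, F, c, actions, is_safe, H):
    """Enumerate every length-H action sequence; return the cheapest safe one (or None)."""
    best_cost, best_seq = float("inf"), None
    for seq in itertools.product(actions, repeat=H):
        x, cost, feasible = s, 0.0, True
        for u in seq:
            if not is_safe(x):          # planned state x_k must stay in S_safe
                feasible = False
                break
            cost += c(x, u)             # stage cost c(x_k, u_k)
            x = F(x, u)                 # planned dynamics x_{k+1} = F(x_k, u_k)
        if feasible and is_safe(x) and cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq

def mpc_policy(s, F, c, actions, is_safe, H):
    """Plan from the current state, then return only the first planned action."""
    seq = plan(s, F, c, actions, is_safe, H)
    if seq is None:
        raise ValueError("planning problem is infeasible at this state")
    return seq[0]

# Hypothetical toy problem: scalar integrator with actions in {-1, 0, 1}.
F = lambda x, u: x + u
c = lambda x, u: x**2 + 0.1 * u**2
actions = (-1, 0, 1)                    # plays the role of A_safe
is_safe = lambda x: abs(x) <= 5         # plays the role of S_safe

s = 3
trajectory = [s]
for t in range(6):
    a = mpc_policy(s, F, c, actions, is_safe, H=3)   # Plan
    s = F(s, a)                                      # Do
    trajectory.append(s)
print(trajectory)
```

Only `seq[0]` is ever applied; the rest of the plan is discarded and recomputed from the next real state, which is what separates \(s_t, a_t\) from \(x_k, u_k\).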
Figure: planned trajectory from \(s_t\), with the applied action \(a_t = u_0^\star(s_t)\).
Example: the state is position and velocity, \(s=[\theta,\omega]\), with dynamics \( s_{t+1} = \begin{bmatrix} 1 & 0.1\\ 0 & 1 \end{bmatrix}s_t + \begin{bmatrix} 0\\ 1 \end{bmatrix}a_t\)
Goal: stay near the origin and be energy efficient
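A minimal sketch of MPC on this example, under assumptions the slide does not state: a quadratic cost \(c(x,u)=x^\top Qx+u^\top Ru\) with \(Q=I\), \(R=1\), horizon \(H=10\), and no safety constraints. Without constraints the plan has a closed form, so the first planned action is linear in the state, \(u_0^\star(s)=K_0 s\), and \(K_0\) comes from a backward Riccati recursion.

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)            # assumed state cost: stay near the origin
R = np.array([[1.0]])    # assumed action cost: be energy efficient
H = 10                   # assumed planning horizon

def first_step_gain(A, B, Q, R, H):
    """Backward Riccati recursion for the H-step problem with no terminal cost.

    Returns K_0 such that the first planned action is u_0*(s) = K_0 @ s.
    """
    P = np.zeros_like(Q)                      # cost-to-go after the horizon is zero
    K = np.zeros((B.shape[1], A.shape[0]))
    for _ in range(H):                        # step backward from k = H-1 to k = 0
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A + B @ K)
    return K

K0 = first_step_gain(A, B, Q, R, H)

# Closed loop: replan at every step, apply only the first action.
s = np.array([1.0, 0.0])
trajectory = [s]
for t in range(50):
    a = K0 @ s
    s = A @ s + B @ a
    trajectory.append(s)
trajectory = np.array(trajectory)

# Spectral radius of the closed-loop matrix A + B K_0.
rho = max(abs(np.linalg.eigvals(A + B @ K0)))
print(K0, rho)
```

With state or action constraints the plan is no longer linear in \(s\), and each step would instead require solving a constrained quadratic program.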
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) +\textcolor{cyan}{ c_H(x_H)}$$
\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H\in\mathcal S_H}\)
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) $$
\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H=0}\)
Recursive feasibility: feasible at \(s_t\implies\) feasible at \(s_{t+1}\)
Proof:
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k})\qquad\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})$$
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H=0}\)
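The shifting argument behind recursive feasibility can be sketched as follows, assuming the model \(F\) matches the true dynamics and the origin is a safe equilibrium, i.e. \(F(0,0)=0\), \(0\in\mathcal S_\mathrm{safe}\), \(0\in\mathcal A_\mathrm{safe}\) (assumptions not stated in the lines above). Let \(u_0^\star,\dots,u_{H-1}^\star\) with planned states \(x_0^\star=s_t,\dots,x_H^\star=0\) be feasible at \(s_t\). Since \(s_{t+1}=F(s_t,u_0^\star)=x_1^\star\), the shifted candidate
$$\tilde u = (u_1^\star,\dots,u_{H-1}^\star,0),\qquad \tilde x = (x_1^\star,\dots,x_{H-1}^\star,0,0)$$
satisfies the dynamics, the safety constraints, and \(\tilde x_H = F(0,0) = 0\), so the problem at \(s_{t+1}\) is feasible.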
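Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) is continuous, positive definite (\(V(0)=0\) and \(V(s)>0\) for \(s\neq 0\)), and non-increasing along trajectories (\(V(F(s))\leq V(s)\)).
Theorem (1.2, 1.4): Suppose that \(F\) is locally Lipschitz, \(s_{eq}=0\) is a fixed point, and \(V\) is a Lyapunov function for \(F,s_{eq}\). Then, \(s_{eq}=0\) is stable, and it is asymptotically stable if the decrease is strict (\(V(F(s))<V(s)\) for \(s\neq 0\)).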
Proof:
\(J^\star(s)\) is positive definite and strictly decreasing along closed-loop trajectories, so it is a Lyapunov function. Therefore, the closed-loop dynamics \(F(\cdot, \pi_\mathrm{MPC}(\cdot))\) are asymptotically stable.
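The decrease can be made explicit, as a sketch under assumptions not stated above: \(J^\star(s)\) denotes the optimal value of the planning problem with terminal constraint \(x_H=0\), the stage cost satisfies \(c(0,0)=0\) and \(c(s,a)>0\) for \(s\neq 0\), and the origin is a safe equilibrium with \(F(0,0)=0\). The shifted sequence \((u_1^\star,\dots,u_{H-1}^\star,0)\) is then feasible at \(s_{t+1}=x_1^\star\), and its cost upper-bounds the optimum:
$$J^\star(s_{t+1}) \leq \sum_{k=1}^{H-1} c(x_k^\star,u_k^\star) + c(0,0) = J^\star(s_t) - c(s_t,u_0^\star) < J^\star(s_t)\quad\text{for } s_t\neq 0.$$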
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) +\textcolor{cyan}{ c_H(x_H)}$$
\(\text{s.t.}\quad x_0 = s,\quad x_{k+1} = F(x_{k}, u_{k})\qquad\qquad\)
\(x_k\in\mathcal S_\mathrm{safe},\quad u_k\in\mathcal A_\mathrm{safe},\quad \textcolor{cyan}{x_H\in\mathcal S_H}\)
Assumptions:
Recursive feasibility: feasible at \(s_t\implies\) feasible at \(s_{t+1}\)
Proof:
Proof:
\(J^\star(s)\) is positive definite and strictly decreasing along closed-loop trajectories, so it is a Lyapunov function. Therefore, the closed-loop dynamics \(F(\cdot, \pi_\mathrm{MPC}(\cdot))\) are asymptotically stable.
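One common way to make both arguments go through with a general terminal cost and set is sketched here; the terminal policy \(\pi_H\) and the two conditions are assumptions in the style of the standard textbook treatment, not a transcription of the slide. Suppose that for every \(s\in\mathcal S_H\subseteq\mathcal S_\mathrm{safe}\) there is an action \(\pi_H(s)\in\mathcal A_\mathrm{safe}\) with
$$F(s,\pi_H(s))\in\mathcal S_H,\qquad c_H(F(s,\pi_H(s))) - c_H(s) \leq -c(s,\pi_H(s)).$$
Then the shifted sequence \((u_1^\star,\dots,u_{H-1}^\star,\pi_H(x_H^\star))\) is feasible at \(s_{t+1}=x_1^\star\), and
$$J^\star(s_{t+1}) \leq \sum_{k=1}^{H-1} c(x_k^\star,u_k^\star) + c(x_H^\star,\pi_H(x_H^\star)) + c_H(F(x_H^\star,\pi_H(x_H^\star))) \leq J^\star(s_t) - c(s_t,u_0^\star).$$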
Based on the unconstrained LQR policy \(a=Ks\), where \(P=\mathrm{DARE}(A,B,Q,R)\): $$ K=-(B^\top PB+R)^{-1}B^\top PA$$
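A minimal sketch of computing \(P\) and the standard LQR gain \(K=-(B^\top PB+R)^{-1}B^\top PA\) numerically. The `dare` helper below is a hypothetical fixed-point iteration written for illustration, and the matrices reuse the position and velocity example with assumed costs \(Q=I\), \(R=1\).

```python
import numpy as np

def dare(A, B, Q, R, iters=10000, tol=1e-10):
    """Approximate the DARE solution by iterating the Riccati recursion to a fixed point."""
    P = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A + B @ K)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)            # assumed state cost
R = np.array([[1.0]])    # assumed action cost

P = dare(A, B, Q, R)
K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)   # LQR policy: a = K @ s

# Residual of the Riccati equation; it should be close to zero at the fixed point.
residual = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A) - P
print(K, np.max(np.abs(residual)))
```

SciPy's `scipy.linalg.solve_discrete_are(A, B, Q, R)` solves the same equation directly and would normally be preferred over a hand-rolled iteration.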
Constrained LQR Problem
$$\begin{aligned} \min\quad &\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} \left(s_t^\top Qs_t+ a_t^\top Ra_t\right)\\ \text{s.t.}\quad & s_{t+1} = A s_t+ Ba_t,\\ & G_s s_t\leq b_s,\quad G_a a_t\leq b_a \end{aligned}$$
MPC Policy
$$\begin{aligned} \min\quad &\sum_{k=0}^{H-1} \left(x_k^\top Qx_k + u_k^\top Ru_k\right) +x_H^\top Px_H\\ \text{s.t.}\quad & x_0=s,\quad x_{k+1} = A x_k+ Bu_k,\\ & G_s x_k\leq b_s,\quad G_a u_k\leq b_a,\quad x_H\in\mathcal S_H \end{aligned}$$
This satisfies the assumptions:
This is MPC with \(H=1\) and correct terminal cost!
Unconstrained LQR Problem
$$\begin{aligned} \min\quad &\lim_{T\to\infty}\frac{1}{T}\sum_{t=0}^{T} \left(s_t^\top Qs_t+ a_t^\top Ra_t\right)\\ \text{s.t.}\quad & s_{t+1} = A s_t+ Ba_t \end{aligned}$$
MPC Policy
$$\begin{aligned} \min\quad &\sum_{k=0}^{H-1} \left(x_k^\top Qx_k + u_k^\top Ru_k\right) +x_H^\top Px_H\\ \text{s.t.}\quad & x_0=s,\quad x_{k+1} = A x_k+ Bu_k \end{aligned}$$
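The claim that unconstrained MPC with terminal cost \(x_H^\top Px_H\) reproduces the LQR policy can be checked numerically. The sketch below solves the \(H\)-step problem as one stacked least-squares problem, independently of the Riccati formula, and compares its first action with \(Ks\). The helper names `dare` and `mpc_action`, the cost matrices, and the test state are assumptions made for illustration.

```python
import numpy as np

def dare(A, B, Q, R, iters=10000, tol=1e-10):
    """Approximate the DARE solution by iterating the Riccati recursion to a fixed point."""
    P = Q.copy()
    for _ in range(iters):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A + B @ K)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    return P

def mpc_action(s, A, B, Q, R, P, H):
    """First action of the unconstrained H-step plan with terminal cost x_H' P x_H."""
    n, m = B.shape
    # Stacked predictions: [x_0; ...; x_H] = Phi @ s + Gamma @ [u_0; ...; u_{H-1}]
    Phi = np.vstack([np.linalg.matrix_power(A, k) for k in range(H + 1)])
    Gamma = np.zeros(((H + 1) * n, H * m))
    for k in range(1, H + 1):
        for j in range(k):
            Gamma[k * n:(k + 1) * n, j * m:(j + 1) * m] = np.linalg.matrix_power(A, k - 1 - j) @ B
    W = np.kron(np.eye(H + 1), Q)      # stage costs on x_0, ..., x_{H-1}
    W[H * n:, H * n:] = P              # terminal cost on x_H
    Rbar = np.kron(np.eye(H), R)
    u = -np.linalg.solve(Gamma.T @ W @ Gamma + Rbar, Gamma.T @ W @ Phi @ s)
    return u[:m]

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])    # assumed cost matrices
P = dare(A, B, Q, R)
K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

s = np.array([1.0, -0.5])              # arbitrary test state
for H in (1, 5):
    print(H, mpc_action(s, A, B, Q, R, P, H), K @ s)
```

Because \(P\) is the infinite-horizon cost-to-go, the horizon length does not change the first action in the unconstrained case, which is why \(H=1\) already suffices.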
$$\min_{u_0,\dots, u_{H-1}} \quad\sum_{k=0}^{H-1} c(x_{k}, u_{k}) + c_H(x_H)$$
\(\text{s.t.}\quad x_0 = s_t,\quad x_{k+1} = F(x_{k}, u_{k})\)
\(x_k\in\mathcal S_\mathrm{safe},~ u_k\in\mathcal A_\mathrm{safe},~x_H\in\mathcal S_H\)
Figure: planned trajectory from \(s_t\), with the applied action \(a_t = u_0^\star(s_t)\).
References: Predictive Control by Borrelli, Bemporad, Morari
[RB17] Learning model predictive control for iterative tasks
[DSA+20] Fairness is not static
[FLD21] Algorithmic fairness and the situated dynamics of justice