Safe Control
ML in Feedback Sys #20
Prof Sarah Dean
Reminders/etc
- Comments on proposals posted
- Project midterm update due 11/11
- Scribing feedback to come soon!
- Upcoming paper presentations starting next week
- Participation includes attending presentations
Reporting back on NCCR Symposium
Symposium on Socially Responsible Automation hosted by NCCR Automation at EPFL
- talks on responsibility/fairness in cyberphysical systems (power grid, dynamic pricing), recommendation systems, repeated resource allocation via "karma economy", and vulnerability-aware AV rules
- interesting idea: "reparative fairness"
- \(\mathbb E[\text{future cost}\mid\text{past cost}]\propto -\text{past cost}\)

Action in a dynamic world
Goal: select actions \(a_t\) to bring environment to low-cost states
(Diagram: the policy \(\pi_t:\mathcal S\to\mathcal A\) maps the observed state \(s_t\) to an action \(a_t\); the environment \(F\) updates the state \(s\); the learner accumulates data \(\{(s_t, a_t, c_t)\}\).)
Recap: Optimal Control
Stochastic Infinite Horizon Optimal Control Problem
$$ \min_{\pi} ~~\underbrace{\lim_{T\to\infty} \mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k)) \Big]}_{J^\pi(s_0)}\quad \text{s.t.}\quad s_0~~\text{given},~~s_{k+1} = F(s_k, \pi(s_k),w_k) $$
Bellman Optimality Equation
- \(\underbrace{J^\star (s)}_{\text{value function}} = \min_{a\in\mathcal A} \underbrace{c(s, a)+\mathbb E_w[J^\star (F(s,a,w))]}_{\text{state-action function}}\)
- Minimizing argument is \(\pi^\star(s)\)
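To make the Bellman optimality recursion concrete, here is a minimal sketch on a toy finite MDP. It uses a discounted cost for illustration (the average-cost problem above would call for relative value iteration instead), and all problem data are randomly generated.

import numpy as np

# Toy finite MDP: nS states, nA actions, random costs and transitions.
rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9
c = rng.uniform(size=(nS, nA))                  # cost c(s, a)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # transition probabilities P(s' | s, a)

J = np.zeros(nS)
for _ in range(1000):
    # Bellman optimality backup: Q(s, a) = c(s, a) + gamma * E[J(s')]
    Q = c + gamma * P @ J
    J_new = Q.min(axis=1)                       # J*(s) = min_a Q(s, a)
    if np.max(np.abs(J_new - J)) < 1e-10:
        break
    J = J_new

pi_star = Q.argmin(axis=1)                      # minimizing argument is pi*(s)
print("J* =", J.round(3), "  pi* =", pi_star)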
Recap: LQR with known dynamics
- Goal: minimize quadratic cost (\(Q,R\)) in a system with linear dynamics (\(A,B\))
- Classic approach: Dynamic programming/Bellman optimality
- \(P = \mathrm{DARE}(A,B,Q,R)\) and \(K_\star = -(R+B^\top PB)^{-1}B^\top PA\) (sketch below)
- System level synthesis: Convex optimization
- \( \underset{\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2~~\text{s.t.}~~ \begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix}= I \)
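As a quick illustration of the classic DARE-based solution, here is a hedged sketch using SciPy; the system and cost matrices below are toy values chosen for illustration, not anything from the lecture.

import numpy as np
from scipy.linalg import solve_discrete_are

# Toy system and cost matrices (illustrative values only)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

P = solve_discrete_are(A, B, Q, R)                   # P = DARE(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # K* = -(R + B'PB)^{-1} B'PA
print("K* =", K)
print("closed-loop eigenvalues:", np.linalg.eigvals(A + B @ K))  # should lie inside the unit circle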
Recap: unknown dynamics
Setting: have data \(\{s_k, a_k, c_k\}_{k=0}^N\). Approaches include a focus on:
- Model: learn dynamics (and costs) from data, then do policy design
- For LQR: estimate \(\hat A,\hat B\) (\(\hat Q,\hat R\)) then design \(\hat K\)
- "model based"
- Bellman: learn value or state-action function
- For LQR: estimate \(\hat J\) then determine \(\hat K\) as \(\arg\min\)
- "model free"
- Policy: estimate gradients and update policy directly
- For LQR: \(\hat K \leftarrow \hat K -\alpha\widehat{\nabla J}(\hat K)\)
- "model free"
Model-based LQR
- Learn Model:
- estimate \(\hat A,\hat B\) via least-squares, guarantee \(\max\{\varepsilon_A, \varepsilon_B\}\lesssim \sqrt{\frac{m+n}{N}}\) (see the sketch after this list)
- Design Policy:
- robust approach uses \(\hat A, \hat B, \varepsilon_A, \varepsilon_B\)
- guarantee \(J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \leq \epsilon\) for \(N\gtrsim \frac{(m+n)^2}{\epsilon^2}\)
- nominal or certainty equivalent approach uses \(\hat A, \hat B\)
- for small enough \(\varepsilon\), can show that \(J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \lesssim \varepsilon^2\)
- thus faster rate, \(J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \leq \epsilon\) for \(N\gtrsim \frac{m+n}{\epsilon}\)
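A minimal sketch of the "learn model" step, assuming a toy system excited by random inputs; the matrices and noise level are illustrative, and the estimator is an ordinary least-squares regression of \(s_{k+1}\) on \((s_k, a_k)\).

import numpy as np

rng = np.random.default_rng(0)
n, m, N = 2, 1, 500
A_true = np.array([[0.9, 0.1], [0.0, 0.9]])    # toy ground-truth dynamics
B_true = np.array([[0.0], [1.0]])

s = np.zeros(n)
X, Y = [], []                                  # regressors [s_k; a_k] and targets s_{k+1}
for _ in range(N):
    a = rng.normal(size=m)                     # exploratory input
    s_next = A_true @ s + B_true @ a + 0.01 * rng.normal(size=n)
    X.append(np.concatenate([s, a]))
    Y.append(s_next)
    s = s_next

Theta, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)
A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]  # Theta^T = [A_hat  B_hat]
print("estimation errors:", np.linalg.norm(A_hat - A_true), np.linalg.norm(B_hat - B_true))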
Model-free LQR
Approximate Policy Iteration [KTR19]
- estimate quadratic state-action function with LSTD (toy sketch below) $$ \hat q^\top f + \hat q^\top \phi(s_t, a_t) = c(s_t,a_t)+\mathbb E_{w_t}[\hat q^\top \phi(s_{t+1}, \pi(s_{t+1}))]$$
- update policy as $$ \hat K s = \arg\min_a \hat q^\top \phi(s, a) = \arg\min_a \begin{bmatrix}s\\ a\end{bmatrix}^\top \mathrm{mat}(\hat q) \begin{bmatrix}s\\ a\end{bmatrix}$$
- guarantee \(\|\hat K - K_\star \|_2 \leq \epsilon\) for \(NT \gtrsim \frac{(m+n)^3}{\epsilon^2}\)
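Below is a hedged toy sketch of the approximate policy iteration loop: it uses a simplified noiseless, discounted variant (not the average-cost estimator of [KTR19]), quadratic features \(\phi(s,a)=\mathrm{vec}([s;a][s;a]^\top)\), and illustrative system matrices.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R, gamma = np.eye(2), np.eye(1), 0.95
n, m = 2, 1

def phi(s, a):
    x = np.concatenate([s, a])
    return np.outer(x, x).ravel()              # quadratic features vec([s;a][s;a]^T)

K = np.zeros((m, n))                           # initial policy (stabilizing since A is stable)
for _ in range(10):
    feats, targets = [], []
    for _ in range(400):
        s = rng.normal(size=n)                 # random state and exploratory action
        a = rng.normal(size=m)
        s_next = A @ s + B @ a
        a_next = K @ s_next                    # next action follows the current policy
        feats.append(phi(s, a) - gamma * phi(s_next, a_next))
        targets.append(s @ Q @ s + a @ R @ a)
    q, *_ = np.linalg.lstsq(np.array(feats), np.array(targets), rcond=None)
    M = q.reshape(n + m, n + m)
    M = 0.5 * (M + M.T)                        # symmetrize mat(q)
    K = -np.linalg.solve(M[n:, n:], M[n:, :n]) # argmin_a [s;a]^T M [s;a]
print("learned gain:", K)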
Model-free LQR
Policy Gradient [FGKM18]
- estimate \(\widehat{\nabla J}(K)\) with finite differencing
- update policy as $$ K \leftarrow K -\alpha\widehat{\nabla J}( K)$$
- guarantee \(J(\hat K) - J(K_\star ) \leq\epsilon\) for \(N\gtrsim \mathrm{poly}(n,m,1/\epsilon)\)
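A hedged sketch of the policy gradient recipe with two-point finite differencing over random perturbation directions; the system, the finite-horizon rollout cost, and the step sizes below are all illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R, T = np.eye(2), np.eye(1), 50

def cost(K):
    """Finite-horizon rollout cost from a fixed initial state."""
    s, J = np.array([1.0, 0.0]), 0.0
    for _ in range(T):
        a = K @ s
        J += s @ Q @ s + a @ R @ a
        s = A @ s + B @ a
    return J

K, r, alpha = np.zeros((1, 2)), 0.1, 0.01
print("initial cost:", cost(K))
for _ in range(200):
    g = np.zeros_like(K)
    for _ in range(8):                         # average a few two-point estimates
        U = rng.normal(size=K.shape)
        g += (cost(K + r * U) - cost(K - r * U)) / (2 * r) * U
    K = K - alpha * g / 8
print("final cost:", cost(K), " gain:", K)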
Dynamic Performative Optimality
How to learn when data gradually reacts to your model [IZY22]
- Learner chooses \(\theta_t\)
- Population reacts as \(\rho_{t} = \mathcal D(\theta_t, \rho_{t-1})\)
- For fixed \(\theta\), the limiting distribution \(\rho_\star(\theta) = \lim_{t\to\infty} \rho_t\)
- Goal for learner: minimize \(\mathcal L^\star(\theta) = \mathbb E_{z\sim\rho_\star(\theta)} [\ell(z, \theta)]\)
- Similar to goal of minimizing \(\lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^T \mathbb E_{z\sim\rho_t} [\ell(z, \theta)]\)
- Approach: sample-based gradient descent.
- estimate parameters of \(\rho_t\) and parametric model of \(\partial_t \mathcal D\), use to estimate gradient of \(\mathcal L^\star\)
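As a toy illustration of the setting (not the estimator from [IZY22]), the sketch below assumes the population mean drifts toward \(a + b\theta\) and uses a simple deploy-wait-and-finite-difference baseline on the long-term loss; all parameters are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
a_true, b_true, lam, sigma = 1.0, 0.5, 0.3, 0.1   # toy drift model and noise
mu = 0.0                                          # current population mean

def deploy(theta, steps=30, n=200):
    """Deploy theta, let rho_t settle, and return an empirical loss estimate."""
    global mu
    for _ in range(steps):
        mu = (1 - lam) * mu + lam * (a_true + b_true * theta)   # rho_t reacts to theta
    z = mu + sigma * rng.normal(size=n)                         # samples ~ rho_*(theta)
    return np.mean((z - theta) ** 2)                            # loss l(z, theta) = (z - theta)^2

theta, delta, alpha = 0.0, 0.2, 0.5
for _ in range(40):
    grad = (deploy(theta + delta) - deploy(theta - delta)) / (2 * delta)
    theta -= alpha * grad
print("final theta:", round(theta, 3), " optimum of L* is a/(1-b) =", a_true / (1 - b_true))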
Is low cost all we want?
- In the setting of performative prediction, we may learn that homogeneous populations are easier to make predictions about
- recall retention dynamics of Hashimoto et al.
- Controlling autonomous vehicles or robots may involve avoiding obstacles or staying on the road
Motivation: Safety



Safe Trajectories
We define safety in terms of the "safe set" \(\mathcal S_\mathrm{safe}\subseteq \mathcal S\).
A state \(s\) is safe if \(s\in\mathcal S_\mathrm{safe}\).
A trajectory of states \((s_0,\dots,s_t)\) is safe if \(s_k\in\mathcal S_\mathrm{safe}\) for all \(0\leq k\leq t\).
(we can analogously define \(\mathcal A_\mathrm{safe}\subseteq \mathcal A\) and require that \(a_k\in\mathcal A_\mathrm{safe}\) for all \(0\leq k\leq t\))
Example
The state is position & velocity \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t \)
Safety constraint on position \(|\theta|\leq 1\)

Are trajectories safe as long as \(|\theta_0|<1\)?
- no! Exercise: what is the necessary condition on \(\omega_0\)?
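A quick numerical exploration of the exercise (not a proof): simulate the system for a few values of \(\omega_0\) with \(\theta_0\) near the boundary and check whether \(|\theta_t|\leq 1\) along the trajectory.

import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.9]])

def stays_safe(theta0, omega0, T=200):
    s = np.array([theta0, omega0])
    for _ in range(T):
        if abs(s[0]) > 1:          # position constraint |theta| <= 1 violated
            return False
        s = A @ s
    return True

for omega0 in [0.0, 0.5, 1.0, 2.0, 5.0]:
    print("omega_0 =", omega0, " safe:", stays_safe(0.9, omega0))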
Safe Invariant Sets
We define safety in terms of the "safe set" \(\mathcal S_\mathrm{safe}\subseteq \mathcal S\)
A system \(s_{t+1}=F(s_t)\) is safe if some \(\mathcal S_\mathrm{inv}\subseteq \mathcal S_{\mathrm{safe}}\) is invariant, i.e.
- for all \( s\in\mathcal S_\mathrm{inv}\), \( F(s)\in\mathcal S_\mathrm{inv}\)
Exercise: Prove that if \(\mathcal S_\mathrm{inv}\) is invariant for dynamics \(F\), then \(s_0\in \mathcal S_\mathrm{inv} \implies s_t\in\mathcal S_\mathrm{inv}\) for all \(t\).
Example: Linear Dynamics
- Consider stable linear dynamics \(s_{t+1}=As_t\).
- Consider the set \( \{s\mid s^\top Ps \leq c\} \) with \(P = \sum_{t=0}^\infty (A^t)^\top A^t\)
- Claim: This is an invariant set
- \((As)^\top \sum_{t=0}^\infty (A^t)^\top A^t (As) = s^\top \sum_{t=1}^\infty (A^t)^\top A^t s \leq s^\top \sum_{t=0}^\infty (A^t)^\top A^t s \leq c\)

Example: An invariant set for \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t \)
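A small sketch for this example: \(P=\sum_t (A^t)^\top A^t\) solves the discrete Lyapunov equation \(P = A^\top P A + I\), so we can compute it with SciPy and spot-check one-step invariance of \(\{s\mid s^\top P s\leq c\}\) on sampled boundary points (the level \(c\) is arbitrary).

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.1], [0.0, 0.9]])
P = solve_discrete_lyapunov(A.T, np.eye(2))    # solves P = A^T P A + I
c = 1.0

rng = np.random.default_rng(0)
for _ in range(1000):
    v = rng.normal(size=2)
    s = v * np.sqrt(c / (v @ P @ v))           # a point on the boundary s^T P s = c
    assert (A @ s) @ P @ (A @ s) <= c + 1e-9   # one-step invariance check
print("P =\n", P.round(3))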
Invariance via Lyapunov
Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) is continuous and
- (positive definite) \(V(0)=0\) and \(V(s)>0\) for all \(s\in\mathcal S - \{0\}\)
- (decreasing) \(V(F(s)) - V(s) \leq 0\) for all \(s\in\mathcal S\)
Claim: if \(V(s)\) is a Lyapunov function for \(F\) then any sublevel set \(\{V(s)\leq c\}\) is invariant.
- Proof sketch: if \(V(s)\leq c\), then \(V(F(s)) \leq V(s) \leq c\), so \(F(s)\) remains in the sublevel set.
Constrained Control
\(a_t = {\color{Goldenrod} K_t }s_{t}\)
\( \underset{\mathbf a }{\min}\) \(\displaystyle\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\)
\(\text{s.t.}~~s_{t+1} = As_t + Ba_t \)
\(s_t \in\mathcal S_\mathrm{safe},~~ a_t \in\mathcal A_\mathrm{safe}\)
\(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \)
\(\mathbf w = \begin{bmatrix}s_0\\ 0\\ \vdots \\0 \end{bmatrix}\)
- nonconvex in \(K\)
- convex if \(\mathcal S_\mathrm{safe}\) and \(\mathcal A_\mathrm{safe}\) are convex
\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a\end{bmatrix} \mathbf w\right\|_{2}^2\)
\(\text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a\end{bmatrix}= I \)
\(\mathbf \Phi_s\mathbf w \in\mathcal S_\mathrm{safe}^T,~~\mathbf \Phi_a\mathbf w\in\mathcal A_\mathrm{safe}^T\)
import numpy as np
import cvxpy as cvx

# Problem data assumed given: horizon T, state dim n, input dim p, dynamics A, B,
# cost square roots Q_sqrt, R_sqrt, polytopes (F_s, b_s), (F_a, b_a), initial state s_0.
Phi_s = cvx.Variable((T*n, T*n), name="Phi_s")
Phi_a = cvx.Variable((T*p, T*n), name="Phi_a")
# Affine dynamics constraint: first block of Phi_s is the identity,
# and each later block follows the dynamics (A, B)
constr = [Phi_s[:n, :] == np.eye(n)]
for k in range(T-1):
    constr.append(Phi_s[n*(k+1):n*(k+2), :]
                  == A @ Phi_s[n*k:n*(k+1), :] + B @ Phi_a[p*k:p*(k+1), :])
constr.append(A @ Phi_s[n*(T-1):, :] + B @ Phi_a[p*(T-1):, :] == 0)
# Polytope safety constraints: F_s s_k <= b_s and F_a a_k <= b_a
# (with w = [s_0; 0; ...; 0], only the first n columns of Phi multiply s_0)
for k in range(T-1):
    constr.append(F_s @ Phi_s[n*(k+1):n*(k+2), :n] @ s_0 <= b_s)
    constr.append(F_a @ Phi_a[p*k:p*(k+1), :n] @ s_0 <= b_a)
# Quadratic cost (Frobenius norm of the weighted system response)
cost_matrix = cvx.bmat([[Q_sqrt @ Phi_s[n*k:n*(k+1), :]] for k in range(T)]
                       + [[R_sqrt @ Phi_a[p*k:p*(k+1), :]] for k in range(T)])
objective = cvx.norm(cost_matrix, 'fro')
prob = cvx.Problem(cvx.Minimize(objective), constr)
prob.solve()
Phi_s = np.array(Phi_s.value)
Phi_a = np.array(Phi_a.value)
Safe policies via convex programming
- Linear control means policy has a constant gain
- Constant gain may be an inefficient way to ensure safety
Nonlinear Safe Control
(Figure: a nonlinear safety constraint illustrated on axes of the size of \(s\) and the size of \(a\).)
Control Barrier Function
Claim: Suppose that for all \(t\), the policy satisfies
- \(C(F(s_t, \pi(s_t)))\leq \gamma C(s_t)\) for some \(0\leq \gamma\leq 1\).
- Then \(\{s\mid C(s)\leq 0\}\) is an invariant set.
$$\pi(s_t) = \text{find}\quad a\quad\text{s.t.}\quad C(F(s_t, a)) \leq \gamma C(s_t) $$
\(C(F(s, a))-C(s) \leq -(1-\gamma) C(s) \)
(Figure: the barrier boundary \(C(s)=0\) and the safety constraint on axes of the size of \(s\) and the size of \(a\).)
Example: safety filter for linear dynamics
Control Barrier Function
$$a_t = \arg\min_{a\in\mathcal A_\mathrm{safe} } \|a-Ks_t\|_2 \quad \text{s.t.}\quad C(As_t+Ba_t) \leq \gamma C(s_t) $$
Exercise: If \(C\) is a quadratic function, when is the above optimization problem feasible for some \(a\in\mathbb R^m\)?
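A minimal CVXPY sketch of this safety filter, assuming a quadratic barrier \(C(s)=\|s\|^2 - c\) and toy system matrices; the nominal gain is deliberately chosen so the filter has to modify the action.

import numpy as np
import cvxpy as cvx

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.8, 0.8]])            # nominal gain (deliberately unsafe here)
c, gamma = 1.0, 0.9                   # barrier C(s) = ||s||^2 - c

def safety_filter(s):
    a = cvx.Variable(1)
    s_next = A @ s + B @ a
    barrier = [cvx.sum_squares(s_next) - c <= gamma * (float(s @ s) - c)]
    prob = cvx.Problem(cvx.Minimize(cvx.sum_squares(a - K @ s)), barrier)
    prob.solve()
    return a.value

s = np.array([0.8, 0.3])
print("nominal action:", K @ s, " filtered action:", safety_filter(s))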
Safety with Disturbances
Adversarial perspective is common when dealing with disturbances.
- \(\mathcal S_\mathrm{inv}\) is robustly invariant if for all \( s\in\mathcal S_\mathrm{inv}\) and \(w\in\mathcal W\), \( F(s, w)\in\mathcal S_\mathrm{inv}\)
- ex: \(F(s,w) = \gamma s+w\) with \(0\leq\gamma<1\) and \(|w|\leq B\), then \(|s|\leq B/(1-\gamma)\) is invariant, since \(|\gamma s + w| \leq \gamma|s| + B \leq \gamma\tfrac{B}{1-\gamma} + B = \tfrac{B}{1-\gamma}\)
- Robust constraints: $$\mathbf \Phi_s\mathbf w \in\mathcal S_\mathrm{safe}^T\quad\text{for all}\quad \mathbf w\in\mathcal W$$
- Robust safety filter: $$\pi(s_t) = \text{find}\quad a\quad\text{s.t.}\quad C(F(s_t, a, w)) \leq \gamma C(s_t)~~\forall ~~ w\in\mathcal W $$
Recap
- Recap of data-driven optimal control (RL)
- policy, value, model
- Safety as constraints/invariance
- Safe control with
- quadratic Lyapunov functions
- System level synthesis
- Barrier functions
References: Predictive Control by Borrelli, Bemporad, Morari