Safe Control

ML in Feedback Sys #20

Prof Sarah Dean


  • Comments on proposals posted
    • Project midterm update due 11/11
  • Scribing feedback to come soon!
  • Upcoming paper presentations starting next week
  • Participation includes attending presentations

Reporting back on NCCR Symposium

Symposium on Socially Responsible Automation hosted by NCCR Automation at EPFL

  • talks on responsibility/fairness in cyberphysical systems (power grid, dynamic pricing), recommendation systems, repeated resource allocation via "karma economy", and vulnerability-aware AV rules
  • interesting idea: "reparative fairness"
    • \(\mathbb E[\)future cost\(\mid\)past cost\(]\propto -\)past cost


\(\pi_t:\mathcal S\to\mathcal A\)




\(\{(s_t, a_t, c_t)\}\)

Action in a dynamic world

Goal: select actions \(a_t\) to bring environment to low-cost states





Recap: Optimal Control

Stochastic Infinite Horizon Optimal Control Problem

$$ \min_{\pi} ~~\lim_{T\to\infty} \mathbb  E_w\Big[\frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k)) \Big]\quad \text{s.t.}\quad s_0~~\text{given},~~s_{k+1} = F(s_k, \pi(s_k),w_k) $$


Bellman Optimality Equation

  • \(\underbrace{J^\star (s)}_{\text{value function}} = \min_{a\in\mathcal A} \underbrace{c(s, a)+\mathbb E_w[J^\star (F(s,a,w))]}_{\text{state-action function}}\)
  • Minimizing argument is \(\pi^\star(s)\)

Recap: LQR with known dynamics

  • Goal: minimize quadratic cost (\(Q,R\)) in a system with linear dynamics (\(A,B\))
  • Classic approach: Dynamic programming/Bellman optimality
    • \(P = \mathrm{DARE}(A,B,Q,R)\) and \(K_\star = -(R+B^\top PB)^{-1}B^\top PA\)
  • System level synthesis: Convex optimization
    • \( \underset{\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\  \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2~~\text{s.t.}~~ \begin{bmatrix} zI -  A & - B\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\  \mathbf{\Phi}_a \end{bmatrix}= I \)
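
The dynamic programming formula can be checked numerically. The sketch below is a minimal illustration, assuming SciPy's `solve_discrete_are` as the DARE solver; the matrices `A`, `B`, `Q`, `R` are made-up values, not taken from the lecture.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical system and cost matrices, chosen only for illustration
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

# P = DARE(A, B, Q, R)
P = solve_discrete_are(A, B, Q, R)

# K = -(R + B^T P B)^{-1} B^T P A, so that a = K s
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# The closed loop A + B K should be stable (spectral radius below 1)
closed_loop_radius = max(abs(np.linalg.eigvals(A + B @ K)))

# With this K, the Riccati equation reads P = Q + A^T P (A + B K)
riccati_residual = Q + A.T @ P @ (A + B @ K) - P
```

Here `closed_loop_radius` and `riccati_residual` are only diagnostics: the first confirms stability, the second confirms that `P` and `K` are consistent with each other.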

Setting: have data \(\{s_k, a_k, c_k\}_{k=0}^N\). Approaches include a focus on:

  1. Model: learn dynamics (and costs) from data, then do policy design
    • For LQR: estimate \(\hat A,\hat B\) (\(\hat Q,\hat R\)) then design \(\hat K\)
    • "model based"
  2. Bellman: learn value or state-action function
    • For LQR: estimate \(\hat J\) then determine \(\hat K\) as \(\arg\min\)
    • "model free"
  3. Policy: estimate gradients and update policy directly
    • For LQR: \(\hat K \leftarrow \hat K -\alpha\widehat{\nabla J}(\hat K)\)
    • "model free"

Recap: unknown dynamics

  1. Learn Model:
    • estimate \(\hat A,\hat B\) via least-squares, guarantee \(\max\{\varepsilon_A, \varepsilon_B\}\lesssim \sqrt{\frac{m+n}{N}}\)
  2. Design Policy:
    • robust approach uses \(\hat A, \hat B, \varepsilon_A, \varepsilon_B\)
      • \(J(\hat{\mathbf \Phi}) -  J(\mathbf \Phi_\star) \leq \epsilon\) for \(N\gtrsim \frac{(m+n)^2}{\epsilon^2}\)

    • nominal or certainty equivalent approach uses \(\hat A, \hat B\)
      • for small enough \(\varepsilon\), can show that \(J(\hat{\mathbf \Phi}) -  J(\mathbf \Phi_\star) \lesssim \varepsilon^2\)
      • thus faster rate, \(J(\hat{\mathbf \Phi}) -  J(\mathbf \Phi_\star) \leq \epsilon\) for \(N\gtrsim \frac{m+n}{\epsilon}\)
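
The model-learning step can be sketched as an ordinary least-squares regression of \(s_{k+1}\) on \((s_k, a_k)\). This is a minimal illustration under assumptions that are not from the lecture: the true system, the noise level `sigma_w`, and the choice of random Gaussian inputs are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true system, used only to generate data
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
n, m = B.shape
N = 2000          # number of transitions
sigma_w = 0.1     # assumed process noise standard deviation

# Collect one trajectory driven by random inputs
S = np.zeros((N + 1, n))
U = rng.normal(size=(N, m))
for k in range(N):
    w = sigma_w * rng.normal(size=n)
    S[k + 1] = A @ S[k] + B @ U[k] + w

# Least squares: s_{k+1}^T ~ [s_k^T, a_k^T] @ [A^T; B^T]
Z = np.hstack([S[:-1], U])
Theta, *_ = np.linalg.lstsq(Z, S[1:], rcond=None)
A_hat, B_hat = Theta[:n].T, Theta[n:].T

# Operator-norm estimation errors
eps_A = np.linalg.norm(A_hat - A, 2)
eps_B = np.linalg.norm(B_hat - B, 2)
```

The estimates `A_hat`, `B_hat` would then be passed to either the robust or the certainty-equivalent design step.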

Model-based LQR

Approximate Policy Iteration [KTR19]

  • estimate quadratic state-action function with LSTD $$ q^\top f + q^\top \phi(s_t, a_t) = c(s_t,a_t)+\mathbb E_{w_t}[q^\top \phi(s_{t+1}, \pi(s_{t+1}))]$$
  • update policy as $$ \hat Ks = \arg\min_a \hat q^\top \phi(s, a) = \arg\min_a \begin{bmatrix}s\\ a\end{bmatrix}^\top \mathrm{mat}(\hat q) \begin{bmatrix}s\\ a\end{bmatrix}$$
  • guarantee \(\|\hat K - K_\star \|_2 \leq \epsilon\) for \(NT \gtrsim \frac{(m+n)^3}{\epsilon^2}\)

Model-free LQR

Policy Gradient [FGKM18]

  • estimate \(\widehat{\nabla J}(K)\) with finite differencing
  • update policy as $$  K \leftarrow  K -\alpha\widehat{\nabla J}( K)$$
  • guarantee \(J(\hat K) - J(K_\star ) \leq\epsilon\) for \(N\gtrsim \mathrm{poly}(n,m,1/\epsilon)\)
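
The two steps above can be sketched as follows. This is a simplified illustration, not the estimator analyzed in [FGKM18]: it assumes noiseless finite-horizon rollouts and coordinate-wise central differences, and the system, horizon `T`, step size `alpha`, and iteration count are all hypothetical choices.

```python
import numpy as np

# Hypothetical system and cost, for illustration only
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
T = 50             # rollout horizon
S0 = np.eye(2)     # each column is one initial state

def rollout_cost(K):
    """Total quadratic cost of a_t = K s_t over all initial states."""
    total = 0.0
    for s in S0.T:
        for _ in range(T):
            a = K @ s
            total += s @ Q @ s + a @ R @ a
            s = A @ s + B @ a
    return total

def fd_gradient(K, delta=1e-4):
    """Central finite-difference estimate of the gradient of rollout_cost."""
    grad = np.zeros_like(K)
    for idx in np.ndindex(*K.shape):
        E = np.zeros_like(K)
        E[idx] = delta
        grad[idx] = (rollout_cost(K + E) - rollout_cost(K - E)) / (2 * delta)
    return grad

K = np.zeros((1, 2))   # stabilizing initial gain (A itself is stable)
J_init = rollout_cost(K)
alpha = 1e-3           # assumed step size
for _ in range(300):
    K = K - alpha * fd_gradient(K)
J_final = rollout_cost(K)
```

Only cost evaluations are used, never \(A\) or \(B\) directly inside the update, which is what makes the approach model-free.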

Dynamic Performative Optimality

How to learn when data gradually reacts to your model [IZY22]

  • Learner chooses \(\theta_t\)
  • Population reacts as \(\rho_{t} = \mathcal D(\theta_t, \rho_{t-1})\)
  • For fixed \(\theta\), the limiting distribution \(\rho_\star(\theta) = \lim_{t\to\infty} \rho_t\)
  • Goal for learner: minimize \(\mathcal L^\star(\theta) = \mathbb E_{z\sim\rho_\star(\theta)} [\ell(z, \theta)]\)
    • Similar to goal of minimizing \(\lim_{T\to\infty} \frac{1}{T} \sum_{t=0}^T \mathbb E_{z\sim\rho_t} [\ell(z, \theta)]\)
  • Approach: sample-based gradient descent.
    • estimate parameters of \(\rho_t\) and parametric model of \(\partial_t \mathcal D\), use to estimate gradient of \(\mathcal L^\star\)

Is low cost all we want?

  • In the setting of performative prediction, we may learn that homogeneous populations are easier to make predictions about
    • recall retention dynamics of Hashimoto et al.
  • Controlling autonomous vehicles or robots may involve avoiding obstacles or staying on the road

Motivation: Safety

A trajectory of states \((s_0,\dots,s_t)\) is safe if \(s_k\in\mathcal S_\mathrm{safe}\) for all \(0\leq k\leq t\).

Safe Trajectories

We define safety in terms of the "safe set" \(\mathcal S_\mathrm{safe}\subseteq \mathcal S\).

(we can analogously define \(\mathcal A_\mathrm{safe}\subseteq \mathcal A\) and require that \(a_k\in\mathcal A_\mathrm{safe}\) for all \(0\leq k\leq t\))

A state \(s\) is safe if \(s\in\mathcal S_\mathrm{safe}\).


The state is position & velocity \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ 0 & 0.9 \end{bmatrix}s_t \)

Safety constraint on position \(|\theta|\leq 1\)

Are trajectories safe as long as \(|\theta_0|<1\)?

  • no! Exercise: what is the necessary condition on \(\omega_0\)?
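
A quick simulation shows why a safe initial position is not enough. The initial condition below is a hypothetical choice: the position starts inside the constraint, but the velocity carries it out after one step, since \(\theta_1 = 0.9\cdot 0.9 + 0.1\cdot 2.0 = 1.01\).

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.9]])

# Hypothetical initial condition: |theta_0| < 1 but large velocity
s = np.array([0.9, 2.0])

thetas = [s[0]]
for _ in range(20):
    s = A @ s
    thetas.append(s[0])

# Largest position magnitude along the trajectory
max_theta = max(abs(th) for th in thetas)
trajectory_is_safe = max_theta <= 1.0
```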

Safe Invariant Sets

We define safety in terms of the "safe set" \(\mathcal S_\mathrm{safe}\subseteq \mathcal S\)

A system \(s_{t+1}=F(s_t)\) is safe if some \(\mathcal S_\mathrm{inv}\subseteq \mathcal S_{\mathrm{safe}}\) is invariant, i.e.

  • for all \( s\in\mathcal S_\mathrm{inv}\), \( F(s)\in\mathcal S_\mathrm{inv}\)

Exercise: Prove that if \(\mathcal S_\mathrm{inv}\) is invariant for dynamics \(F\), then \(s_0\in \mathcal S_\mathrm{inv} \implies s_t\in\mathcal S_\mathrm{inv}\) for all \(t\).

Example: Linear Dynamics

  • Consider stable linear dynamics \(s_{t+1}=As_t\).
  • Consider the set \( \{s\mid s^\top Ps \leq c\} \) with \(P = \sum_{t=0}^\infty (A^t)^\top A^t\)
  • Claim: This is an invariant set
    • \((As)^\top \sum_{t=0}^\infty (A^t)^\top A^t (As) \)

    • \(= s^\top \sum_{t=1}^\infty (A^t)^\top A^t s \)

    • \(\leq  s^\top \sum_{t=0}^\infty (A^t)^\top A^t s \leq c\)

Example: An invariant set for \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ 0 & 0.9 \end{bmatrix}s_t \)
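
For this system, the quadratic invariant set can be computed numerically. The sketch below assumes SciPy's `solve_discrete_lyapunov` to evaluate \(P=\sum_{t\ge 0}(A^t)^\top A^t\) (equivalently \(P = A^\top P A + I\)). The level \(c\) is chosen here as the largest one for which the ellipsoid stays inside \(|\theta|\leq 1\), using the fact that the maximum of \(|\theta|\) over \(\{s^\top P s\leq c\}\) is \(\sqrt{c\,(P^{-1})_{11}}\); that choice of \(c\) is an illustration, not something stated in the lecture.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.1], [0.0, 0.9]])

# Solves P = A^T P A + I, i.e. P = sum_t (A^t)^T A^t
P = solve_discrete_lyapunov(A.T, np.eye(2))

# Largest level c such that {s : s^T P s <= c} lies inside |theta| <= 1
c = 1.0 / np.linalg.inv(P)[0, 0]

# Sample points on the boundary of the ellipsoid and step them forward
rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 2))
vals = np.einsum("ij,jk,ik->i", samples, P, samples)
boundary = samples * np.sqrt(c / vals)[:, None]
next_states = boundary @ A.T
next_vals = np.einsum("ij,jk,ik->i", next_states, P, next_states)

# Invariance: V(As) <= c; safety: |theta| <= 1 on the set
stays_inside = bool(np.all(next_vals <= c + 1e-9))
max_abs_theta = float(np.max(np.abs(boundary[:, 0])))
```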

Claim: if \(V(s)\) is a Lyapunov function for \(F\) then any sublevel set \(\{V(s)\leq c\}\) is invariant.

Invariance via Lyapunov

  • Proof: if \(V(s)\leq c\), then \(V(F(s)) \leq V(s) \leq c\), where the first inequality is the decreasing property.

Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) is continuous and

  • (positive definite) \(V(0)=0\) and \(V(s)>0\) for all \(s\in\mathcal S \setminus \{0\}\)
  • (decreasing) \(V(F(s)) - V(s) \leq 0\) for all \(s\in\mathcal S\)

Constrained Control

\( \underset{\mathbf a }{\min}\)   \(\displaystyle\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\)

\(\text{s.t.}~~s_{t+1} = As_t + Ba_t,~~s_t \in\mathcal S_\mathrm{safe},~~ a_t \in\mathcal A_\mathrm{safe}\)

  • with a linear policy \(a_t = K_t s_{t}\), the problem is nonconvex in \(K\)

System level synthesis reparametrizes the trajectory as \(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \) with \(\mathbf w = \begin{bmatrix}s_0\\ 0\\ \vdots \\0 \end{bmatrix}\):

\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a\end{bmatrix} \mathbf w\right\|_{2}^2\)

\(\text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a\end{bmatrix}= I \)

       \(\mathbf \Phi_s\mathbf w \in\mathcal S_\mathrm{safe}^T,~~\mathbf \Phi_a\mathbf w\in\mathcal A_\mathrm{safe}^T\)

  • convex if \(\mathcal S_\mathrm{safe}\) and \(\mathcal A_\mathrm{safe}\) are convex

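import cvxpy as cvx
import numpy as np

# Assumes A (n x n), B (n x p), Q_sqrt, R_sqrt, F_s, b_s, F_a, b_a,
# the initial state s_0 and the horizon T are already defined.

# Block row k of Phi_s (resp. Phi_a) maps s_0 to s_k (resp. a_k)
Phi_s = cvx.Variable((T*n, n), name="Phi_s")
Phi_a = cvx.Variable((T*p, n), name="Phi_a")

# Affine dynamics constraint
constr = [Phi_s[:n, :] == np.eye(n)]
for k in range(T-1):
    constr.append(Phi_s[n*(k+1):n*(k+2), :] == A @ Phi_s[n*k:n*(k+1), :] + B @ Phi_a[p*k:p*(k+1), :])
# Terminal constraint: the response reaches zero after T steps
constr.append(A @ Phi_s[n*(T-1):, :] + B @ Phi_a[p*(T-1):, :] == 0)

# Polytope safety constraint
# F_s s_k <= b_s and F_a a_k <= b_a
for k in range(1, T):  # s_0 is given, so constrain s_1, ..., s_{T-1}
    constr.append(F_s @ Phi_s[n*k:n*(k+1), :] @ s_0 <= b_s)
for k in range(T):     # constrain a_0, ..., a_{T-1}
    constr.append(F_a @ Phi_a[p*k:p*(k+1), :] @ s_0 <= b_a)

# Quadratic cost: Frobenius norm of the weighted system response
cost_matrix = cvx.bmat([[Q_sqrt @ Phi_s[n*k:n*(k+1), :]] for k in range(T)]
                       + [[R_sqrt @ Phi_a[p*k:p*(k+1), :]] for k in range(T)])
objective = cvx.norm(cost_matrix, 'fro')

prob = cvx.Problem(cvx.Minimize(objective), constr)
prob.solve()  # the variables have no value until the problem is solved
Phi_s = np.array(Phi_s.value)
Phi_a = np.array(Phi_a.value)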

Safe policies via convex programming

  • Linear control means policy has a constant gain
  • Constant gain may be an inefficient way to ensure safety

Nonlinear Safe Control

[Figure: axes are size of \(a\) and size of \(s\); the safety constraint is marked]

Control Barrier Function

Claim: Suppose that for all \(t\), the policy satisfies

  • \(C(F(s_t, \pi(s_t)))\leq \gamma C(s_t)\) for some \(0\leq \gamma\leq 1\).
  • Then \(\{s\mid C(s)\leq 0\}\) is an invariant set.

$$\pi(s_t) = \text{find}\quad a\quad\text{s.t.}\quad  C(F(s_t, a)) \leq   \gamma C(s_t) $$

\(C(F(s, a))-C(s) \leq   -(1-\gamma) C(s) \)



Example: safety filter for linear dynamics

Control Barrier Function

$$a_t = \arg\min_{a\in\mathcal A_\mathrm{safe} } \|a-Ks_t\|_2 \quad \text{s.t.}\quad  C(As_t+Ba) \leq  \gamma C(s_t) $$
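
When the barrier is affine, \(C(s) = f^\top s - b\), and \(\mathcal A_\mathrm{safe} = \mathbb R^m\), the filter reduces to projecting the nominal action onto a halfspace, which has a closed form. The sketch below illustrates that special case only; the input matrix `B`, the nominal gain `K` (here zero), the barrier, \(\gamma\), and the initial state are all hypothetical choices.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.05], [0.1]])        # hypothetical input matrix
K = np.zeros((1, 2))                 # hypothetical nominal gain (no action)
f, b = np.array([1.0, 0.0]), 1.0     # barrier C(s) = theta - 1
gamma = 0.5

def barrier(s):
    return f @ s - b

def nominal(s):
    return K @ s

def safety_filter(s):
    """Closest action to K s satisfying C(A s + B a) <= gamma * C(s)."""
    a_nom = nominal(s)
    # The constraint is linear in a:  g^T a <= h
    g = B.T @ f
    h = gamma * barrier(s) + b - f @ A @ s
    violation = g @ a_nom - h
    if violation <= 0:
        return a_nom                 # nominal action is already safe
    return a_nom - (violation / (g @ g)) * g   # project onto the halfspace

def simulate(policy, s0, steps=30):
    s, thetas = np.array(s0, dtype=float), []
    for _ in range(steps):
        thetas.append(s[0])
        s = A @ s + B @ policy(s)
    thetas.append(s[0])
    return np.array(thetas)

s0 = [0.9, 1.5]                      # safe position, large velocity
theta_nominal = simulate(nominal, s0)
theta_filtered = simulate(safety_filter, s0)
```

In this sketch the nominal trajectory leaves \(\theta\leq 1\) while the filtered one does not. The projection needs \(B^\top f \neq 0\); with a bounded \(\mathcal A_\mathrm{safe}\) or a quadratic \(C\), a numerical solver would be needed instead.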


Exercise: If \(C\) is a quadratic function, when is the above optimization problem feasible for some \(a\in\mathbb R^m\)?

Adversarial perspective is common when dealing with disturbances.

  • \(\mathcal S_\mathrm{inv}\) is robustly invariant if for all \( s\in\mathcal S_\mathrm{inv}\) and \(w\in\mathcal W\), \( F(s, w)\in\mathcal S_\mathrm{inv}\)
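    • ex: if \(F(s,w) = \gamma s+w\) with \(0\leq\gamma<1\) and \(|w|\leq B\), then \(\{s \mid |s|\leq B/(1-\gamma)\}\) is robustly invariant, since \(|\gamma s + w| \leq \gamma B/(1-\gamma) + B = B/(1-\gamma)\).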
  • Robust constraints: $$\mathbf \Phi_s\mathbf w \in\mathcal S_\mathrm{safe}^T\quad\text{for all}\quad w\in\mathcal W$$
  • Robust safety filter: $$\pi(s_t) = \text{find}\quad a\quad\text{s.t.}\quad  C(F(s_t, a, w)) \leq   \gamma C(s_t)~~\forall ~~ w\in\mathcal W $$
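
The scalar example of robust invariance can be checked by simulation. This is a minimal sketch with hypothetical values of \(\gamma\) and the disturbance bound (called `w_max` here to avoid clashing with the input matrix \(B\)); it tries both random disturbances and the worst-case disturbance that always pushes outward.

```python
import numpy as np

gamma, w_max = 0.8, 1.0            # hypothetical contraction and bound
bound = w_max / (1 - gamma)        # claimed invariant set: |s| <= bound

rng = np.random.default_rng(0)

# Random disturbances, starting on the boundary of the set
s = bound
max_random = abs(s)
for _ in range(1000):
    w = rng.uniform(-w_max, w_max)
    s = gamma * s + w
    max_random = max(max_random, abs(s))

# Worst-case disturbance: always push away from the origin
s = bound
max_worst = abs(s)
for _ in range(1000):
    w = w_max * np.sign(s)
    s = gamma * s + w
    max_worst = max(max_worst, abs(s))
```

Under the worst-case disturbance the state sits exactly on the boundary, which shows the bound \(B/(1-\gamma)\) is tight.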

Safety with Disturbances


  • Recap of data-driven optimal control (RL)
    • policy, value, model
  • Safety as constraints/invariance
  • Safe control with
    • quadratic Lyapunov functions
    • System level synthesis
    • Barrier functions

References: Predictive Control by Borrelli, Bemporad, Morari
