Prof Sarah Dean
Symposium on Socially responsible Automation hosted by NCCR Automation at EPFL
Goal: select actions \(a_t\) to bring the environment to low-cost states
[Diagram: the policy \(\pi_t:\mathcal S\to\mathcal A\) maps the observation \(s_t\) of the environment state \(s\) to the action \(a_t\); the data \(\{(s_t, a_t, c_t)\}\) accumulate.]
Stochastic Infinite Horizon Optimal Control Problem
$$ \min_{\pi} ~~\underbrace{\lim_{T\to\infty} \mathbb E_w\Big[\frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k)) \Big]}_{J^\pi(s_0)}\quad \text{s.t.}\quad s_0~~\text{given},~~s_{k+1} = F(s_k, \pi(s_k),w_k) $$
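The average cost \(J^\pi(s_0)\) can be estimated by simulating one long rollout. The sketch below is illustrative only: it assumes linear dynamics \(F(s,a,w)=As+Ba+w\), a linear policy \(\pi(s)=Ks\), a quadratic cost and Gaussian noise, none of which are fixed by the problem statement, and the function name `average_cost` is hypothetical.

```python
import numpy as np

def average_cost(A, B, K, Q, R, s0, T=20000, noise_std=1.0, seed=0):
    """Monte Carlo estimate of (1/T) * sum_k c(s_k, pi(s_k)) for pi(s) = K s.

    Assumptions (not from the slides): linear dynamics, quadratic cost
    c(s, a) = s'Qs + a'Ra, and i.i.d. Gaussian disturbances w_k.
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(s0, dtype=float)
    total = 0.0
    for _ in range(T):
        a = K @ s                          # action chosen by the policy
        total += s @ Q @ s + a @ R @ a     # stage cost c(s_k, a_k)
        w = noise_std * rng.standard_normal(s.shape)
        s = A @ s + B @ a + w              # s_{k+1} = F(s_k, a_k, w_k)
    return total / T

# Scalar example: the closed loop is s_{k+1} = 0.5 s_k + w_k
A = np.array([[0.9]])
B = np.array([[1.0]])
K = np.array([[-0.4]])
J_hat = average_cost(A, B, K, Q=np.eye(1), R=np.zeros((1, 1)), s0=[0.0])
```

With these example values the stationary variance of \(s\) is \(1/(1-0.5^2)=4/3\), so the estimate should land close to that number for large \(T\).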
Bellman Optimality Equation
\( \underset{\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}Q^{1/2} &\\& R^{1/2}\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix} \right\|_{\mathcal H_2}^2~~\text{s.t.}~~ \begin{bmatrix} zI - A & - B\end{bmatrix} \begin{bmatrix} \mathbf{\Phi}_s \\ \mathbf{\Phi}_a \end{bmatrix}= I \)
Setting: have data \(\{s_k, a_k, c_k\}_{k=0}^N\). Approaches include a focus on:
\(J(\hat{\mathbf \Phi}) - J(\mathbf \Phi_\star) \leq \epsilon\) for \(N\gtrsim \frac{(m+n)^2}{\epsilon^2}\)
Approximate Policy Iteration [KTR19]
Policy Gradient [FGKM18]
How to learn when data gradually reacts to your model [IZY22]
Is low cost all we want?
A trajectory of states \((s_0,\dots,s_t)\) is safe if \(s_k\in\mathcal S_\mathrm{safe}\) for all \(0\leq k\leq t\).
We define safety in terms of the "safe set" \(\mathcal S_\mathrm{safe}\subseteq \mathcal S\).
(we can analogously define \(\mathcal A_\mathrm{safe}\subseteq \mathcal A\) and require that \(a_k\in\mathcal A_\mathrm{safe}\) for all \(0\leq k\leq t\))
A state \(s\) is safe if \(s\in\mathcal S_\mathrm{safe}\).
The state is position & velocity \(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t \)
Safety constraint on position \(|\theta|\leq 1\)
Are trajectories safe as long as \(|\theta_0|<1\)?
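A quick simulation helps answer the question. The sketch below rolls out the dynamics above from two initial states with the same safe position; the initial velocities (0 and 5) are arbitrary values chosen for illustration.

```python
import numpy as np

# Dynamics from the example: s = [theta, omega]
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])

def rollout(s0, steps=50):
    """Simulate s_{t+1} = A s_t and return the trajectory as an array."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(steps):
        traj.append(A @ traj[-1])
    return np.array(traj)

def is_safe(traj):
    """Check the position constraint |theta| <= 1 along the whole trajectory."""
    return bool(np.all(np.abs(traj[:, 0]) <= 1.0))

slow = rollout([0.9, 0.0])   # starts at rest
fast = rollout([0.9, 5.0])   # same position, large initial velocity

print(is_safe(slow), is_safe(fast))
```

Both trajectories start with \(|\theta_0|<1\), but the second one leaves the safe set after one step because the velocity carries the position past the constraint. Safety of the initial position alone is therefore not enough.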
A system \(s_{t+1}=F(s_t)\) is safe if some \(\mathcal S_\mathrm{inv}\subseteq \mathcal S_{\mathrm{safe}}\) is invariant, i.e. \(s\in\mathcal S_\mathrm{inv} \implies F(s)\in\mathcal S_\mathrm{inv}\).
Exercise: Prove that if \(\mathcal S_\mathrm{inv}\) is invariant for dynamics \(F\), then \(s_0\in \mathcal S_\mathrm{inv} \implies s_t\in\mathcal S_\mathrm{inv}\) for all \(t\).
\((As)^\top \sum_{t=0}^\infty (A^t)^\top A^t (As) \)
\(= s^\top \sum_{t=1}^\infty (A^t)^\top A^t s \)
\(\leq s^\top \sum_{t=0}^\infty (A^t)^\top A^t s \leq c\)
Example: An invariant set for
\(s=[\theta,\omega]\) with \( s_{t+1} = \begin{bmatrix} 0.9 & 0.1\\ & 0.9 \end{bmatrix}s_t \)
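The ellipsoid \(\{s : s^\top P s\leq c\}\) with \(P=\sum_{t=0}^\infty (A^t)^\top A^t\) can be checked numerically for this system. The sketch below is a minimal illustration: it computes \(P\) by a fixed-point iteration (a choice made here for self-containment, not taken from the slides) and then picks the largest level \(c\) for which the ellipsoid still respects \(|\theta|\leq 1\).

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.9]])

def gramian(A, tol=1e-12, max_iter=10000):
    """Return P = sum_t (A^t)' A^t by iterating P <- I + A' P A."""
    n = A.shape[0]
    P = np.eye(n)
    for _ in range(max_iter):
        P_next = np.eye(n) + A.T @ P @ A
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("iteration did not converge; is A stable?")

P = gramian(A)

def V(s):
    return s @ P @ s

# V is non-increasing along trajectories, so sublevel sets are invariant
s = np.array([0.9, 5.0])
values = []
for _ in range(30):
    values.append(V(s))
    s = A @ s

# Largest c with {V <= c} inside |theta| <= 1:
# the maximum of |theta| over the ellipsoid is sqrt(c * inv(P)[0, 0])
c_max = 1.0 / np.linalg.inv(P)[0, 0]
```

Any initial state with \(V(s_0)\leq\) `c_max` then stays inside the position constraint for all time, which is a stronger requirement than \(|\theta_0|<1\).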
Claim: if \(V(s)\) is a Lyapunov function for \(F\) then any sublevel set \(\{V(s)\leq c\}\) is invariant.
Definition: A Lyapunov function \(V:\mathcal S\to \mathbb R\) for \(F\) is continuous and
\(a_t = {\color{Goldenrod} K_t }s_{t}\)
\( \underset{\mathbf a }{\min}\) \(\displaystyle\sum_{t=0}^T s_t^\top Q s_t + a_t^\top R a_t\)
\(\text{s.t.}~~s_{t+1} = As_t + Ba_t \)
\(s_t \in\mathcal S_\mathrm{safe},~~ a_t \in\mathcal A_\mathrm{safe}\)
\(\begin{bmatrix} \mathbf s\\ \mathbf a\end{bmatrix} = \begin{bmatrix} \mathbf \Phi_s\\ \mathbf \Phi_a\end{bmatrix}\mathbf w \)
\(\mathbf w = \begin{bmatrix}s_0\\ 0\\ \vdots \\0 \end{bmatrix}\)
\( \underset{\color{teal}\mathbf{\Phi}}{\min}\)\(\left\| \begin{bmatrix}\bar Q^{1/2} &\\& \bar R^{1/2}\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a\end{bmatrix} \mathbf w\right\|_{2}^2\)
\(\text{s.t.}~~ \begin{bmatrix} I - \mathcal Z \bar A & - \mathcal Z \bar B\end{bmatrix} \begin{bmatrix}\color{teal} \mathbf{\Phi}_s \\ \color{teal} \mathbf{\Phi}_a\end{bmatrix}= I \)
\(\mathbf \Phi_s\mathbf w \in\mathcal S_\mathrm{safe}^T,~~\mathbf \Phi_a\mathbf w\in\mathcal A_\mathrm{safe}^T\)
import cvxpy as cvx
import numpy as np

# Assumes A, B, Q_sqrt, R_sqrt, F_s, b_s, F_a, b_a, s_0 and the horizon T are defined.
n, p = B.shape  # n = size of s, p = size of a
# Only the first block column of Phi is needed because w = [s_0, 0, ..., 0].
Phi_s = cvx.Variable((T*n, n), name="Phi_s")
Phi_a = cvx.Variable((T*p, n), name="Phi_a")
# Affine dynamics constraint
constr = [Phi_s[:n, :] == np.eye(n)]
for k in range(T-1):
    constr.append(Phi_s[n*(k+1):n*(k+2), :] == A @ Phi_s[n*k:n*(k+1), :] + B @ Phi_a[p*k:p*(k+1), :])
# Terminal constraint: the response is zero after T steps
constr.append(A @ Phi_s[n*(T-1):, :] + B @ Phi_a[p*(T-1):, :] == 0)
# Polytope safety constraint: F_s s_k <= b_s and F_a a_k <= b_a
for k in range(T):
    constr.append(F_s @ Phi_s[n*k:n*(k+1), :] @ s_0 <= b_s)
    constr.append(F_a @ Phi_a[p*k:p*(k+1), :] @ s_0 <= b_a)
# Quadratic cost
cost_matrix = cvx.bmat([[Q_sqrt @ Phi_s[n*k:n*(k+1), :]] for k in range(T)]
                       + [[R_sqrt @ Phi_a[p*k:p*(k+1), :]] for k in range(T)])
objective = cvx.norm(cost_matrix, 'fro')
prob = cvx.Problem(cvx.Minimize(objective), constr)
prob.solve()
Phi_s = np.array(Phi_s.value)
Phi_a = np.array(Phi_a.value)
[Figure labels: size of \(a\), size of \(s\), safety constraint]
Claim: Suppose that for all \(t\), the policy satisfies
$$\pi(s_t) = \text{find}\quad a\quad\text{s.t.}\quad C(F(s_t, a)) \leq \gamma C(s_t) $$
\(C(F(s, a))-C(s) \leq -(1-\gamma) C(s) \)
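The decrease condition \(C(F(s,a))-C(s)\leq -(1-\gamma)C(s)\) is a rearrangement of \(C(F(s,a))\leq\gamma C(s)\). As a short worked step, assuming \(0\leq\gamma<1\) and that the policy's condition holds at every time step, unrolling it along the closed-loop trajectory gives a geometric bound:

```latex
\begin{aligned}
C(s_{t+1}) = C(F(s_t, a_t)) &\leq \gamma\, C(s_t) \\
\implies\quad C(s_t) &\leq \gamma^{t}\, C(s_0) \quad \text{for all } t \geq 0 .
\end{aligned}
```

In particular, if \(C(s_0)\leq 0\) then \(C(s_t)\leq 0\) for all \(t\), and if \(C\) is nonnegative then \(C(s_t)\) decays to zero.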
[Figure labels: size of \(s\), size of \(a\), safety constraint, \(C(s)=0\)]
Example: safety filter for linear dynamics
$$a_t = \arg\min_{a\in\mathcal A_\mathrm{safe} } \|a-Ks_t\|_2 \quad \text{s.t.}\quad C(As_t+Ba) \leq \gamma C(s_t) $$
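For a scalar action and quadratic \(C\), the safety filter reduces to projecting the nominal action onto an interval, which can be computed in closed form. The sketch below is illustrative only. It assumes \(C(s)=s^\top P s\) with \(P=\sum_t (A^t)^\top A^t\), a single input with vector `b`, a nominal gain `K`, and \(\mathcal A_\mathrm{safe}=[-a_{\max}, a_{\max}]\); all of these values are hypothetical choices, not taken from the slides.

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
b = np.array([0.0, 1.0])     # assumed input direction (single scalar action)
K = np.array([0.0, 2.0])     # assumed nominal gain, a_nom = K s
gamma, a_max = 0.95, 1.0     # assumed decay rate and action bound

# P = sum_t (A^t)' A^t, truncated (A is stable so the tail is negligible)
P = np.zeros((2, 2))
M = np.eye(2)
for _ in range(500):
    P += M.T @ M
    M = A @ M

def C(s):
    return float(s @ P @ s)

def safety_filter(s):
    """Return the action closest to K s with C(A s + b a) <= gamma C(s)
    and |a| <= a_max, or None if no such action exists."""
    a_nom = K @ s
    As = A @ s
    # The constraint is the scalar quadratic alpha a^2 + 2 beta a + delta <= 0
    alpha = b @ P @ b
    beta = b @ P @ As
    delta = As @ P @ As - gamma * C(s)
    disc = beta**2 - alpha * delta
    if disc < 0:
        return None                        # decrease condition infeasible
    root = np.sqrt(disc)
    lo = max((-beta - root) / alpha, -a_max)
    hi = min((-beta + root) / alpha, a_max)
    if lo > hi:
        return None                        # infeasible within A_safe
    return float(np.clip(a_nom, lo, hi))   # projection onto [lo, hi]

s = np.array([0.5, 1.0])
a = safety_filter(s)
```

When the nominal action already satisfies the constraint, the filter returns it unchanged; otherwise it returns the nearest admissible action. For vector-valued actions the same problem is a convex program that would need a numerical solver.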
Claim: Suppose that for all \(t\), the policy satisfies
$$\pi(s_t) = \text{find}\quad a\quad\text{s.t.}\quad C(F(s_t, a)) \leq \gamma C(s_t) $$
Exercise: If \(C\) is a quadratic function, when is the above optimization problem feasible for some \(a\in\mathbb R^m\)?
Adversarial perspective is common when dealing with disturbances.
References: Predictive Control by Borrelli, Bemporad, Morari