Sarah Dean
asst prof in CS at Cornell
Prof Sarah Dean
Symposium on Socially responsible Automation hosted by NCCR Automation at EPFL
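[Figure: feedback loop between a policy and its environment. The policy $\pi_t : \mathcal{S} \to \mathcal{A}$ receives the observation $s_t$ and returns the action $a_t$; the tuples $\{(s_t, a_t, c_t)\}$ accumulate as data.]
Goal: select actions $a_t$ to bring the environment to low-cost states.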
Stochastic Infinite Horizon Optimal Control Problem
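$$\min_{\pi} \; \underbrace{\lim_{T\to\infty} \mathbb{E}_w\left[\frac{1}{T}\sum_{k=0}^{T} c(s_k, \pi(s_k))\right]}_{J^{\pi}(s_0)} \quad \text{s.t.} \quad s_0 \text{ given}, \;\; s_{k+1} = F(s_k, \pi(s_k), w_k)$$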
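The objective above can be approximated by simulation. The following is a minimal sketch, assuming Gaussian disturbances, a finite horizon in place of the limit, and a hypothetical linear system, quadratic cost, and gain `K` that are not taken from the slides; `average_cost` is an illustrative helper name.

```python
import numpy as np

def average_cost(policy, F, cost, s0, T=500, n_rollouts=20, seed=0):
    """Monte Carlo estimate of (1/T) * E[sum_{k=0}^{T} c(s_k, pi(s_k))]."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_rollouts):
        s = np.asarray(s0, dtype=float)
        total = 0.0
        for _ in range(T + 1):                 # k = 0, ..., T
            a = policy(s)
            total += cost(s, a)
            w = rng.standard_normal(s.shape)   # disturbance w_k (assumed Gaussian)
            s = F(s, a, w)                     # s_{k+1} = F(s_k, a_k, w_k)
        estimates.append(total / T)
    return float(np.mean(estimates))

# Hypothetical example: linear dynamics, quadratic cost, linear policy.
A = np.array([[0.9, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
K = np.array([[-1.0, -2.0]])

def F(s, a, w):
    return A @ s + B @ a + 0.1 * w

def cost(s, a):
    return float(s @ s + a @ a)

def policy(s):
    return K @ s

J_hat = average_cost(policy, F, cost, s0=[1.0, 0.0])
print(f"estimated average cost: {J_hat:.3f}")
```

Increasing `T` and `n_rollouts` trades computation for a closer approximation of the infinite-horizon expectation.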
Bellman Optimality Equation
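$$\min_{\Phi} \left\| \begin{bmatrix} Q^{1/2} & \\ & R^{1/2} \end{bmatrix} \begin{bmatrix} \Phi_s \\ \Phi_a \end{bmatrix} \right\|_{\mathcal{H}_2}^2 \quad \text{s.t.} \quad \begin{bmatrix} zI - A & -B \end{bmatrix} \begin{bmatrix} \Phi_s \\ \Phi_a \end{bmatrix} = I$$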
Setting: have data $\{s_k, a_k, c_k\}_{k=0}^{N}$. Approaches include a focus on:
$J(\hat{\Phi}) - J(\Phi^\star) \le \epsilon$ for $N \gtrsim \frac{(m+n)^2}{\epsilon^2}$
Approximate Policy Iteration [KTR19]
Policy Gradient [FGKM18]
How to learn when data gradually reacts to your model [IZY22]
Is low cost all we want?
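We define safety in terms of the "safe set" $\mathcal{S}_{\mathrm{safe}} \subseteq \mathcal{S}$.
A state $s$ is safe if $s \in \mathcal{S}_{\mathrm{safe}}$.
A trajectory of states $(s_0, \dots, s_t)$ is safe if $s_k \in \mathcal{S}_{\mathrm{safe}}$ for all $0 \le k \le t$.
(We can analogously define $\mathcal{A}_{\mathrm{safe}} \subseteq \mathcal{A}$ and require that $a_k \in \mathcal{A}_{\mathrm{safe}}$ for all $0 \le k \le t$.)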
The state is position and velocity $s = [\theta, \omega]$ with $s_{t+1} = \begin{bmatrix} 0.9 & 0.1 \\ 0 & 0.9 \end{bmatrix} s_t$
Safety constraint on position: $|\theta| \le 1$
Are trajectories safe as long as $|\theta_0| < 1$?
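A quick simulation helps answer the question. This is a small sketch, assuming the dynamics matrix is upper triangular (position is driven by velocity) with entries 0.9, 0.1, 0.9; `rollout` and `is_safe` are illustrative helper names.

```python
import numpy as np

# Assumed dynamics: theta_{t+1} = 0.9*theta_t + 0.1*omega_t, omega_{t+1} = 0.9*omega_t
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])

def rollout(s0, steps=50):
    """Return the trajectory s_0, ..., s_steps under s_{t+1} = A s_t."""
    traj = [np.asarray(s0, dtype=float)]
    for _ in range(steps):
        traj.append(A @ traj[-1])
    return np.array(traj)

def is_safe(traj):
    """Check the position constraint |theta| <= 1 at every step."""
    return bool(np.all(np.abs(traj[:, 0]) <= 1.0))

slow = rollout([0.9, 0.0])   # starts inside the constraint with zero velocity
fast = rollout([0.9, 5.0])   # same position, large velocity

print(is_safe(slow), is_safe(fast))   # prints: True False
```

The second trajectory starts with $|\theta_0| < 1$ but reaches $\theta_1 = 0.9 \cdot 0.9 + 0.1 \cdot 5 = 1.31$, so a constraint on the initial position alone does not guarantee safety.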
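A system $s_{t+1} = F(s_t)$ is safe if some $\mathcal{S}_{\mathrm{inv}} \subseteq \mathcal{S}_{\mathrm{safe}}$ is invariant, i.e. $s \in \mathcal{S}_{\mathrm{inv}} \implies F(s) \in \mathcal{S}_{\mathrm{inv}}$.
Exercise: Prove that if $\mathcal{S}_{\mathrm{inv}}$ is invariant for dynamics $F$, then $s_0 \in \mathcal{S}_{\mathrm{inv}} \implies s_t \in \mathcal{S}_{\mathrm{inv}}$ for all $t$.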
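Example: An invariant set for $s = [\theta, \omega]$ with $s_{t+1} = A s_t$, $A = \begin{bmatrix} 0.9 & 0.1 \\ 0 & 0.9 \end{bmatrix}$, is the ellipsoid $\{ s : s^\top \sum_{t=0}^{\infty} (A^t)^\top A^t s \le c \}$. For any $s$ in this set,
$$(As)^\top \sum_{t=0}^{\infty} (A^t)^\top A^t (As) = s^\top \sum_{t=1}^{\infty} (A^t)^\top A^t s \le s^\top \sum_{t=0}^{\infty} (A^t)^\top A^t s \le c,$$
so $As$ lies in the set as well.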
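The invariant ellipsoid in this example can be computed numerically. The sketch below assumes the upper-triangular matrix $A$ with entries 0.9, 0.1, 0.9 and truncates the infinite sum once its terms are negligible; `gramian_sum` is an illustrative helper name, and the choice of `c_max` (the largest level that keeps the ellipsoid inside $|\theta| \le 1$) is added here for illustration.

```python
import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.9]])

def gramian_sum(A, tol=1e-12, max_terms=10_000):
    """Approximate P = sum_{t>=0} (A^t)^T A^t by truncating the series."""
    P = np.zeros_like(A)
    At = np.eye(A.shape[0])            # holds A^t
    for _ in range(max_terms):
        term = At.T @ At
        P += term
        if np.linalg.norm(term) < tol:
            break
        At = A @ At
    return P

P = gramian_sum(A)

# Over {s : s^T P s <= c}, the largest |theta| is sqrt(c * [P^{-1}]_{00}),
# so this is the largest c whose ellipsoid satisfies |theta| <= 1.
c_max = 1.0 / np.linalg.inv(P)[0, 0]

# Start on the boundary of that ellipsoid and roll the dynamics forward.
rng = np.random.default_rng(0)
d = rng.standard_normal(2)
s = d * np.sqrt(c_max / (d @ P @ d))
values, thetas = [], []
for _ in range(100):
    values.append(s @ P @ s)           # V(s_t) = s_t^T P s_t
    thetas.append(abs(s[0]))
    s = A @ s

print(max(thetas) <= 1.0 + 1e-9, values[0] >= values[-1])
```

Along the rollout the quadratic form never increases, so the trajectory stays in the ellipsoid and therefore within the position constraint.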
Claim: if $V(s)$ is a Lyapunov function for $F$, then any sublevel set $\{s : V(s) \le c\}$ is invariant.
Definition: A Lyapunov function $V : \mathcal{S} \to \mathbb{R}$ for $F$ is continuous and
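$a_t = K_t s_t$
$$\min_{a} \; \sum_{t=0}^{T} s_t^\top Q s_t + a_t^\top R a_t$$
$$\text{s.t.} \quad s_{t+1} = A s_t + B a_t$$
$$s_t \in \mathcal{S}_{\mathrm{safe}}, \quad a_t \in \mathcal{A}_{\mathrm{safe}}$$
$$\begin{bmatrix} \mathbf{s} \\ \mathbf{a} \end{bmatrix} = \begin{bmatrix} \Phi_s \\ \Phi_a \end{bmatrix} \mathbf{w}, \qquad \mathbf{w} = \begin{bmatrix} s_0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
$$\min_{\Phi} \left\| \begin{bmatrix} \bar{Q}^{1/2} & \\ & \bar{R}^{1/2} \end{bmatrix} \begin{bmatrix} \Phi_s \\ \Phi_a \end{bmatrix} \mathbf{w} \right\|_2^2$$
$$\text{s.t.} \quad \begin{bmatrix} I - Z\bar{A} & -Z\bar{B} \end{bmatrix} \begin{bmatrix} \Phi_s \\ \Phi_a \end{bmatrix} = I$$
$$\Phi_s \mathbf{w} \in \mathcal{S}_{\mathrm{safe}}^{T}, \quad \Phi_a \mathbf{w} \in \mathcal{A}_{\mathrm{safe}}^{T}$$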
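To make the parameterization concrete, the sketch below builds the response blocks generated by a fixed linear policy $a_t = K s_t$ and checks that they reproduce the simulated trajectory. It assumes a hypothetical system `A`, `B`, gain `K`, and initial state `s0` that are not taken from the slides, and it only covers the first block column of $\Phi$, which is all that matters when $\mathbf{w} = [s_0, 0, \dots, 0]$.

```python
import numpy as np

# Hypothetical system and gain, chosen only for illustration.
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
K = np.array([[-1.0, -2.0]])
s0 = np.array([0.5, 0.5])
T = 20

# Under a_t = K s_t the closed loop is s_k = (A + BK)^k s_0, so
# Phi_s[k] = (A + BK)^k and Phi_a[k] = K (A + BK)^k.
A_cl = A + B @ K
Phi_s = [np.linalg.matrix_power(A_cl, k) for k in range(T)]
Phi_a = [K @ Phi for Phi in Phi_s]

# Trajectory read off from the response blocks.
states_from_phi = np.array([Phi @ s0 for Phi in Phi_s])
actions_from_phi = np.array([Phi @ s0 for Phi in Phi_a])

# Trajectory from direct simulation of the closed loop.
s, states_sim, actions_sim = s0.copy(), [], []
for _ in range(T):
    a = K @ s
    states_sim.append(s)
    actions_sim.append(a)
    s = A @ s + B @ a
states_sim = np.array(states_sim)
actions_sim = np.array(actions_sim)

# Residual of the affine constraint Phi_s[k+1] = A Phi_s[k] + B Phi_a[k].
residual = max(np.linalg.norm(Phi_s[k + 1] - A @ Phi_s[k] - B @ Phi_a[k])
               for k in range(T - 1))
print(residual < 1e-10, np.allclose(states_from_phi, states_sim))
```

Because the states and actions are linear in $(\Phi_s, \Phi_a)$, cost and safety constraints on the trajectory become convex constraints on these blocks.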
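import cvxpy as cvx
import numpy as np

# Assumes A (n x n), B (n x p), horizon T, initial state s_0, cost factors
# Q_sqrt and R_sqrt, and polytopes (F_s, b_s), (F_a, b_a) are already defined.
# Since w = [s_0, 0, ..., 0], only the first block column of Phi is needed:
# block k of Phi_s (Phi_a) maps s_0 to s_k (a_k).
Phi_s = cvx.Variable((T*n, n), name="Phi_s")
Phi_a = cvx.Variable((T*p, n), name="Phi_a")

# Affine dynamics constraint
constr = [Phi_s[:n, :] == np.eye(n)]
for k in range(T-1):
    constr.append(Phi_s[n*(k+1):n*(k+2), :]
                  == A @ Phi_s[n*k:n*(k+1), :] + B @ Phi_a[p*k:p*(k+1), :])
# Terminal constraint: the response is zero after T steps
constr.append(A @ Phi_s[n*(T-1):, :] + B @ Phi_a[p*(T-1):, :] == 0)

# Polytope safety constraint
# F_s s_k <= b_s and F_a a_k <= b_a
for k in range(T):
    constr.append(F_s @ Phi_s[n*k:n*(k+1), :] @ s_0 <= b_s)
    constr.append(F_a @ Phi_a[p*k:p*(k+1), :] @ s_0 <= b_a)

# Quadratic cost (Frobenius norm of the weighted response blocks)
cost_matrix = cvx.bmat([[Q_sqrt @ Phi_s[n*k:n*(k+1), :]] for k in range(T)]
                       + [[R_sqrt @ Phi_a[p*k:p*(k+1), :]] for k in range(T)])
objective = cvx.norm(cost_matrix, 'fro')

prob = cvx.Problem(cvx.Minimize(objective), constr)
prob.solve()
Phi_s = np.array(Phi_s.value)
Phi_a = np.array(Phi_a.value)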
[Figure: axes "size of s" and "size of a"; labels "safety constraint" and $C(s) = 0$.]
Claim: Suppose that for all $t$, the policy satisfies
$$\pi(s_t) = \text{find } a \quad \text{s.t.} \quad C(F(s_t, a)) \le \gamma C(s_t),$$
or equivalently $C(F(s, a)) - C(s) \le -(1 - \gamma) C(s)$.
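Unrolling the condition on the policy shows how it controls $C$ along a trajectory. The derivation below is a sketch under two assumptions that are not stated explicitly here: $\gamma \ge 0$, and the safe set is the sublevel set $\{s : C(s) \le 0\}$.

```latex
\begin{align*}
C(s_{t+1}) &= C\big(F(s_t, \pi(s_t))\big) \le \gamma\, C(s_t) \\
\implies\; C(s_t) &\le \gamma^{t}\, C(s_0)
  \quad \text{(induction on $t$, using $\gamma \ge 0$)} \\
\implies\; C(s_t) &\le 0 \quad \text{whenever } C(s_0) \le 0 .
\end{align*}
```

Under these assumptions a trajectory that starts with $C(s_0) \le 0$ keeps $C(s_t) \le 0$ at every step, and for $\gamma < 1$ a positive $C(s_0)$ is driven toward zero geometrically.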
Example: safety filter for linear dynamics
$$a_t = \arg\min_{a \in \mathcal{A}_{\mathrm{safe}}} \|a - K s_t\|_2 \quad \text{s.t.} \quad C(A s_t + B a) \le \gamma C(s_t)$$
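For a scalar action the filter has a closed form, which the following sketch illustrates. It assumes a quadratic constraint function $C(s) = s^\top P s - 1$, an action set $|a| \le a_{\max}$, and hypothetical values for `A`, `B`, `P`, and `gamma`; `safety_filter` is an illustrative name and none of these choices come from the slides.

```python
import numpy as np

def safety_filter(s, a_nom, A, B, P, gamma, a_max):
    """Closest scalar action to a_nom with C(As + Ba) <= gamma*C(s) and |a| <= a_max.

    Assumes C(s) = s^T P s - 1 with P positive definite.
    Returns None when no action satisfies both constraints.
    """
    b = B.reshape(-1)                  # B as a vector, since the action is scalar
    s_free = A @ s                     # next state under zero action
    C_s = s @ P @ s - 1.0
    # C(As + b*a) - gamma*C(s) = q2*a^2 + 2*q1*a + q0
    q2 = b @ P @ b
    q1 = b @ P @ s_free
    q0 = s_free @ P @ s_free - 1.0 - gamma * C_s
    disc = q1**2 - q2 * q0
    if disc < 0:
        return None                    # the quadratic is positive for every a
    lo = (-q1 - np.sqrt(disc)) / q2
    hi = (-q1 + np.sqrt(disc)) / q2
    lo, hi = max(lo, -a_max), min(hi, a_max)
    if lo > hi:
        return None
    return float(np.clip(a_nom, lo, hi))   # projection onto an interval

# Hypothetical example data.
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
P = np.eye(2)
gamma, a_max = 0.9, 2.0
s = np.array([0.5, 0.5])

a_filtered = safety_filter(s, 5.0, A, B, P, gamma, a_max)   # nominal action is too aggressive
a_kept = safety_filter(s, 0.0, A, B, P, gamma, a_max)       # nominal action already satisfies the constraints
print(a_filtered, a_kept)
```

With a vector-valued action the same problem is a quadratically constrained program and would typically be handed to a convex solver instead.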
Claim: Suppose that for all $t$, the policy satisfies
$$\pi(s_t) = \text{find } a \quad \text{s.t.} \quad C(F(s_t, a)) \le \gamma C(s_t)$$
Exercise: If $C$ is a quadratic function, when is the above optimization problem feasible for some $a \in \mathbb{R}^m$?
Adversarial perspective is common when dealing with disturbances.
References: Predictive Control by Borrelli, Bemporad, Morari