Online Convex Optimization with Unbounded Memory
Sarah Dean
ACC Workshop, May 2023
joint work with Raunak Kumar and Bobby Kleinberg
Online interaction
Offline (hindsight) control problem
$\min_{\pi} \sum_{t=1}^{T} c_t(s_t, a_t) \quad \text{s.t.} \quad s_{t+1} = f(s_t, a_t, w_t),\; a_t = \pi(s_t)$
Challenges: the loss depends on all past actions, and the decision variable is a function (a policy).
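To make the offline problem concrete, here is a minimal sketch (dynamics, costs, and disturbances below are hypothetical, chosen only for illustration) that rolls out a candidate policy and accumulates its total cost; the offline problem asks for the policy minimizing this quantity.

```python
import numpy as np

def total_cost(policy, f, costs, s0, ws):
    """Roll out s_{t+1} = f(s_t, a_t, w_t) with a_t = policy(s_t)
    and return the cumulative cost sum_t c_t(s_t, a_t)."""
    s, J = s0, 0.0
    for t, w in enumerate(ws):
        a = policy(s)           # decision variable is a function of the state
        J += costs[t](s, a)     # cost depends (through s) on all past actions
        s = f(s, a, w)          # dynamics step
    return J

# Illustrative instance: scalar linear system, quadratic cost.
rng = np.random.default_rng(0)
f = lambda s, a, w: 0.9 * s + a + w
costs = [lambda s, a: s**2 + 0.1 * a**2] * 100
ws = 0.1 * rng.standard_normal(100)
print(total_cost(lambda s: -0.5 * s, f, costs, s0=1.0, ws=ws))
```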
Online Convex Optimization (OCO) with Unbounded Memory is a framework which directly addresses these challenges
Components of the OCO with Unbounded Memory problem
Online interaction protocol: in each round $t$, the learner plays $x_t \in X$; the history updates as $h_t = A h_{t-1} + B x_t$; the loss $f_t$ is revealed and the learner suffers $f_t(h_t)$.
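A minimal sketch of this protocol (the operators $A$, $B$, the losses, and the learner below are illustrative placeholders, not the paper's experiments):

```python
import numpy as np

def play(learner, losses, A, B, dim_h):
    """OCO-with-unbounded-memory interaction: the learner picks x_t,
    the history evolves as h_t = A h_{t-1} + B x_t, and it suffers f_t(h_t)."""
    h, total = np.zeros(dim_h), 0.0
    for f in losses:
        x = learner()          # decision x_t in X
        h = A @ h + B @ x      # history update (carries all past decisions)
        total += f(h)          # loss is revealed and evaluated on the history
    return total

# Illustrative instance: 1-d decisions, geometric memory, linear losses.
rho, T = 0.5, 10
A, B = np.array([[rho]]), np.array([[1.0]])
losses = [lambda h, c=(-1) ** t: c * h[0] for t in range(T)]
print(play(lambda: np.array([0.3]), losses, A, B, dim_h=1))
```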
Example: loss depends arbitrarily on m past decisions (Anava et al., 2015)
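One natural way to cast this example in the framework (a sketch, not necessarily the paper's exact construction): take the history space $H = X^m$, let $A$ shift the stored decisions, and let $B$ write the new one, so $h_t$ stacks the $m$ most recent decisions:

$h_t = \begin{pmatrix} x_t \\ x_{t-1} \\ \vdots \\ x_{t-m+1} \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & & & \\ I & 0 & & \\ & \ddots & \ddots & \\ & & I & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} I \\ 0 \\ \vdots \\ 0 \end{pmatrix},$

so $h_t = A h_{t-1} + B x_t$ and $f_t(h_t)$ may depend arbitrarily on $x_t, \dots, x_{t-m+1}$.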
Example: loss depends on all past decisions with a ρ-discount factor
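A sketch of how this fits the same template (again illustrative): take $A = \rho I$ and $B = I$, so the history is a geometrically discounted sum of all past decisions,

$h_t = \rho\, h_{t-1} + x_t = \sum_{k=0}^{t-1} \rho^{k} x_{t-k},$

and every past decision influences the current loss $f_t(h_t)$, with geometrically decaying weight.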
The regret of an algorithm $\mathcal{A}$ whose decisions result in histories $h_1,\dots,h_T$ is
$R_T(\mathcal{A}) = \sum_{t=1}^{T} f_t(h_t) - \min_{x \in X} \sum_{t=1}^{T} \tilde f_t(x), \qquad \text{where } \tilde f_t(x) = f_t\!\Bigl(\sum_{k=0}^{t-1} A^{k} B\, x\Bigr).$
Goal: perform well compared to the best fixed decision in hindsight
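As a sketch (with illustrative helper names, and a finite grid of candidates standing in for the minimum over $X$), the comparator term can be computed by evaluating each loss on the history that a single fixed decision would have generated:

```python
import numpy as np

def comparator_history(A, B, x, t):
    """History at round t if the fixed decision x had been played every round:
    sum_{k=0}^{t-1} A^k B x."""
    h, Ak = np.zeros(A.shape[0]), np.eye(A.shape[0])
    for _ in range(t):
        h += Ak @ (B @ x)
        Ak = Ak @ A
    return h

def regret(losses, hs, A, B, candidates):
    """R_T = sum_t f_t(h_t) minus the best fixed candidate's surrogate total."""
    alg = sum(f(h) for f, h in zip(losses, hs))
    best = min(
        sum(f(comparator_history(A, B, x, t + 1)) for t, f in enumerate(losses))
        for x in candidates
    )
    return alg - best

# Tiny usage example (1-d, rho-discounted memory, linear losses).
A, B = np.array([[0.5]]), np.array([[1.0]])
losses = [lambda h, c=c: c * h[0] for c in (1.0, -1.0, 1.0)]
hs = [np.array([0.2]), np.array([0.4]), np.array([0.1])]
print(regret(losses, hs, A, B, candidates=[np.array([v]) for v in (-1.0, 0.0, 1.0)]))
```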
Assumptions: each loss $f_t$ is convex and $L$-Lipschitz in the history, and each surrogate $\tilde f_t$ is $\tilde L$-Lipschitz in the decision.
Definition ($p$-effective memory capacity): $H_p = \Bigl(\sum_{k=0}^{\infty} k^{p} \,\bigl\|A^{k}B\bigr\|^{p}\Bigr)^{1/p}$
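A small numerical sketch of this quantity (the truncation horizon and norm choice are illustrative; the truncation is only a reasonable approximation when $\|A^k B\|$ decays):

```python
import numpy as np

def effective_memory_capacity(A, B, p, horizon=1000):
    """Approximate H_p = (sum_{k>=0} k^p ||A^k B||^p)^(1/p) by truncating at `horizon`,
    using the spectral (operator) norm."""
    total, Ak = 0.0, np.eye(A.shape[0])
    for k in range(horizon):
        total += (k ** p) * np.linalg.norm(Ak @ B, ord=2) ** p
        Ak = Ak @ A
    return total ** (1.0 / p)

# Example: rho-discounted memory (A = rho*I, B = I) has geometrically decaying ||A^k B||.
print(effective_memory_capacity(np.array([[0.5]]), np.array([[1.0]]), p=1))
```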
$\min_{a} \sum_{t=1}^{T} c_t(s_t, a) \quad \text{s.t.} \quad s_{t+1} = F s_t + G a + w_t$
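For intuition (a sketch; the initial state is written $s_1$ here), unrolling these dynamics under a fixed action $a$ exposes the same geometric-sum structure as the comparator history $\sum_k A^k B x$:

$s_{t} = F^{\,t-1} s_1 + \sum_{k=0}^{t-2} F^{k}\bigl(G a + w_{t-1-k}\bigr),$

so evaluating $\sum_t c_t(s_t, a)$ at a fixed $a$ plays the role of the surrogate loss $\tilde f_t$ evaluated at a fixed decision.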
Algorithm: Follow-the-Regularized-Leader on $\tilde f_t$
Lazy updates: compute the FTRL iterate $\hat{x}_{t+1}$ only when $t \bmod \bigl\lceil L H_p / \tilde{L} \bigr\rceil = 0$; otherwise set $x_{t+1} = x_t$.
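A minimal sketch of FTRL on the surrogate losses $\tilde f_t$ (the quadratic regularizer, step sizes, inner projected-gradient solver, and the box decision set are illustrative choices, not the paper's exact algorithm):

```python
import numpy as np

def ftrl_on_surrogate(surrogate_grads, dim, eta=0.1, inner_steps=200, lr=0.05):
    """FTRL: x_{t+1} = argmin_{x in X} sum_{s<=t} f~_s(x) + ||x||^2 / (2*eta),
    solved approximately by projected gradient descent on the box X = [-1, 1]^dim,
    given gradient oracles for the surrogate losses f~_s."""
    xs, past = [np.zeros(dim)], []
    for grad_t in surrogate_grads:
        past.append(grad_t)
        x = xs[-1].copy()
        for _ in range(inner_steps):
            g = sum(gr(x) for gr in past) + x / eta   # gradient of regularized cumulative loss
            x = np.clip(x - lr * g, -1.0, 1.0)        # projected gradient step onto X
        xs.append(x)
    return xs

# Illustrative run: linear surrogates f~_t(x) = c_t . x with alternating signs.
T, dim = 8, 2
grads = [(lambda x, c=(-1) ** t * np.ones(dim): c) for t in range(T)]
print(ftrl_on_surrogate(grads, dim)[-1])
```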
Algorithm: Follow-the-Regularized-Leader on $\tilde f_t$
Proof sketch. Decompose the regret into two terms:
$R_T(\mathcal{A}) = \Bigl(\sum_{t=1}^{T} f_t(h_t) - \sum_{t=1}^{T} \tilde f_t(x_t)\Bigr) + \Bigl(\sum_{t=1}^{T} \tilde f_t(x_t) - \min_{x \in X} \sum_{t=1}^{T} \tilde f_t(x)\Bigr)$
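To see why the first term is controlled by the memory capacity (a sketch of the standard argument): since $h_t = \sum_{k=0}^{t-1} A^k B\, x_{t-k}$ while $\tilde f_t(x_t)$ evaluates $f_t$ at $\sum_{k=0}^{t-1} A^k B\, x_t$, the $L$-Lipschitzness of $f_t$ gives

$f_t(h_t) - \tilde f_t(x_t) \;\le\; L \,\Bigl\| \sum_{k=0}^{t-1} A^{k} B\,(x_{t-k} - x_t) \Bigr\| \;\le\; L \sum_{k=0}^{t-1} \bigl\| A^{k} B \bigr\|\,\bigl\| x_{t-k} - x_t \bigr\|.$

Slowly moving FTRL iterates make $\|x_{t-k} - x_t\|$ small, and the weighted sum of operator norms is where the memory capacity enters; the second term is the standard FTRL regret on the surrogate losses $\tilde f_t$.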
The following instance of OCO with finite memory has $R_T(\mathcal{A}) \ge \Omega\bigl(\sqrt{T H_p L \tilde L}\bigr) = \Omega\bigl(\sqrt{mT}\bigr)$ for every algorithm $\mathcal{A}$.
Let $X = [-1, 1]$, finite memory $H = X^m$, Rademacher samples $w_1, \dots, w_{T/m}$, and $f_t(h_t) = w_{\lceil t/m \rceil}\, m^{-1/2} \bigl( x_{t-m+1} + \cdots + x_{m \lfloor t/m \rfloor + 1} \bigr)$
[Figure: timeline of the construction — the loss $f_t(h_t)$ at round $t$ reaches back up to $m$ steps into the past, while the sample $w_i$ (with neighbors $w_{i-1}$, $w_{i+1}$) is indexed by the block of $m$ rounds containing $t$.]
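For intuition, here is the classical $m = 1$ flavor of this argument (Rademacher linear losses on $[-1,1]$): any non-anticipating algorithm has expected loss $0$, while the best fixed decision in hindsight gains $|\sum_t w_t| \approx \sqrt{T}$, so regret is $\Omega(\sqrt{T})$; the block construction above extends this idea to obtain the extra factor of $m$ inside the square root. A small simulation of the $m=1$ case (illustrative only):

```python
import numpy as np

# Rademacher linear losses f_t(x) = w_t * x on X = [-1, 1]. Any decision made before
# seeing w_t earns 0 in expectation; the best fixed decision in hindsight earns
# -|sum_t w_t| ~ -sqrt(T), so regret is on the order of sqrt(T).
rng = np.random.default_rng(0)
T, trials = 10_000, 200
regrets = []
for _ in range(trials):
    w = rng.choice([-1.0, 1.0], size=T)
    alg_loss = 0.0                   # e.g. always play x_t = 0
    best_fixed = -abs(w.sum())       # x = -sign(sum_t w_t) achieves this in hindsight
    regrets.append(alg_loss - best_fixed)
print(np.mean(regrets), np.sqrt(T))  # average regret scales like sqrt(T)
```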
$\min_{K} \sum_{t=1}^{T} c_t(s_t, a_t) \quad \text{s.t.} \quad s_{t+1} = F s_t + G a_t + w_t,\; a_t = K s_t$
[Diagram: instead of a loop between the system $(F, G)$ and the controller $K$, the interaction unrolled over time looks like a line of states $s$, actions $a$, and disturbances $w$; the controller corresponds to the decision space $X$ and the trajectory to the history space $H$.]
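To see why the history is unbounded here (a sketch; the time-varying controller $a_t = K_t s_t$ and initial state $s_1$ are illustrative of the reduction, not its exact parametrization), unroll the closed loop:

$s_{t+1} = (F + G K_t)\cdots(F + G K_1)\, s_1 \;+\; \sum_{k=1}^{t} (F + G K_t)\cdots(F + G K_{k+1})\, w_k,$

so the stage cost at round $t$ depends on every controller chosen so far, which is exactly the unbounded-memory structure the framework is designed to handle.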
Online Convex Optimization with Unbounded Memory
https://arxiv.org/abs/2210.09903
Raunak Kumar, Sarah Dean, Robert Kleinberg