Online Convex Optimization With Unbounded Memory
Sarah Dean
ACC Workshop, May 2023
joint work with Raunak Kumar and Bobby Kleinberg
Motivation: Online Optimal Control
Online interaction
 Choose an action \(a_t\) according to policy \(\pi_t\)
 State updates according to dynamics \(f\) and disturbance \(w_t\)
 Pay a cost \(c_t(s_t,a_t)\)
Offline (hindsight) control problem
$$ \min_{\pi} \sum_{t=1}^T c_t(s_t,a_t) \quad\text{s.t.}\quad s_{t+1}=f(s_t,a_t,w_t),~~a_t=\pi(s_t) $$
 Challenge: the loss depends on all past actions
 Challenge: the decision variable is a function
Outline
Online Convex Optimization (OCO) with Unbounded Memory is a framework which directly addresses these challenges
 Problem Setting & Examples
 Main Results: Regret Bounds
 Applications & Conclusion
Problem Setting
Components of an OCO with unbounded memory problem
 Decision space \(\mathcal X\) is a closed, convex subset of a Hilbert space
 History space \(\mathcal H\) is a Banach space
 Linear operators \(A:\mathcal H\to\mathcal H\) and \(B:\mathcal X\to \mathcal H\)
 Convex loss functions \(f_t:\mathcal H\to\mathbb R\)
Online interaction protocol
 In round \(t=1,2,\dots,T\):
 Learner chooses decision \(x_t\in\mathcal X\)
 History updates as \(h_t = Ah_{t-1}+Bx_t\)
 Learner suffers loss \(f_t(h_t)\) and observes \(f_t\)
Example: loss depends arbitrarily on the \(m\) most recent decisions (Anava et al., 2015)
 History space \(\mathcal H = \mathcal X\times\dots\times\mathcal X = \mathcal X^m\)
 Linear operators \(A=\begin{bmatrix} 0 \\ I & 0 \\ & \ddots & \ddots \\ && I & 0\end{bmatrix},\quad B = \begin{bmatrix} I \\ 0 \\ \vdots \\ 0 \end{bmatrix}\), so that \(h_t = (x_t, x_{t-1},\dots,x_{t-m+1})\)
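A minimal sketch of the finite-memory example, assuming \(\mathcal X=\mathbb R^d\) and \(h_0=0\) (both are assumptions made for illustration; the helper name `finite_memory_operators` is hypothetical). \(A\) moves each stored decision one slot back and \(B\) writes the new decision into the first slot, so the history holds the \(m\) most recent decisions, newest first.

```python
import numpy as np

def finite_memory_operators(m, d=1):
    """Shift operator A and injection B on H = X^m with X = R^d (illustrative)."""
    # A has identity blocks on the sub-diagonal: (y_1, ..., y_m) -> (0, y_1, ..., y_{m-1})
    A = np.kron(np.eye(m, k=-1), np.eye(d))
    # B places the new decision in the first block: x -> (x, 0, ..., 0)
    B = np.kron(np.eye(m)[:, :1], np.eye(d))
    return A, B

A, B = finite_memory_operators(m=3)
h = np.zeros(3)  # assumed initial history h_0 = 0
for x in [1.0, 2.0, 3.0, 4.0]:
    h = A @ h + B @ np.array([x])  # h_t = A h_{t-1} + B x_t
# h now stores the three most recent decisions, newest first
```

Since \(A^m=0\) here, decisions older than \(m\) rounds no longer influence the history.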
Example: loss depends on all past decisions with discount factor \(\rho\)
 History space \(\mathcal H\) contains length-\(T\) sequences over \(\mathcal X\)
 Linear operators $$A(x_0, x_1, \dots)=(0, \rho x_0, \rho x_1, \dots ),\quad B x = (x,0,\dots )$$
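A small sketch of the discounted example, assuming scalar decisions, sequences truncated to length \(T\), and \(h_0=0\) (all assumptions for illustration; `discounted_operators` is a hypothetical helper). After \(t\) rounds the history is \((x_t, \rho x_{t-1}, \rho^2 x_{t-2},\dots)\).

```python
import numpy as np

def discounted_operators(T, rho):
    """Discounted shift A and injection B on length-T scalar sequences (illustrative)."""
    A = rho * np.eye(T, k=-1)  # (y_0, y_1, ...) -> (0, rho*y_0, rho*y_1, ...)
    B = np.eye(T)[:, :1]       # x -> (x, 0, ..., 0)
    return A, B

A, B = discounted_operators(T=4, rho=0.5)
h = np.zeros(4)  # assumed initial history h_0 = 0
for x in [1.0, 2.0, 3.0]:
    h = A @ h + B @ np.array([x])  # h_t = A h_{t-1} + B x_t
# h equals (x_3, rho*x_2, rho^2*x_1, 0)
```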
Regret Minimization
Goal: perform well compared to the best fixed decision in hindsight
The regret of an algorithm \(\mathcal A\) whose decisions result in \(h_1,\dots,h_T\) is $$ R_T(\mathcal A) = \sum_{t=1}^T f_t(h_t) - \min_{x\in\mathcal X} \sum_{t=1}^T \underbrace{ f_t\left(\sum_{k=0}^{t-1} A^k B x\right)}_{\tilde f_t(x)}$$
 \(\tilde f_t(x)\) is the loss in round \(t\) when the same decision \(x\) is played in every round, starting from \(h_0=0\)
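The regret definition can be checked numerically on a toy instance. The sketch below assumes \(h_0=0\) and replaces the minimum over \(\mathcal X\) by a minimum over a finite grid of candidates, so it only approximates the true comparator; the function names and the toy losses are hypothetical.

```python
import numpy as np

def total_loss(A, B, decisions, losses):
    """Sum of f_t(h_t) along h_t = A h_{t-1} + B x_t, with h_0 = 0 (assumed)."""
    h = np.zeros(A.shape[0])
    total = 0.0
    for x, f in zip(decisions, losses):
        h = A @ h + B @ np.atleast_1d(x)
        total += f(h)
    return total

def regret_on_grid(A, B, decisions, losses, grid):
    """Regret against the best fixed decision among the candidates in `grid`."""
    T = len(losses)
    # Playing the same x every round yields the idealized losses f~_t(x)
    best_fixed = min(total_loss(A, B, [x] * T, losses) for x in grid)
    return total_loss(A, B, decisions, losses) - best_fixed

# Toy instance: memory of length 2, scalar decisions, quadratic losses
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[1.0], [0.0]])
losses = [lambda h: float((h.sum() - 1.0) ** 2)] * 3
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
r = regret_on_grid(A, B, [0.0, 0.0, 0.0], losses, grid)
```

On this instance the learner that always plays 0 pays 3 in total, while the best grid point \(x=0.5\) pays 0.25.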
Assumptions
 Learner observes the function \(f_t\) after each round, knows \(A\) and \(B\), and \(\|B\|=1\)
 Functions \(f_t\) are differentiable, \(L\)-Lipschitz continuous, and convex
 This implies that \(\tilde f_t = f_t\circ \sum_{k=0}^{t-1} A^k B \) are differentiable, convex, and \(\tilde L\)-Lipschitz with \(\tilde L \leq L\sum_{k=0}^\infty \|A^k\|\)
Assumptions & Definitions
Definition (\(p\)-effective memory capacity): \(\displaystyle H_p = \left( \sum_{k=0}^\infty k^p \|A^k\|^p \right)^{1/p}\)
 Bounds the distance between histories that result from decision sequences whose distance grows at most linearly with time
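A rough numerical sketch of the definition, assuming a finite-dimensional \(A\) and using the spectral norm as the operator norm (an assumption; the appropriate norm depends on the choice of \(\mathcal H\)). The series is truncated once \(\|A^k\|\) falls below a tolerance, which is a heuristic and not part of the definition.

```python
import numpy as np

def effective_memory_capacity(A, p=2, max_k=1000, tol=1e-12):
    """Approximate H_p = (sum_k k^p ||A^k||^p)^(1/p) by truncating the series."""
    total = 0.0
    Ak = np.eye(A.shape[0])  # A^0
    for k in range(max_k):
        norm = np.linalg.norm(Ak, 2)  # largest singular value of A^k
        if norm < tol:
            break  # heuristic cutoff: remaining terms are treated as negligible
        total += (k * norm) ** p
        Ak = Ak @ A
    return total ** (1.0 / p)

# Finite memory of length m = 4: ||A^k|| = 1 for k < 4 and 0 afterwards
H2_finite = effective_memory_capacity(np.eye(4, k=-1), p=2)

# Discount factor rho = 0.5 on truncated sequences: ||A^k|| = rho^k
H1_discount = effective_memory_capacity(0.5 * np.eye(60, k=-1), p=1)
```

For the finite-memory operator this gives \(H_2=\sqrt{0+1+4+9}=\sqrt{14}\); for the discounted operator \(H_1\) is close to \(\rho/(1-\rho)^2 = 2\).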
Example: constant input linear control
$$\min_{a} \sum_{t=1}^T c_t(s_t,a) \quad\text{s.t.}\quad s_{t+1} = Fs_t+Ga + w_t $$
 History combines "noiseless" state with action $$\bar s_{t+1} = F\bar s_t + G a_t,\quad h_t = \begin{bmatrix} \bar s_t \\a_t \end{bmatrix}$$
 Linear operators defined by dynamics $$A=\begin{bmatrix} F & 0\\ 0 & 0\end{bmatrix},\quad B=\begin{bmatrix}G\\ I\end{bmatrix}$$
 Loss functions defined by cost & disturbance $$f_t(h_t) = c_t\left (\bar s_t + \sum_{k=1}^t F^{k-1} w_{t-k}, a_t\right )=c_t(s_t, a_t)$$
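A sketch of how the operators for this example could be assembled from \(F\) and \(G\), assuming finite-dimensional dynamics and \(h_0=0\) (the helper name and the scalar toy system are hypothetical). Under the update \(h_t = Ah_{t-1}+Ba_t\), the first block of the history follows the noiseless recursion, up to the time-index convention, and the second block stores the current action.

```python
import numpy as np

def constant_input_operators(F, G):
    """Block operators A = [[F, 0], [0, 0]] and B = [[G], [I]] (illustrative)."""
    n, d = G.shape
    A = np.block([[F, np.zeros((n, d))],
                  [np.zeros((d, n)), np.zeros((d, d))]])
    B = np.vstack([G, np.eye(d)])
    return A, B

# Toy scalar system (hypothetical values)
F = np.array([[0.5]])
G = np.array([[1.0]])
A, B = constant_input_operators(F, G)

a = np.array([1.0])  # constant input
h = np.zeros(2)      # assumed initial history h_0 = 0
for _ in range(3):
    h = A @ h + B @ a
# h[0] is the noiseless state 1 + 0.5 + 0.25, h[1] is the action
```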

Main Results
 Theorem (upper bound): there are algorithms such that the regret of an OCO with unbounded memory problem is at most $$O\left(\sqrt{T}\sqrt{H_p}\sqrt{L\tilde L}\right) $$
 Theorem (lower bound): there exists an OCO with unbounded memory problem on which every algorithm has regret at least $$\Omega \left(\sqrt{T}\sqrt{H_p}\sqrt{L\tilde L}\right) $$
 Both bounds depend on the effective memory capacity \(H_p\) and the Lipschitz constants \(L\) and \(\tilde L\)
Upper Bound
Algorithm: (lazy) Follow-the-Regularized-Leader on \(\tilde f_t\)
 For \(t=1,\dots,T\) $$x_{t+1} = \arg\min_{x\in\mathcal X} \sum_{k=1}^t \tilde f_k(x) + \frac{R(x)}{\eta} $$
 Lazy variant: update only if \(t \bmod \frac{LH_p}{\tilde L}=0\), otherwise \(x_{t+1}=x_t\)
 \(\sum_{k=1}^t \tilde f_k(x)\): total loss from playing \(x\) every round
 \(R\): strongly convex regularizer
 Step size \(\eta = (T\tilde L(LH_p+\tilde L))^{-1/2} \)
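A minimal sketch of the lazy update in a special case, assuming scalar decisions in \(\mathcal X=[-r,r]\), linear surrogate losses \(\tilde f_t(x)=g_t x\), and the regularizer \(R(x)=x^2/2\) (all three are simplifying assumptions, chosen so that the FTRL minimizer has the closed form of a clipped, scaled gradient sum). The integer `period` stands in for \(LH_p/\tilde L\), rounded as needed.

```python
import numpy as np

def lazy_ftrl_linear(grads, eta, period=1, radius=1.0):
    """Decisions x_1..x_T of lazy FTRL for linear losses f~_t(x) = grads[t-1] * x.

    With R(x) = x^2 / 2 on [-radius, radius], the FTRL minimizer of
    sum_k g_k * x + R(x) / eta is clip(-eta * sum_k g_k, -radius, radius).
    """
    x = 0.0      # x_1 minimizes the regularizer alone
    g_sum = 0.0
    xs = []
    for t, g in enumerate(grads, start=1):
        xs.append(x)          # play x_t, then observe the gradient g_t
        g_sum += g
        if t % period == 0:   # lazy: only refresh the decision every `period` rounds
            x = float(np.clip(-eta * g_sum, -radius, radius))
    return xs

eager = lazy_ftrl_linear([1.0, 1.0, 1.0, 1.0], eta=0.25, period=1)
lazy = lazy_ftrl_linear([1.0, 1.0, 1.0, 1.0], eta=0.25, period=2)
```

With `period=1` this reduces to standard FTRL; a larger period holds each decision fixed over a block, so that consecutive decisions change less often.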
Proof Sketch. Decompose regret into two terms $$R_T(\mathcal A) = \textstyle \sum_{t=1}^T f_t(h_t) - \sum_{t=1}^T \tilde f_t(x_t) + \sum_{t=1}^T \tilde f_t(x_t) - \min_{x\in\mathcal X} \sum_{t=1}^T \tilde f_t(x)$$
 Actual vs. idealized history (first difference): \(\eta T L \tilde L H_p\)
 Standard OCO with FTRL (second difference): \(\eta^{-1} + \eta T\tilde L^2\)
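Worked step (a sketch that ignores constants depending on the regularizer \(R\) and on \(\mathcal X\)): adding the two contributions and minimizing over \(\eta\) gives
$$ \frac{1}{\eta} + \eta T\tilde L^2 + \eta T L\tilde L H_p = \frac{1}{\eta} + \eta\, T\tilde L\,(LH_p+\tilde L), \qquad \eta^\star = \big(T\tilde L(LH_p+\tilde L)\big)^{-1/2}, $$
$$ \frac{1}{\eta^\star} + \eta^\star\, T\tilde L\,(LH_p+\tilde L) = 2\sqrt{T\tilde L\,(LH_p+\tilde L)}. $$
If in addition \(\tilde L \le LH_p\) (assumed here only to simplify the expression), the right-hand side is at most \(2\sqrt{2}\,\sqrt{T}\sqrt{H_p}\sqrt{L\tilde L}\).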
Lower Bound
The following instance of OCO with finite memory has $$R_T(\mathcal A) \geq \Omega \left(\sqrt{T}\sqrt{H_p}\sqrt{L\tilde L}\right) = \Omega \left(\sqrt{T} m\right)\quad \forall~~\mathcal A$$
Let \(\mathcal X = [-1,1]\), finite memory \(\mathcal H = \mathcal X^m\), Rademacher samples \(w_1,\dots, w_{\frac{T}{m}}\), and $$ f_t(h_t) = w_{\lceil\frac{t}{m}\rceil} m^{-1/2} (x_{t-m+1} + \dots + x_{m\lfloor\frac{t}{m}\rfloor + 1})$$
 \(x_{t-m+1}\): decision made \(m\) steps in the past
 \(m\lfloor\frac{t}{m}\rfloor + 1\): time of sample
[Figure: timeline of \(h_t\) and \(f_t\) over \(t\), split into blocks of length \(m\); consecutive blocks use the samples \(w_{i-1}\), \(w_i\), \(w_{i+1}\)]
Application: Online Linear Control
$$\min_{K} \sum_{t=1}^T c_t(s_t,a_t) \quad\text{s.t.}\quad s_{t+1} = Fs_t+Ga_t + w_t ,~~a_t=Ks_t$$
 Convex lifting to disturbance-action controllers (Youla et al., 1976; Anderson et al., 2019; Agarwal et al., 2019) $$ \textstyle a_t = \sum_{k=1}^{t+1}M^{[k]} w_{t-k},\quad X_t = (M^{[k]})_{k\in[t]}$$
 History contains (weighted) sequences of controllers $$H_t = (X_t, GX_{t-1}, FGX_{t-2},F^2 GX_{t-3},\dots )$$
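A small sketch of how a disturbance-action controller turns past disturbances into an action, assuming the sum runs only over disturbances that have already been observed (the slides' upper summation limit is not reproduced here; the function name and toy values are hypothetical). The action is linear in the matrices \(M^{[k]}\), which is what makes the lifted problem convex.

```python
import numpy as np

def dac_action(M, past_w):
    """Disturbance-action control: a_t = sum_k M[k-1] @ w_{t-k}.

    M      : list of matrices; M[k-1] plays the role of M^{[k]}
    past_w : past disturbances in time order, so past_w[-k] is w_{t-k}
    """
    K = min(len(M), len(past_w))  # use only disturbances that exist
    a = np.zeros(M[0].shape[0])
    for k in range(1, K + 1):
        a += M[k - 1] @ past_w[-k]
    return a

M = [np.eye(2), 2.0 * np.eye(2)]                        # M^{[1]}, M^{[2]}
past_w = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # w_{t-2}, w_{t-1}
a_t = dac_action(M, past_w)
```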
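[Figure: block diagrams of the system \((F,G)\) with signals \(\bf s\), \(\bf a\), \(\bf w\). With the controller \(K\) the system is a feedback loop; with decisions \(X\) and history \(H\), instead of a loop, the system looks like a line.]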
 Linear operators defined by linear dynamics $$A\left((Y_0,Y_1,Y_2,\dots)\right) = (0, GY_0, F Y_1, FY_2,\dots),\quad BX=(X,0,\dots)$$
 States & actions are linear in history & decisions, so loss functions are defined by cost & disturbance $$f_t(H_t) = c_t\big ( \underbrace{\langle H_t, w_{1:t} \rangle}_{(s_t,a_t)}\big)=c_t(s_t, a_t)$$
 Approach: translate usual control assumptions on dynamics, controllers, and costs into the quantities \(H_2\), \(L\), and \(\tilde L\)
 No truncation analysis is necessary
 Improves existing upper bounds (Agarwal et al., 2019) by a factor of dimension and stability radius
Conclusion & Discussion
 Key takeaways for OCO with Unbounded Memory
 General framework which captures online control
 Matching upper & (worst case) lower regret bounds highlight fundamental quantity: effective memory capacity
 Open directions for future work
 Unknown dynamics \(A\) and \(B\)
 "Bandit" feedback of \(f_t(h_t)\)
 Nonlinearly evolving history
 Nonconvex optimization
Thank you!
Online Convex Optimization with Unbounded Memory
https://arxiv.org/abs/2210.09903
Raunak Kumar Sarah Dean Robert Kleinberg
Questions?
References:
 Agarwal, Bullins, Hazan, Kakade, Singh. "Online control with adversarial disturbances." ICML, 2019.
 Anava, Hazan, Mannor. "Online learning for adversaries with memory: Price of past mistakes." NeurIPS, 2015.
 Anderson, Doyle, Low, Matni. "System level synthesis." Annual Reviews in Control, 2019.
 Youla, Jabr, Bongiorno. "Modern Wiener-Hopf design of optimal controllers, Part II: The multivariable case." IEEE Transactions on Automatic Control, 1976.