# Online Convex Optimization With Unbounded Memory

Sarah Dean

ACC Workshop, May 2023

joint work with Raunak Kumar and Bobby Kleinberg

## Motivation: Online Optimal Control

Online interaction

1. Choose an action $$a_t$$ according to policy $$\pi_t$$
2. The state updates according to $$f$$ and $$w_t$$
3. Pay a cost $$c_t(s_t,a_t)$$

Offline (hindsight) control problem

$$\min_{\pi} \sum_{t=1}^T c_t(s_t,a_t) \quad\text{s.t.}\quad s_{t+1}=f(s_t,a_t,w_t),~~a_t=\pi(s_t)$$

Challenges: the loss depends on all past actions, and the decision variable is a function

## Outline

Online Convex Optimization (OCO) with Unbounded Memory is a framework that directly addresses these challenges

1. Problem Setting & Examples
2. Main Results: Regret Bounds
3. Applications & Conclusion

## Problem Setting

Components of an OCO with memory problem

• Decision space $$\mathcal X$$ is a closed, convex subset of a Hilbert space
• History space $$\mathcal H$$ is a Banach space
• Linear operators $$A:\mathcal H\to\mathcal H$$ and $$B:\mathcal X\to \mathcal H$$
• Convex loss functions $$f_t:\mathcal H\to\mathbb R$$

Online interaction protocol

• In round $$t=1,2,...,T$$:
• Learner chooses decision $$x_t\in\mathcal X$$
• History updates as $$h_t = Ah_{t-1}+Bx_t$$
• Learner suffers loss $$f_t(h_t)$$ and observes $$f_t$$
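The protocol above can be sketched numerically; the operators, dimensions, decisions, and loss below are all hypothetical placeholders.

```python
import numpy as np

# Minimal sketch of the interaction protocol with a 1-D decision space
# and a 2-dimensional history (hypothetical choices throughout).
A = np.array([[0.5, 0.0],
              [1.0, 0.0]])   # history transition operator
B = np.array([[1.0],
              [0.0]])        # injects the new decision into the history

h = np.zeros((2, 1))         # h_0 = 0
losses = []
for t in range(5):
    x = np.array([[0.1 * (t + 1)]])   # learner's decision x_t
    h = A @ h + B @ x                 # history update h_t = A h_{t-1} + B x_t
    losses.append(float(np.sum(h)))   # f_t(h_t): a simple linear loss
```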


Example: loss depends arbitrarily on $$m$$ past decisions (Anava et al., 2015)

• History space $$\mathcal H = \mathcal X\times\dots\times\mathcal X = \mathcal X^m$$
• Linear operators $$A=\begin{bmatrix} 0 \\ I & 0 \\ & \ddots & \ddots \\ && I & 0\end{bmatrix},\quad B = \begin{bmatrix} I \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
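Concretely, for scalar decisions the operators are the shift matrix and first-coordinate embedding; a sketch with an arbitrary $$m$$ and decision sequence:

```python
import numpy as np

# Finite-memory construction for scalar decisions (d = 1): A is the
# m-by-m down-shift matrix and B injects x_t into the first slot.
m = 4
A = np.diag(np.ones(m - 1), k=-1)    # ones on the subdiagonal
B = np.zeros((m, 1)); B[0, 0] = 1.0

h = np.zeros((m, 1))
for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    h = A @ h + B * x                # h_t = A h_{t-1} + B x_t

# After t rounds, h holds the last m decisions, most recent first.
```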



Example: loss depends on all past decisions with a $$\rho$$-discount factor

• History space $$\mathcal H$$ contains length-$$T$$ sequences over $$\mathcal X$$
• Linear operators $$A(x_0, x_1, \dots)=(0, \rho x_0, \rho x_1, \dots ),\quad B x = (x,0,\dots )$$

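A sketch of the discounted history with scalar decisions, truncating the sequence at the horizon so it fits in a vector ($$\rho$$ and $$T$$ below are arbitrary choices):

```python
import numpy as np

# Rho-discounted history: slot k of h holds rho^k x_{t-k}, i.e. the
# decision from k rounds ago, discounted. A shifts and discounts.
rho, T = 0.9, 6
A = rho * np.diag(np.ones(T - 1), k=-1)   # (A h)[k] = rho * h[k-1]
B = np.zeros((T, 1)); B[0, 0] = 1.0

h = np.zeros((T, 1))
for t in range(T):
    h = A @ h + B * 1.0   # play the constant decision x = 1 each round
```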

## Regret Minimization

Goal: perform well compared to the best fixed decision in hindsight

The regret of an algorithm whose decisions result in $$h_1,\dots,h_T$$ is $$R_T(\mathcal A) = \sum_{t=1}^T f_t(h_t) - \min_{x\in\mathcal X} \sum_{t=1}^T \underbrace{ f_t\left(\sum_{k=0}^{t-1} A^k B x\right)}_{\tilde f_t(x)}$$
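The identity inside the comparator term can be checked directly: playing a fixed $$x$$ every round from $$h_0=0$$ produces the history $$\sum_{k=0}^{t-1}A^k B x$$. A sketch with hypothetical finite-memory operators:

```python
import numpy as np

# Verify that the comparator's idealized history sum_{k=0}^{t-1} A^k B x
# equals the history reached by playing x every round (finite memory, d = 1).
m, t, x = 3, 5, 2.0
A = np.diag(np.ones(m - 1), k=-1)
B = np.zeros((m, 1)); B[0, 0] = 1.0

ideal = sum(np.linalg.matrix_power(A, k) @ B * x for k in range(t))

h = np.zeros((m, 1))
for _ in range(t):
    h = A @ h + B * x
```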

## Assumptions & Definitions

Assumptions

1. The learner observes the function $$f_t$$ after each round, knows $$A$$ and $$B$$, and $$\|B\|=1$$
2. The functions $$f_t$$ are differentiable, $$L$$-Lipschitz continuous, and convex
• This implies that $$\tilde f_t = f_t\circ \left(\textstyle\sum_{k=0}^{t-1} A^k B\right)$$ are differentiable, convex, and Lipschitz with $$\tilde L \leq L\sum_{k=0}^\infty \|A^k \|$$

Definition ($$p$$-effective memory capacity): $$\displaystyle H_p = \left( \sum_{k=0}^\infty k^p \|A^k\|^p \right)^{1/p}$$

• Bounds the distance between histories resulting from decision sequences whose distance grows at most linearly with time
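The capacity can be estimated numerically; below, $$\|A^k\|=\rho^k$$ for the $$\rho$$-discounted history, and the truncation length is an arbitrary choice:

```python
import numpy as np

# Numerically estimate the p-effective memory capacity H_p from the
# sequence of operator norms ||A^k|| (series truncated once negligible).
def effective_memory_capacity(op_norms, p=1):
    return sum(k**p * n**p for k, n in enumerate(op_norms)) ** (1.0 / p)

rho = 0.5
norms = [rho**k for k in range(200)]          # ||A^k|| for the discounted case
H1 = effective_memory_capacity(norms, p=1)

# Closed form for comparison: sum_k k rho^k = rho / (1 - rho)^2.
```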

## Example: constant input linear control

$$\min_{a} \sum_{t=1}^T c_t(s_t,a) \quad\text{s.t.}\quad s_{t+1} = Fs_t+Ga + w_t$$

• History combines the "noiseless" state with the action: $$\bar s_{t+1} = F\bar s_t + G a_t,\quad h_t = \begin{bmatrix} \bar s_t \\a_t \end{bmatrix}$$
• Linear operators defined by the dynamics: $$A=\begin{bmatrix} F & 0\\ 0 & 0 \end{bmatrix},\quad B=\begin{bmatrix}G\\ I\end{bmatrix}$$
• Loss functions defined by the cost & disturbances: $$f_t(h_t) = c_t\left (\bar s_t + \sum_{k=1}^t F^{k-1} w_{t-k},\, a_t\right )=c_t(s_t, a_t)$$
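The decomposition of the state into the noiseless part plus accumulated disturbances can be checked on scalar dynamics (the values below are hypothetical):

```python
import numpy as np

# Sanity-check s_T = sbar_T + sum_{k=1}^T F^{k-1} w_{T-k} for scalar
# dynamics s_{t+1} = F s_t + G a + w_t with a fixed action a.
rng = np.random.default_rng(1)
F, G, a, T = 0.8, 1.0, 0.3, 10
w = rng.normal(size=T)

s, sbar = 0.0, 0.0
for t in range(T):
    s = F * s + G * a + w[t]      # true (noisy) state
    sbar = F * sbar + G * a       # noiseless state

noise_sum = sum(F**k * w[T - 1 - k] for k in range(T))
```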

## Main Results

• Theorem: there are algorithms such that the regret of an OCO with unbounded memory problem is at most $$O\left(\sqrt{T}\sqrt{H_p}\sqrt{L\tilde L}\right)$$, a bound governed by the effective memory capacity & Lipschitz constants
• Theorem: there exists an OCO with unbounded memory problem with regret at least $$\Omega \left(\sqrt{T}\sqrt{H_p}\sqrt{L\tilde L}\right)$$

## Upper Bound

Algorithm:   Follow-the-Regularized-Leader (FTRL) on $$\tilde f_t$$

• For $$t=1,\dots,T$$: $$x_{t+1} = \arg\min_{x\in\mathcal X} \sum_{k=1}^t \tilde f_k(x) + \frac{R(x)}{\eta}$$
• Here $$\sum_{k=1}^t \tilde f_k(x)$$ is the total loss from playing $$x$$ every round, $$R$$ is a strongly convex regularizer, and the step size is $$\eta = (T\tilde L(LH_p+\tilde L))^{-1/2}$$
• Lazy variant: update only if $$t \equiv 0 \pmod{\lceil LH_p/\tilde L\rceil}$$; otherwise set $$x_{t+1}=x_t$$
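A minimal FTRL sketch under extra simplifying assumptions (not the general algorithm): linear surrogate losses $$\tilde f_t(x)=g_t x$$ on $$\mathcal X=[-1,1]$$ with quadratic regularizer $$R(x)=x^2/2$$, so the update has a clipped closed form.

```python
import numpy as np

# FTRL with linear losses g_t * x over X = [-1, 1] and R(x) = x^2 / 2:
# x_{t+1} = argmin_x (sum_{k<=t} g_k) x + x^2 / (2 eta)
#         = clip(-eta * sum_{k<=t} g_k, -1, 1).
rng = np.random.default_rng(2)
T = 1000
g = rng.choice([-1.0, 1.0], size=T)   # per-round gradients of the losses
eta = 1.0 / np.sqrt(T)

x, cum_g, loss = 0.0, 0.0, 0.0
for t in range(T):
    loss += g[t] * x                              # suffer loss of x_t
    cum_g += g[t]
    x = float(np.clip(-eta * cum_g, -1.0, 1.0))   # FTRL update

best = -abs(cum_g)        # loss of the best fixed x in hindsight
regret = loss - best      # should scale like sqrt(T)
```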


Proof Sketch. Decompose regret into two terms $$R_T(\mathcal A) = \textstyle \sum_{t=1}^T f_t(h_t) - \sum_{t=1}^T \tilde f_t(x_t) + \sum_{t=1}^T \tilde f_t(x_t) - \min_{x\in\mathcal X} \sum_{t=1}^T \tilde f_t(x)$$

• The second term is bounded by the standard OCO analysis of FTRL: $$O(\eta^{-1} + \eta T\tilde L^2)$$
• The first term is bounded by comparing the actual vs. idealized history: $$O(\eta T L \tilde L H_p)$$

## Lower Bound

The following instance of OCO with finite memory has $$R_T(\mathcal A) \geq \Omega \left(\sqrt{T}\sqrt{H_p}\sqrt{L\tilde L}\right) = \Omega \left(\sqrt{T} m\right)\quad \forall~~\mathcal A$$

Let $$\mathcal X = [-1,1]$$, finite memory $$\mathcal H = \mathcal X^m$$, Rademacher samples $$w_1,\dots, w_{T/m}$$, and $$f_t(h_t) = w_{\lceil\frac{t}{m}\rceil} m^{-1/2} (x_{t-m+1} + \dots + x_{m\lfloor\frac{t}{m}\rfloor + 1})$$, where $$\lceil t/m\rceil$$ is the time of the sample and the sum reaches decisions up to $$m$$ steps in the past

(Figure: the horizon is divided into blocks of $$m$$ rounds; the losses $$f_t$$ and histories $$h_t$$ within block $$i$$ share the sample $$w_i$$.)
## Application: Online Linear Control

$$\min_{K} \sum_{t=1}^T c_t(s_t,a_t) \quad\text{s.t.}\quad s_{t+1} = Fs_t+Ga_t + w_t ,~~a_t=Ks_t$$

• Convex lifting to disturbance-action controllers (Youla et al., 1976; Anderson et al., 2019; Agarwal et al., 2019): $$\textstyle a_t = \sum_{k=1}^{t}M^{[k]} w_{t-k},\quad X_t = (M^{[k]})_{k\in[t]}$$
• History contains (weighted) sequences of controllers: $$H_t = (X_t, GX_{t-1}, FGX_{t-2},F^2 GX_{t-3},\dots )$$

(Figure: block diagram of the dynamics $$(F,G)$$ with controller $$K$$, states $$\bf s$$, actions $$\bf a$$, and disturbances $$\bf w$$; in the lifted view over decisions $$X$$ and histories $$H$$, the system looks like a line.)

• Linear operators defined by the linear dynamics: $$A\left((Y_0,Y_1,\dots)\right) = (0, GY_0, F Y_1,\dots),\quad BX=(X,0,\dots)$$
• States & actions are linear in history & decisions, so loss functions are defined by cost & disturbance $$f_t(H_t) = c_t\big ( \underbrace{\langle H_t, w_{1:t} \rangle}_{(s_t,a_t)}\big)=c_t(s_t, a_t)$$
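The linearity claim can be sanity-checked in one dimension; the controller weights, dynamics, and horizon below are hypothetical:

```python
import numpy as np

# One-dimensional sketch of a disturbance-action controller: the action
# a_t = sum_k M[k-1] * w_{t-k} is linear in past disturbances, so the
# state s_t is linear in the disturbance sequence w as well.
def rollout(M, w, F=0.7, G=1.0):
    s = 0.0
    for t in range(len(w)):
        a = sum(M[k - 1] * w[t - k]
                for k in range(1, len(M) + 1) if t - k >= 0)
        s = F * s + G * a + w[t]
    return s

rng = np.random.default_rng(4)
M = np.array([0.4, 0.2, 0.1])   # disturbance-action weights M^{[k]}
w = rng.normal(size=12)
s1 = rollout(M, w)
```

Doubling the disturbances doubles the state, confirming that $$(s_t, a_t)$$ is linear in $$w$$ for fixed controller weights.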

$$\bf s$$

$$\bf a$$

$$\bf w$$

$$X_t$$

$$H$$

• Approach: translate the usual control assumptions on dynamics, controllers, and costs into the quantities $$H_2$$, $$L$$, and $$\tilde L$$
• No truncation analysis is necessary
• Improves existing upper bounds (Agarwal et al., 2019) by a factor of the dimension and the stability radius

## Conclusion & Discussion

• Key takeaways for OCO with Unbounded Memory
1. General framework which captures online control
2. Matching upper & (worst case) lower regret bounds highlight fundamental quantity: effective memory capacity
• Open directions for future work
1. Unknown dynamics $$A$$ and $$B$$
2. "Bandit" feedback of $$f_t(h_t)$$
3. Nonlinearly evolving history
4. Nonconvex optimization

## Thank you!

Online Convex Optimization with Unbounded Memory

https://arxiv.org/abs/2210.09903

Raunak Kumar    Sarah Dean    Robert Kleinberg

## Questions?

References:

• Agarwal, Bullins, Hazan, Kakade, Singh. "Online control with adversarial disturbances." ICML, 2019.
• Anava, Hazan, Mannor. "Online learning for adversaries with memory: Price of past mistakes." NeurIPS, 2015.
• Anderson, Doyle, Low, Matni. "System level synthesis." Annual Reviews in Control, 2019.
• Youla, Jabr, Bongiorno. "Modern Wiener-Hopf design of optimal controllers--Part II: The multivariable case." IEEE Transactions on Automatic Control, 1976.
