Sarah Dean
ACC Workshop, May 2023
joint work with Raunak Kumar and Bobby Kleinberg
Online interaction
Offline (hindsight) control problem
$$ \min_{\pi} \sum_{t=1}^T c_t(s_t,a_t) \quad\text{s.t.}\quad s_{t+1}=f(s_t,a_t,w_t),~~a_t=\pi(s_t) $$
Challenges: the loss depends on all past actions, and the decision variable is a function.
Online Convex Optimization (OCO) with Unbounded Memory is a framework that directly addresses these challenges.
Components of the OCO with memory problem
Online interaction protocol
[Diagram: at each round \(t\), the learner picks a decision \(x_t\), the history updates to \(h_t\), and the learner suffers the adversarially chosen loss \(f_t(h_t)\)]
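A minimal simulation sketch of this protocol, assuming the linear history update \(h_t = Ah_{t-1} + Bx_t\) with \(h_0 = 0\) from the paper; the `learner` object with its `decide`/`observe` methods and the `losses` list are illustrative placeholders:

```python
import numpy as np

def run_protocol(learner, losses, A, B, T):
    """Simulate OCO with unbounded memory: the history h_t
    linearly aggregates all past decisions, h_t = A h_{t-1} + B x_t."""
    h = np.zeros(A.shape[0])            # h_0 = 0
    total_loss = 0.0
    for t in range(T):
        x = learner.decide()            # learner commits to x_t
        h = A @ h + B @ x               # history absorbs the new decision
        f_t = losses[t]                 # adversary reveals f_t
        total_loss += f_t(h)            # learner suffers f_t(h_t)
        learner.observe(f_t)            # feedback for the next round
    return total_loss
```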
Online interaction protocol
Example: loss depends arbitrarily on \(m\) past decisions (Anava et al., 2015)
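This finite-memory example fits the framework via a standard construction: stack the \(m\) most recent decisions into the history, let \(A\) shift them, and let \(B\) insert the newest one. $$ h_t = (x_t, \dots, x_{t-m+1}), \qquad A(y_1,\dots,y_m) = (0, y_1, \dots, y_{m-1}), \qquad Bx = (x, 0, \dots, 0) $$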
Online interaction protocol
Example: loss depends on all past decisions with \(\rho\)-discount factor
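This discounted example fits the framework with \(A = \rho I\) and \(B = I\), so the history is a geometrically weighted sum of all past decisions: $$ h_t = \rho h_{t-1} + x_t = \sum_{k=0}^{t-1} \rho^k x_{t-k} $$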
Goal: perform well compared to the best fixed decision in hindsight.
The regret of an algorithm \(\mathcal A\) whose decisions result in \(h_1,\dots,h_T\) is $$ R_T(\mathcal A) = \sum_{t=1}^T f_t(h_t) - \min_{x\in\mathcal X} \sum_{t=1}^T \underbrace{ f_t\left(\sum_{k=0}^{t-1} A^k B x\right)}_{\tilde f_t(x)}$$
where \(\tilde f_t(x)\) is the loss the fixed decision \(x\) would incur at time \(t\).
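The comparator's history follows by unrolling the update \(h_t = Ah_{t-1} + Bx_t\) with the constant decision \(x_t = x\) and \(h_0 = 0\): $$ h_t = Ah_{t-1} + Bx = A^2 h_{t-2} + ABx + Bx = \dots = \sum_{k=0}^{t-1} A^k B x $$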
Assumptions
Definition (\(p\)-effective memory capacity): \(\displaystyle H_p = \left( \sum_{k=0}^\infty k^p \|A^k\|^p \right)^{1/p}\)
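For intuition, consider \(H_p\) in the two running examples (a sketch, taking \(\|\cdot\|\) to be the operator norm): with \(\rho\)-discounting, \(\|A^k\| = \rho^k\), so the capacity is finite for any \(\rho < 1\); with finite memory \(m\), \(A\) is a shift with \(\|A^k\| = 1\) for \(k < m\) and \(A^k = 0\) for \(k \geq m\): $$ H_p^{\text{discount}} = \Big(\sum_{k=0}^\infty k^p \rho^{kp}\Big)^{1/p} < \infty, \qquad H_p^{\text{memory}} = \Big(\sum_{k=0}^{m-1} k^p\Big)^{1/p} = \Theta\big(m^{1+1/p}\big) $$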
$$\min_{a} \sum_{t=1}^T c_t(s_t,a) \quad\text{s.t.}\quad s_{t+1} = Fs_t+Ga + w_t $$
[Diagram: this control problem as an instance of the online protocol, with loss \(f_t\), decision \(x_t\), and history \(h_t\)]
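One way to cast this in the framework (a sketch, not the paper's exact reduction): take the decision to be the action and the history to track the decision-dependent part of the state, $$ x_t = a_t, \qquad h_t = Fh_{t-1} + Gx_t, $$ so \(A = F\) and \(B = G\), with the disturbance contribution \(\sum_k F^k w_{t-k}\), which no decision can affect, folded into the time-varying losses \(f_t\).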
Algorithm: lazy Follow-the-Regularized-Leader on \(\tilde f_t\): perform an FTRL update on \(x_{t+1}\) if \(t \mod \frac{LH_p}{\tilde L}=0\), otherwise \(x_{t+1}=x_t\)
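A code sketch of this lazy scheme, with `batch` playing the role of \(LH_p/\tilde L\); the quadratic regularizer, the resulting closed-form update, and the gradient oracle `grad_tilde_f` are illustrative assumptions rather than the paper's exact instantiation:

```python
import numpy as np

def lazy_ftrl(grad_tilde_f, T, dim, batch, reg=1.0):
    """Follow-the-Regularized-Leader on the surrogate losses tilde f_t,
    updating the decision only once every `batch` rounds."""
    x = np.zeros(dim)
    grad_sum = np.zeros(dim)              # running sum of surrogate gradients
    decisions = []
    for t in range(1, T + 1):
        decisions.append(x.copy())        # play x_t
        grad_sum += grad_tilde_f(t, x)    # accumulate a (sub)gradient of tilde f_t
        if t % batch == 0:                # lazy: update only at batch boundaries
            # FTRL with the regularizer (reg/2)||x||^2 and linearized losses
            # has this closed form; a full version would project onto X.
            x = -grad_sum / reg
    return decisions
```

Infrequent updates keep the iterates stable, which matters because moving \(x_t\) perturbs every future history.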
Theorem (informal): with an appropriately tuned regularizer, lazy FTRL achieves \(R_T(\mathcal A) \leq O\left(\sqrt{T}\sqrt{H_1}\sqrt{L\tilde L}\right)\).
Proof Sketch. Decompose regret into two terms $$R_T(\mathcal A) = \textstyle \sum_{t=1}^T f_t(h_t) - \sum_{t=1}^T \tilde f_t(x_t) + \sum_{t=1}^T \tilde f_t(x_t) - \min_{x\in\mathcal X} \sum_{t=1}^T \tilde f_t(x)$$
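The second difference is controlled by the standard FTRL analysis. For the first, a sketch of the key step, assuming \(f_t\) is \(L\)-Lipschitz: compare the true history with the idealized history generated by playing \(x_t\) at every round, $$ \big|f_t(h_t) - \tilde f_t(x_t)\big| \leq L\Big\|h_t - \sum_{k=0}^{t-1} A^k B x_t\Big\| \leq L\|B\|\sum_{k=1}^{t-1} \|A^k\|\,\|x_{t-k} - x_t\| \leq L\|B\|\,H_1 \max_{s\leq t}\|x_s - x_{s-1}\|, $$ which is where the \(1\)-effective memory capacity enters, and the lazy updates keep \(\max_s \|x_s - x_{s-1}\|\) small.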
Lower bound: for every algorithm \(\mathcal A\), the following instance of OCO with finite memory forces $$R_T(\mathcal A) \geq \Omega \left(\sqrt{T}\sqrt{H_p}\sqrt{L\tilde L}\right) = \Omega \left(\sqrt{T} m\right)$$
Let \(\mathcal X = [-1,1]\), finite memory \(\mathcal H = \mathcal X^m\), Rademacher samples \(w_1,\dots,w_{T/m}\), and $$ f_t(h_t) = w_{\lceil t/m \rceil}\, m^{-1/2} \left(x_{t-m+1} + \dots + x_{m\lfloor t/m \rfloor + 1}\right)$$
[Diagram: timeline of the construction: each Rademacher sample \(w_i\) scales the losses in block \(i\) of length \(m\), so the loss at time \(t\) depends on decisions up to \(m\) steps in the past]
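To see that the two expressions match (a sketch, assuming the \(\ell_2\) norm on \(\mathcal H = \mathcal X^m\)): each \(f_t\) is linear in at most \(m\) history coordinates with coefficients \(m^{-1/2}\), and \(\tilde f_t\) sums up to \(m\) copies of \(x\), so $$ L = \sqrt{m \cdot m^{-1}} = 1, \qquad \tilde L \leq m^{-1/2} \cdot m = \sqrt{m}, \qquad H_2 = \Big(\sum_{k=0}^{m-1} k^2\Big)^{1/2} = \Theta(m^{3/2}), $$ and hence \(\sqrt{T}\sqrt{H_2}\sqrt{L\tilde L} = \Theta\big(\sqrt{T}\cdot m^{3/4}\cdot m^{1/4}\big) = \Theta(\sqrt{T}\,m)\).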
$$\min_{K} \sum_{t=1}^T c_t(s_t,a_t) \quad\text{s.t.}\quad s_{t+1} = Fs_t+Ga_t + w_t ,~~a_t=Ks_t$$
instead of a loop, the system looks like a line
[Diagram: the feedback interconnection of the plant \((F,G)\) with the controller \(K\), unrolled over time: the signals \(\bf s\), \(\bf a\), \(\bf w\) propagate along a line, matching the framework's decision space \(X\) and history space \(H\)]
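Concretely, substituting \(a_t = K_t s_t\) (a sketch, letting the policy vary over time as the learner's decision) and unrolling the closed loop with \(s_0 = 0\) shows that the current state, and hence the loss, depends on every past decision: $$ s_{t+1} = (F + GK_t)s_t + w_t = \sum_{j=0}^{t} \Big(\prod_{i=j+1}^{t} (F + GK_i)\Big) w_j, $$ exactly the unbounded-memory structure above, with geometrically decaying dependence when the policies are stabilizing.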
References:
Raunak Kumar, Sarah Dean, Robert Kleinberg. Online Convex Optimization with Unbounded Memory. https://arxiv.org/abs/2210.09903
Oren Anava, Elad Hazan, Shie Mannor. Online Learning for Adversaries with Memory: Price of Past Mistakes. NeurIPS 2015.