**Victor Sanches Portella**

UBC - CS

June 2022

Player

Adversary

**\(n\)** Experts

0.5

0.1

0.3

0.1

Probabilities

x_t

1

0

0.5

0.3

Costs

\ell_t

**Player's loss:**

\langle \ell_t, x_t \rangle

Adversary **knows** the strategy of the player

Picking a **random expert**

vs

Picking a **probability vector**

\displaystyle
\mathrm{Regret}(T) = \sum_{t = 1}^T \langle \ell_t, x_t \rangle - \min_{i = 1, \dotsc, n} \sum_{t = 1}^T
\ell_t(i)

\displaystyle
\sum_{t = 1}^T \langle \ell_t, x_t \rangle

Total player's loss

Can be = \(T\) always

\displaystyle
\sum_{t = 1}^T \langle \ell_t, x_t \rangle - \sum_{t = 1}^T \min_{i = 1, \dotsc, n} \ell_t(i)

Compare with offline optimum

Almost the same as Attempt #1

Restrict the offline optimum

\displaystyle
\frac{\mathrm{Regret}(T) }{T} \to 0

**Attempt #1**

**Attempt #2**

**Attempt #3**

Loss of Best Expert

Player's Loss

**Goal:**

0

1

0.5

1

t = 1

1

1.5

0.5

1

t = 2

1.5

2

1

1.5

t = 3

2.5

3

2

1.5

t = 4

Player

Adversary

**Player's loss:**

x_t \in \mathcal{X}

f_t(\cdot)

**Convex**

f_t(x_t)

**The player sees \(f_t\)**

\mathcal{X} = \Delta_n

Simplex

f_t(x) = \langle \ell_t, x \rangle

Linear functions

Some usual settings:

\mathcal{X} = \mathbb{R}^n

\mathcal{X} = \mathrm{Ball}

f_t(x) = \lVert Ax - b \rVert^2

f_t(x) = - \log \langle a, x \rangle

Experts' problem

Traditional ML optimization makes **stochastic assumptions on the data**

OL **strips away** the stochastic layer

Traditional ML optimization makes **stochastic assumptions on the data**

Less assumptions \(\implies\) **Weaker guarantees**

Less assumptions \(\implies\) **More robust**

Adaptive algorithms

**AdaGrad**

**Adam**

Parameter Free algorithms

**Coin Betting**

Meta-optimization

**Idea**: Pick the best expert at each round

x_t = e_i = \begin{pmatrix}0\\ 0 \\ 1 \\ 0 \end{pmatrix}

**where \(i\) minimizes**

\displaystyle \sum_{s = 1}^t \ell_t(i)

**Can fail badly**

Player loses \(T -1\)

Best expert loses \(T/2\)

Works **very well** for quadratic losses

0

1

1

0

0

1

1

0

0

1

1

0

0

1

1

0

* picking distributions instead of best expert

\displaystyle x_{t+1} = \mathrm{Proj}_{\Delta_n} (x_t - \eta_t \ell_t)

\(\eta_t\): step-size at round \(t\)

\(\ell_t\): loss vector at round \(t\)

= \nabla f_t(x)

\displaystyle
\mathrm{Regret}(T) \leq \sqrt{2 T n}

**Sublinear** Regret!

**Optimal** dependency on \(T\)

Can we improve the dependency on \(n\)?

**Yes, and by a lot**

\displaystyle x_{t+1}(i) \propto x_t(i) \cdot e^{- \eta \ell_t(i)}

\displaystyle x_{t+1}(i) \propto x_t(i)\cdot (1 - \eta \ell_t(i))

Normalization

\displaystyle
\mathrm{Regret}(T) \leq \sqrt{2 T \ln n}

**Exponential** improvement on \(n\)

**Optimal**

Other methods had clearer "optimization views"

Rediscovered many times in different fields

This one has an optimization view as well!

p_{t+1} = \displaystyle \mathrm{arg}\,\mathrm{min} \sum_{i = 1}^t \langle\ell_t, p \rangle + R(x)

\displaystyle p \in \Delta_n

Regularizer

"Stabilizes the algorithm"

R(x) = \frac{1}{\eta 2} \lVert x \rVert_2^2

\implies

"Lazy" Gradient Descent

R(x) = \sum_i x_i \ln x_i

\implies

Multiplicative Weights Update

Good choice of \(R\) depends of the functions and the feasible set

- \nabla f_t(x_t)

X

x_t

x_{t+1}

projection

x_t

\nabla R(x_t)

X

\nabla R

- \nabla f_t(x_t)

\nabla R^{-1}

\Pi_X^R

**Bregman**

**Projection**

x_{t+1}

Regularizer

R(x) = \frac{1}{\eta 2} \lVert x \rVert_2^2

**GD:**

R(x) = \sum_i x_i \ln x_i

**MWU:**

\begin{pmatrix} -1 & 0.7 \\ 1 & -0.5 \end{pmatrix}

Payoff matrix \(A\) of row player

**Row player**

**Column player**

Strategy \(p = (0.1~~0.9)\)

Strategy \(q = (0.3~~0.7)\)

**Von Neumman min-max Theorem:**

\mathbb{E}[A_{ij}] = p^{T} A q

\displaystyle
\max_p \min_q p^T A q = \min_q \max_p p^T A q = \mathrm{OPT}

Row player

picks row \(i\) with probability \(p_i\)

Column player

picks column \(j\) with probability \(q_j\)

Row player

gets \(A_{ij}\)

Column player

gets \(-A_{ij}\)

**Main idea: **make each row of \(A\) be an expert

For \(t = 1, \dotsc, T\)

\(p_1 =\) uniform distribution

Loss vector \(\ell_t\) is the \(j\)-th col. of \(-A\)

Get \(p_{t+1}\) via Multiplicative Weights

**Thm:**

\displaystyle
\mathrm{OPT} - \tfrac{1}{T}\mathrm{Regret}(T) \leq \bar{p}^T A \bar{q} \leq \mathrm{OPT} + \tfrac{1}{T}\mathrm{Regret}(T)

**Cor:**

\displaystyle
T \geq \frac{2 \ln n}{\varepsilon^2} \implies

\displaystyle
\mathrm{OPT} - \varepsilon \leq \bar{p}^T A \bar{q} \leq \mathrm{OPT} + \epsilon

where \(j\) maximizes \(p_t^T A e_j\)

\(q_t = e_j\)

\(\bar{p} = \tfrac{1}{T} \sum_{t} p_t\)

\(\bar{q} = \tfrac{1}{T} \sum_{t} q_t\)

and

**Training set**

\displaystyle
S = \{(x_1, y_1), (x_2, y_2), \dotsc, (x_n, y_n)\}

**Hypothesis class**

\displaystyle
\mathcal{H}

of functions

\displaystyle
\mathcal{X} \to \{0,1\}

\displaystyle
x_i \in \mathcal{X}

\displaystyle
y_i \in \{0,1\}

**Weak learner:**

\displaystyle
\mathrm{WL}(p, S) = h \in \mathcal{H}

such that

\displaystyle
\mathbb{P}_{i \sim p}[h(x_i) = y_i] \geq \frac{1}{2} + \gamma

**Question:** Can we get with high probability a hypothesis* \(h^*\) such that

\displaystyle
h(x_i) \neq y_i

**only on a \(\varepsilon\)-fraction of \(S\)?**

**Generalization** follows if \(\mathcal{H}\) is simple (and other conditions)

For \(t = 1, \dotsc, T\)

\(p_1 =\) uniform distribution

\(\ell_t(i) = 1 - 2|h_t(x_i) - y_i|\)

Get \(p_{t+1}\) via Multiplicative Weights (with right step-size)

\(h_t = \mathrm{WL}(p_t, S)\)

\(\bar{h} = \mathrm{Majority}(h_1, \dotsc, h_T)\)

**Theorem**

If \(T \geq (2/\gamma^2) \ln(1/\varepsilon)\), then \(\bar{h}\) makes at most \(\varepsilon\) mistakes in \(S\)

Main ideas:

Regret only against distrb. on examples that \(\bar{h}\) errs

Due to **WL** property, loss of the player is \(\geq 2 T \gamma\)

\(\ln n\) becomes \(\ln (n/\mathrm{\# mistakes}) \)

Cost of any distribution of this type is \(\leq 0\)

\displaystyle
s

\displaystyle
t

**Goal:**

Route **as much flow as possible** from \(s\) to \(t\) while respecting the **edges' capacities**

We can compute in time \(O(|V| \cdot |E|)\)

This year there was a paper with a \(O(|E|^{1 + o(1)})\) alg...

What if we want something faster even if approx.?

**Fast Laplacian system solvers **(Spielman & Teng' 13)

We can compute **electrical flows** by solving this system

Electrical flows may not respect edge capacities!

Solves

\displaystyle
L x = b

in \(\tilde{O}(|E|)\) time

Laplacian matrix of \(G\)

**Main idea:** Use electrical flows as a "weak learner", and boost it using MWU!

**Edges = Experts**

**Cost = flow/capacity**

Solving Packing linear systems with oracle access

Approximating multicommodity flow

Approximately solve some semidefinite programs

Spectral graph sparsification

Approximating multicommodity flow

Computational complexity (QIP = PSPACE)

**MWU** regret

\displaystyle
\mathrm{Regret}(T) \leq \sqrt{2 T \ln n}

when \(T\) is known

\displaystyle
\mathrm{Regret}(T) \leq 2\sqrt{T \ln n}

when \(T\) is **not** known

\displaystyle
\Bigg\{

** anytime **

**fixed-time**

Does knowing \(T\) gives the player an advantage?

Mirror descent in anytime can fail terribly

[Huang, Harvey, **VSP**, Friedlander]

Optimum for 2 experts anytime is worse

[Harvey, Liaw, Perkins, Randhawa]

Similar techniques also work for fixed-time

[Greenstreet, Harvey, **VSP**]

Continuous-time experts show interesting behaviour

[**VSP**, Liaw, Harvey]

What happens when we randomize experts' losses?

\displaystyle
\ell_t(i) =
\begin{cases}
1 &\text{w.p.}~1/2\\
-1 &\text{w.p.}~1/2
\end{cases}

\displaystyle
L_t(i)

Culmulative loss

of expert \(i\) is a **random walk**

**independently**

Worst-case adversary:

\displaystyle
\mathbb{E}[\mathrm{Regret}(T) ] \geq \sqrt{2 T \ln n} - o(1)

Regret bound hold with **probability 1**

\displaystyle
\implies

Regret bound against **any adversary**

**Idea: **Move to continuous time to use powerful **stochastic calculus** tools

Worked very well with 2 experts

Moving to **continuous time**:

Random walk \(\longrightarrow\) Brownian Motion

\displaystyle
L_t(i) =B_i(t)

\displaystyle
B_1, \dotsc, B_n

are Independent Brownian motions

where

**Discrete time**

**Continuous time**

\displaystyle
\mathrm{Regret}(t) = A(t) - \min_{i} L_t(i)

\displaystyle
R_i(t) = A(t) - L_t(i)

\displaystyle
L_t(i) =B_i(t)

\displaystyle
L_t(i) =S_i(t)

**Cummulative loss**

**Player's cummulative loss**

\displaystyle
\sum_{t = 1}^T\langle p(t), \Delta L(t)\rangle

\displaystyle
\int_0^T\langle p(t), \mathrm{d} L(t) \rangle

\displaystyle
A(t)

\displaystyle
A(t)

**Player's loss per round **

\displaystyle
\langle p(t), \Delta L(t)\rangle

\displaystyle
\langle p(t), \ell(t)\rangle

\displaystyle
\langle p(t), \mathrm{d} L(t) \rangle

**Potential based players**

\displaystyle
p(t) \propto \nabla \Phi(t, R(t))

\displaystyle
\Phi(t, R(t)) = \ln \left( \sum_{i} e^{-\eta_t R_i(t)} \right)

\displaystyle
p(t) \propto e^{-\eta_t L_t(i)}

\displaystyle
\implies

**Regret bounds**

\displaystyle
\mathrm{Regret}(T) \leq \sqrt{2 T \ln n}

when \(T\) is known

\displaystyle
\mathrm{Regret}(T) \leq 2\sqrt{T \ln n}

when \(T\) is **not** known

** anytime **

**fixed-time**

\displaystyle
\Bigg\{

**MWU!**

**Same as discrete time!**

**Idea:** Use stochastic calculus to guide the algorithm design

**with prob. 1**

**Potential based players**

\displaystyle
p(t) \propto \nabla \Phi(t, R(t))

\displaystyle
\mathrm{Regret}(T) \leq \sqrt{2 T \ln n}(1 + o(1))

**Matches fixed-time!**

Stochastic calculus suggests pontential that satisfy the **Backwards Heat Equation**

\;\;\;\;\Phi + \;\;\;\;\;\;\; \Phi = 0

\displaystyle
\partial_t

\displaystyle
\tfrac{1}{2}\partial_{gg}

This new algorithm has good regret!

Does not translate easily to discrete time

need to add **correlation between experts**

New algorithms with good quantile regret bounds!

**Fixed-time vs Anytime is still open!**

**Victor Sanches Portella**

UBC - CS

June 2022

We will look only at \(\{0,1\}\) costs

1

0

0

1

0

0

1

1

Equal costs do not affect the regret

Cover's algorithm **relies** on these assumptions **by construction**

Our alg. and analysis extends to **fractional costs**

**Thought experiment:** how much probability mass to put on each expert?

**Cumulative Loss **on round \(t\)

\(\frac{1}{2}\) is both cases seems **reasonable**!

**Takeaway:** player's decision may depend **only** on the **gap** between experts's losses

**Gap =** |42 - 20| = 22

**Worst **Expert

**Best **Expert

42

20

2

2

42

42

(and maybe on \(t\))

**Player strategy based on gaps:**

Choice doesn't depend on the specific past costs

p(t, g)

on the **Worst** expert

1 - p(t, g)

on the **Best** expert

We can compute \(V^*\) backwards in time via **DP**!

\displaystyle
V^*[t, g] =

**Max regret to be suffered** at time \(t\) with gap \(g\)

\(O(T^2)\) time to compute \(V^*\)

At **round** \(t\) with **gap** \(g\)

\displaystyle
V^*[0, 0] =

**Max. regret for a game with \(T\) rounds**

Computing the optimal strategy \(p^*\) from \(V^*\) is easy!

Cover's DP Table

(w/ player playing optimally)

**Player strategy based on gaps:**

Choice doesn't depend on the specific past costs

p(t, g)

on the **Lagging** expert

1 - p(t, g)

on the **Leading** expert

We can compute \(V^*\) backwards in time via **DP**!

Getting an **optimal player** \(p^*\) from \(V^*\) is easy!

\displaystyle
V^*[t, g] =

Max **regret-to-be-suffered** at round \(t\) with gap \(g\)

\(O(T^2)\) time to compute the table — \(O(T)\) amortized time per round

p^*(t,g) = \frac{1}{2} \big( V^*[t, g-1] - V^*[t, g+1]\big)

At **round** \(t\) with **gap** \(g\)

\displaystyle
V^*[0, 0] =

Optimal regret for 2 experts

Optimal player \(p^*\) is related to Random Walks

\displaystyle
p^*(t,g)

For \(g_t\) following a Random Walk

\displaystyle
\approx\mathbb{P}\Big(\mathcal{N}(0,T - t) > g\Big)

Central Limit Theorem

Not clear if the **approximation error** affects the regret

The DP is defined only for **integer costs**!

**Lagging **expert finishes **leading**

= \mathbb{P} \Big(

\Big)

Let's design an algorithm that is efficient and works for all costs

**Bonus:** Connections of Cover's algorithm with stochastic calculus

**Theorem**

\displaystyle
\Big]

\displaystyle
= \sqrt{\frac{T}{2\pi}} + O(1)

Player \(p^*\) is also connected to RWs

\displaystyle
p^*(t,g)

For \(g_t\) following a Random Walk

\displaystyle
\approx\mathbb{P}\Big(\mathcal{N}(0,T - t) > g\Big)

Central Limit Theorem

Not clear if the approximation error affects the regret

The DP is defined only for integer costs!

\displaystyle
V^*[0,0] =

\displaystyle
\frac{1}{2}

\displaystyle
\mathbb{E}\Big[

**Lagging **expert finishes **leading**

= \mathbb{P} \Big(

\Big)

[Cover '67]

# of 0s of a Random Walk of len \(T\)

Let's design an algorithm that is efficient and works for all costs

**Bonus:** Connections of Cover's algorithm with stochastic calculus

\displaystyle
R(T, |B_T|) - R(0, 0) = \displaystyle
\mathrm{ContRegret}(\partial_g R, T) + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t

\displaystyle
R(T, g_T) - R(0, 0) =\;\;\; \mathrm{Regret}(R_g, T)

How to analyze a **discrete** algorithm coming from **stochastic calculus**?

Discrete Itô's Formula!

\displaystyle
+ \sum_{t = 1}^T \Big(R_t(t, g_t) + \frac{1}{2} R_{gg}(t, g_t)\Big)

Discrete Derivatives

Surprisingly, we can analyze **Cover's algorithm** with discrete **Itô's formula**

**Itô's Formula**

**Discrete Itô's Formula**

p^*(t,g)

= V_g^*[t,g]

V_t^*[t, g] + \frac{1}{2} V_{gg}^*[t,g] = 0

\(V^*\)** **satisfies the **"discrete" Backwards Heat Equation!**

**Not Efficient**

**Efficient**

**Discrete Itô \(\implies\)**

Regret of \(p^* \leq V^*[0,0]\)

**BHE = Optimal?**

Hopefully, **\(R\)** satisfies the **discrete BHE**

**Discretized player:**

R_g(t,g)

We show the **total** is \(\leq 1\)

Cover's strategy

In the work of Harvey et al., they had

In this fixed-time solution, we are not as lucky.

*Negative discretization error!*

g

t

T = 1000

We show the **total** discretization error is always \(\leq 1\)

Formula for the regret based on the **gaps**

\displaystyle
\mathrm{Regret(T)}
= \sum_{t = 1}^{T}
p(t, g_{t-1})(g_t - g_{t-1})

Random walk \(\longrightarrow\) Brownian Motion

\displaystyle
\mathrm{ContRegret(p, T)}
= \int_{0}^{T}
p(t, |B_t|)\mathrm{d}|B_t|

Reflected Brownian motion** (gaps)**

Conditions on the *continuous player* **\(p\)**

Continuous on \([0,T) \times \mathbb{R}\)

p(t,0) = \frac{1}{2}

for all \(t \geq 0\)

How to work with stochastic integrals?

\displaystyle
R(T, |B_T|) - R(0, 0) =

**Itô's Formula:**

\(\overset{*}{\Delta} R(t, g) = 0\) everywhere

ContRegret \( = R(T, |B_T|) - R(0,0)\)

\displaystyle
\implies

**Goal:**

Find a "**potential function**" \(R\) such that

(1) **\(\partial_g R\)** is a valid continuous player

(2) \(R\) satisfies the **Backwards Heat Equation**

\displaystyle
+ \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t

Different from classic FTC!

\displaystyle
\mathrm{ContRegret}(\;\;\;\;\;\;, T)

\displaystyle
\partial_g R

\;\;\; \vphantom{\overset{*}{\Delta}} R = \;\;\;R + \;\;\;\;\;\;\; R

\displaystyle
\partial_t

\displaystyle
\tfrac{1}{2}\partial_{gg}

\displaystyle
\overset{*}{\Delta}

**B**ackwards** H**eat** E**quation

\displaystyle
R(T, |B_T|) - R(0, 0) =

**Goal:**

Find a "**potential function**" \(R\) such that

(1) **\(\partial_g R\)** is a valid continuous player

(2) \(R\) satisfies the **Backwards Heat Equation**

\displaystyle
\mathrm{ContRegret}(\;\;\;\;\;\;, T)

\displaystyle
\partial_g R

How to find a good \(R\)?

?

Suffices to find a player \(p\) satisfying the **BHE**

p(t,g) = \mathbb{P}(\mathcal{N}(0, T - t) > g)

\(\approx\) Cover's solution!

**Also a solution to an ODE**

\displaystyle R(T, |B_T|) - R(0,0) \leq \sqrt{\frac{T}{2\pi}}

R(t,g) \approx \int p(t,g)

Then setting

preserves **BHE **and

p = \partial_g R

How to work with stochastic integrals?

\displaystyle
R(T, |B_T|) - R(0, |B_0|) = \int_{0}^T \partial_g R(t, |B_t|) \mathrm{d}|B_t|

**Itô's Formula:**

\(\overset{*}{\Delta} R(t, g) = 0\) everywhere

ContRegret is given by \(R(T, |B_T|)\)

\displaystyle
\implies

**Goal:**

Find a "**potential function**" \(R\) such that

(1) **\(\partial_g R\)** is a valid continuous player

(2) \(R\) satisfies the **Backwards Heat Equation**

\displaystyle
+ \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t

Different from classic FTC!

\displaystyle
\mathrm{ContRegret}(\;\;\;\;\;\;, T)

\displaystyle
\partial_g R

\;\;\; \vphantom{\overset{*}{\Delta}} R = \;\;\;R + \;\;\;\;\;\;\; R

\displaystyle
\partial_t

\displaystyle
\tfrac{1}{2}\partial_{gg}

\displaystyle
\overset{*}{\Delta}

**Backwards Heat Equation**

[C-BL 06]

An **Efficient** and **Optimal** Algorithm in Fixed-Time with Two Experts

**Technique**:

Solve an analogous continuous-time problem, and discretize it

[HLPR '20]

How to exploit the knowledge of \(T\)?

Discretization error needs to be analyzed carefully.

BHE seems to play a role in other problems in OL as well!

Solution based on Cover's alg

**Or inverting time in an ODE!**

We show \(\leq 1\)

\(V^*\) and \(p^*\) satisfy the **discrete BHE**!

**Insight**:

Cover's algorithm has connections to **stochastic calculus**!

**M**ultiplicative **W**eights **U**pdate method:

\displaystyle
\mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

**Optimal** for \(n,T \to \infty\) !

If \(n\) is fixed, we **can do better**

**\(n = 2\)**

**\(n = 3\)**

**\(n = 4\)**

\sqrt{\frac{T}{2\pi}} + O(1)

\sqrt{\frac{8T}{9\pi}} + O(\ln T)

\sim \sqrt{\frac{T \pi}{8}}

Player **knows **\(T\) !

**Minmax** regret in some cases:

What if \(T\) is **not known?**

\displaystyle
\frac{\gamma}{2} \sqrt{T}

Minmax regret

**\(n = 2\)**

[Harvey, Liaw, Perkins, Randhawa FOCS 2020]

They give an **efficient** algorithm!

\displaystyle
\gamma \approx 1.307

Optimal regret (\(V^* = V_{p^*}\))

\displaystyle
V^*[t,g] = \frac{1}{2}(V^*[t+1, g-1] + V^*[t+1, g + 1])

\displaystyle
V^*[t,0] = \frac{1}{2} + V^*[t+1, 1]

For \(g > 0\)

For \(g = 0\)

g

t

4

3

2

1

0

0

0

0

0

0

0

0

\frac{1}{2}

0

0

0

0

0

0

0

0

0

0

0

0

0

1

2

3

**Path-independent player:**

If

round \(t\) and gap \(g_{t-1}\) on round \(t-1\)

p(t, g_{t-1})

1 - p(t, g_{t-1})

on the **Lagging** expert

on the **Leading** expert

Choice doesn't depend on the specific past costs

p(t, 0) = 1/2

for all \(t\), then

\displaystyle
\mathrm{Regret(T)}
= \sum_{t = 1}^{T}
p(t, g_{t-1})(g_t - g_{t-1})

gap on round \(t\)

A discrete analogue of a Riemann-Stieltjes integral

**A formula for the regret**

\displaystyle
V_p[t, g] =

Maximum **regret-to-be-suffered** on rounds \(t+1, \dotsc, T\) when **gap on round \(t\) is \(g\)**

Path-independent player \(\implies\) \(V_p[t,g]\) depends **only** on \(\ell_{t+1}, \dotsc, \ell_T\) and \(g_t, \dotsc, g_{T}\)

\displaystyle
V_p[t, 0] = \max\{p(t+1,0), 1 - p(t+1,0)\} + V_p[t+1, 1]

Regret suffered on round \(t+1\)

Regret suffered on round \(t + 1\)

\displaystyle
V_p[t, g] = \max \Bigg\{

\displaystyle
V_p[t+1, g+1] + p(t + 1,g)

\displaystyle
V_p[t+1, g-1] - p(t + 1,g)

\displaystyle
V_p[t, g] =

Maximum **regret-to-be-suffered** on rounds \(t+1, \dotsc, T\) if **gap at round \(t\) is \(g\)**

We can compute \(V_p\) backwards in time!

Path-independent player \(\implies\)

\(V_p[t,g]\) depends **only** on \(\ell_{t+1}, \dotsc, \ell_T\) and \(g_t, \dotsc, g_{T}\)

We then choose \(p^*\) that minimizes \(V^*[0,0] = V_{p^*}[0,0]\)

\displaystyle
V_p[0, 0] =

Maximum regret of \(p\)

For \(g > 0\)

\displaystyle
p^*(t,g) = \frac{1}{2}(V_{p^*}[t, g-1] - V_{p^*}[t, g + 1])

**Optimal player**

\displaystyle
p^*(t,0) = \frac{1}{2}

**Optimal regret** (\(V^* = V_{p^*}\))

\displaystyle
V^*[t,g] = \frac{1}{2}(V^*[t+1, g-1] + V^*[t+1, g + 1])

For \(g = 0\)

\displaystyle
V^*[t,0] = \frac{1}{2} + V^*[t+1, 1]

For \(g > 0\)

For \(g = 0\)

p^*(t,g) = \frac{1}{2} \big( V^*[t, g-1] - V^*[t, g+1]\big)

\approx \partial_g V^*[t,g]

\eqqcolon V_g^*[t,g]

V^*[t, g] - V^*[t-1, g]

= \frac{1}{2}( V^*[t, g-1] - V^*[t, g]) - \frac{1}{2}(V^*[t, g] - V^*[t,g+1])

\coloneqq V_t^*[t,g]

\coloneqq \frac{1}{2}V_{gg}^*[t,g]

\partial_t V^*[t,g] \approx

\approx \frac{1}{2}\partial_g V^*[t,g-1]

\approx \frac{1}{2} \partial_g V^*[t,g]

\approx \frac{1}{2} \partial_{gg} V^*[t,g]

**Main idea**

\(R\) satisfies the **continuous BHE**

\implies

R_t(t,g) + \frac{1}{2} R_{gg}(t,g) \approx

Approximation error of the derivatives

\implies

\leq 0.74

\displaystyle
\mathrm{Regret(T)} \leq \frac{1}{2} + \sqrt{\frac{T}{2\pi}} + \sum_{t = 1}^T O\Bigg( \frac{1}{(T - t)^{3/2}}\Bigg)

**Lemma**

\partial_{gg}R(t,g) - R_{gg}(t,g)

\partial_t R(t,g) - R_t(t,g)

\in O\Big( \frac{1}{(T - t)^{3/2}}\Big)

\Bigg\}

**M**ultiplicative **W**eights **U**pdate method:

\displaystyle
\mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

**Optimal** for \(n,T \to \infty\) !

If \(n\) is fixed, we **can do better**

**Worst-case** regret for **2 experts**

Player **knows** \(T\) (**fixed-time**)

Player **doesn't** know \(T\) (**anytime**)

**Question: **

Is there an **efficient** algorithm for the **fixed-time** case?

Ideally an algorithm that works for **general costs**!

\(O(T)\) time per round

Dynamic Programming

\(\{0,1\}\) costs

\(O(1)\) time per round

Stochastic Calculus

\([0,1]\) costs

[Harvey, Liaw, Perkins, Randhawa FOCS 2020]

\displaystyle
\approx
0.65 \sqrt{T}

\displaystyle \sqrt{\frac{T}{2\pi}} + O(1)

[Cover '67]

**Result**:

An Efficient and Optimal Algorithm in Fixed-Time with Two Experts

\(O(1)\) time per round

was \(O(T)\) before

\displaystyle
\mathrm{Regret}(T) \leq \sqrt{\frac{T}{2 \pi}} + 1.3

Holds for general costs!

**Technique**:

Discretize a solution to a stochastic calculus problem

[HLPR '20]

How to exploit the knowledge of \(T\)?

Non-zero discretization error!

**Insight**:

Cover's algorithm has connections to **stochastic calculus**!

This connection seems to extend to more experts and other problems in online learning in general!