ALT 2022 - Two Experts

The Two-Experts' Problem

Prediction with Expert Advice

Player

Adversary

\(n\) Experts

\displaystyle \mathrm{Regret}(T) = \sum_{t = 1}^T \langle \ell_t, x_t \rangle - \min_{i = 1, \dotsc, n} \sum_{t = 1}^T \ell_t(i)

Probabilities \(x_t = (0.5, 0.1, 0.3, 0.1)\)

Costs \(\ell_t = (1, 0, 0.5, 0.3)\)

Player's loss: \(\langle \ell_t, x_t \rangle\)

Loss of Best Expert

Player's Loss

Knows \(T\) (fixed-time)

Known and New Results

Multiplicative Weights Update method:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

Optimal for \(n,T \to \infty\) !
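As a rough illustration of the bound above, here is a minimal sketch of the exponential-weights form of Multiplicative Weights (Hedge). The learning rate \(\eta = \sqrt{8 \ln n / T}\), the function name `hedge_regret`, and the random costs are assumptions made for this example; the slides state only the regret bound.

```python
import math
import random

def hedge_regret(losses, eta):
    """Run Hedge on a T x n matrix of costs in [0, 1] and return the regret."""
    n = len(losses[0])
    cum = [0.0] * n          # cumulative loss of each expert
    player_loss = 0.0
    for loss in losses:
        # Weights proportional to exp(-eta * cumulative loss).
        # Subtracting the minimum only rescales the weights (numerical stability).
        base = min(cum)
        w = [math.exp(-eta * (c - base)) for c in cum]
        total = sum(w)
        x = [wi / total for wi in w]                      # probabilities x_t
        player_loss += sum(xi * li for xi, li in zip(x, loss))
        cum = [c + li for c, li in zip(cum, loss)]
    return player_loss - min(cum)

# Illustrative run on random costs (assumed data, not from the talk).
T, n = 500, 4
rng = random.Random(0)
losses = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)       # assumed tuning for known T
regret = hedge_regret(losses, eta)
bound = math.sqrt(T / 2 * math.log(n))     # sqrt((T/2) ln n)
print(regret, bound)
```

With this tuning the regret should stay below the printed bound on any cost sequence in \([0,1]\), not only on random ones.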

If \(n\) is fixed, we can do better

Worst-case regret for 2 experts:

\displaystyle \sqrt{\frac{T}{2\pi}} + O(1)

Cover's Algorithm [Cover '67]: Dynamic Programming, \(\{0,1\}\) costs, \(O(T)\) time per round

Our Algorithm: Stochastic Calculus, \([0,1]\) costs, \(O(1)\) time per round

Technique:

Discretize a solution to a stochastic calculus problem

[HLPR - FOCS '20]

How to exploit the knowledge of \(T\)?

We need to analyze the discretization error!

Online Learning

🤝

Stochastic Calculus

Our Results

Result:

An Efficient and Optimal Algorithm in Fixed-Time with Two Experts

\(O(1)\) time per round

was \(O(T)\) before

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2 \pi}} + 1.3

Holds for general costs!

Technique:

Discretize a solution to a stochastic calculus problem

[HLPR '20]

How to exploit the knowledge of \(T\)?

Non-zero discretization error!

Insight:

Cover's algorithm has connections to stochastic calculus!

This connection seems to extend to more experts and other problems in online learning in general!

Gaps and Cover's Algorithm

Simplifying Assumptions

We will look only at \(\{0,1\}\) costs

Equal costs do not affect the regret

Cover's algorithm relies on these assumptions by construction

Our algorithm and analysis extend to fractional costs

Gap between experts

Thought experiment: how much probability mass to put on each expert?

Cumulative Loss on round \(t\)

\(\frac{1}{2}\) in both cases seems reasonable!

Takeaway: the player's decision may depend only on the gap between the experts' losses (and maybe on \(t\))

Example: the Worst Expert has cumulative loss 42 and the Best Expert has 20, so Gap = |42 - 20| = 22

Cover's Dynamic Program

Player strategy based on gaps: at round \(t\) with gap \(g\), put

\(p(t, g)\) on the Lagging (Worst) expert

\(1 - p(t, g)\) on the Leading (Best) expert

Choice doesn't depend on the specific past costs

\(V^*[t, g] =\) Max regret-to-be-suffered at round \(t\) with gap \(g\) (w/ player playing optimally)

\(V^*[0, 0] =\) Optimal regret for 2 experts (max. regret for a game with \(T\) rounds)

We can compute \(V^*\) backwards in time via DP!

\(O(T^2)\) time to compute the table, i.e., \(O(T)\) amortized time per round

Getting an optimal player \(p^*\) from \(V^*\) is easy!

\displaystyle p^*(t,g) = \frac{1}{2} \big( V^*[t, g-1] - V^*[t, g+1]\big)

Connection to Random Walks

Theorem [Cover '67]

\displaystyle V^*[0,0] = \frac{1}{2} \mathbb{E}\Big[ \text{\# of 0s of a Random Walk of len } T \Big] = \sqrt{\frac{T}{2\pi}} + O(1)

Player \(p^*\) is also connected to Random Walks. For \(g_t\) following a Random Walk:

\displaystyle p^*(t,g) = \mathbb{P} \Big( \text{Lagging expert finishes leading} \Big) \approx \mathbb{P}\Big(\mathcal{N}(0,T - t) > g\Big)

(the approximation comes from the Central Limit Theorem)

Not clear if the approximation error affects the regret

The DP is defined only for integer costs!

Let's design an algorithm that is efficient and works for all costs

Bonus: Connections of Cover's algorithm with stochastic calculus
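The random-walk expression for \(V^*[0,0]\) can be evaluated exactly for moderate \(T\). The sketch below assumes that "# of 0s" counts the times \(t \in \{0, \dotsc, T-1\}\) at which a simple \(\pm 1\) walk started at 0 is at 0; that counting convention and the name `half_expected_zeros` are assumptions of this example, not something the slides spell out.

```python
import math

def half_expected_zeros(T):
    """(1/2) * E[# of times t in {0, ..., T-1} with S_t = 0] for a simple random walk."""
    total = 0.0
    prob = 1.0   # P(S_0 = 0)
    k = 0
    while 2 * k <= T - 1:
        total += prob
        k += 1
        # P(S_{2k} = 0) = C(2k, k) / 4^k, computed as a running product.
        prob *= (2 * k - 1) / (2 * k)
    return 0.5 * total

for T in (10, 100, 1000):
    print(T, half_expected_zeros(T), math.sqrt(T / (2 * math.pi)))
```

Under this convention the two printed columns should differ only by a bounded amount, in line with the \(O(1)\) term in the theorem.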

Continuous Regret

A Probabilistic View of Regret Bounds

Formula for the regret based on the gaps

\displaystyle \mathrm{Regret}(T) = \sum_{t = 1}^{T} p(t, g_{t-1})(g_t - g_{t-1})

This is a discrete stochastic integral when \(g_0, \dotsc, g_T\) are a realization of a random walk with \(\Delta g_t = \pm 1\)

Useful Perspective:

Deterministic bound = Bound with probability 1

Moving to continuous time:

Random walk \(\longrightarrow\) Brownian Motion

\displaystyle \mathrm{ContRegret}(p, T) = \int_{0}^{T} p(t, |B_t|)\,\mathrm{d}|B_t|

where \(|B_t|\) is a reflected Brownian motion (gaps)

Conditions on the continuous player \(p\):

Continuous on \([0,T) \times \mathbb{R}\)

\(p(t,0) = \frac{1}{2}\) for all \(t \geq 0\)

Stochastic Integrals and Itô's Formula

How to work with stochastic integrals?

Itô's Formula:

\displaystyle R(T, |B_T|) - R(0, 0) = \int_{0}^T \partial_g R(t, |B_t|)\,\mathrm{d}|B_t| + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|)\,\mathrm{d}t

The first integral is \(\mathrm{ContRegret}(\partial_g R, T)\), and

\displaystyle \overset{*}{\Delta} R = \partial_t R + \tfrac{1}{2}\partial_{gg} R

Different from classic FTC!

\(\overset{*}{\Delta} R(t, g) = 0\) everywhere (Backwards Heat Equation) \(\implies\) ContRegret \( = R(T, |B_T|) - R(0,0)\)

Goal:

Find a "potential function" \(R\) such that

(1) \(\partial_g R\) is a valid continuous player

(2) \(R\) satisfies the Backwards Heat Equation

How to find a good \(R\)?

Suffices to find a player \(p\) satisfying the BHE. Then setting \(R(t,g) \approx \int p(t,g)\) preserves BHE and \(p = \partial_g R\)

\displaystyle p(t,g) = \mathbb{P}(\mathcal{N}(0, T - t) > g)

\(\approx\) Cover's solution!

Also a solution to an ODE

\displaystyle R(T, |B_T|) - R(0,0) \leq \sqrt{\frac{T}{2\pi}}

[C-BL 06]

A Solution Inspired by Cover's Algorithm

From Cover's algorithm, we have

\displaystyle p^*(t,g) \approx \mathbb{P}\Big(\mathcal{N}(0,T - t) > g\Big) = \text{player}~Q(t, g)

Potential \(R\) satisfying BHE?

Player \(Q\) satisfies the BHE!

We can find \(R(t,g)\) such that

\displaystyle \overset{*}{\Delta} R = 0 \quad \text{(BHE)} \qquad \text{and} \qquad \partial_g R = Q

By Itô's Formula:

\displaystyle \mathrm{ContRegret}(Q, T) = R(T, |B_T|) - R(0,0) \leq \sqrt{\frac{T}{2\pi}}
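For concreteness, one potential with these two properties can be written in closed form. The slides do not state a formula for \(R\), so the expression below is an assumption of this note, obtained by integrating \(Q\) in \(g\) and fixing the additive function of \(t\) so that the BHE holds; \(\Phi\) and \(\varphi\) denote the standard normal CDF and density.

```latex
\begin{align*}
Q(t,g) &= 1 - \Phi(z), \qquad z = \frac{g}{\sqrt{T-t}}, \qquad
          \varphi(z) = \frac{e^{-z^2/2}}{\sqrt{2\pi}}, \\
R(t,g) &= g\,Q(t,g) - \sqrt{T-t}\;\varphi(z)
          \qquad \text{(assumed closed form)}, \\
\partial_g R &= Q + g\,\partial_g Q + z\,\varphi(z)
              = Q - z\,\varphi(z) + z\,\varphi(z) = Q, \\
\partial_t R &= \frac{\varphi(z)}{2\sqrt{T-t}}, \qquad
\tfrac{1}{2}\,\partial_{gg} R = \tfrac{1}{2}\,\partial_g Q
              = -\frac{\varphi(z)}{2\sqrt{T-t}}, \\
\overset{*}{\Delta} R &= \partial_t R + \tfrac{1}{2}\,\partial_{gg} R = 0, \\
R(0,0) &= -\sqrt{\frac{T}{2\pi}}, \qquad
\lim_{t \to T} R(t,g) = 0 \quad \text{for } g \geq 0.
\end{align*}
```

With this choice, \(R(T, |B_T|) - R(0,0) = \sqrt{T/(2\pi)}\), which is consistent with the bound on \(\mathrm{ContRegret}(Q, T)\) stated above.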

Discretization

Discrete Itô's Formula

How to analyze a discrete algorithm coming from stochastic calculus?

Discrete Itô's Formula!

Itô's Formula:

\displaystyle R(T, |B_T|) - R(0, 0) = \mathrm{ContRegret}(\partial_g R, T) + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|)\,\mathrm{d}t

Discrete Itô's Formula:

\displaystyle R(T, g_T) - R(0, 0) = \mathrm{Regret}(R_g, T) + \sum_{t = 1}^T \Big(R_t(t, g_t) + \frac{1}{2} R_{gg}(t, g_t)\Big)

where \(R_g\), \(R_t\), and \(R_{gg}\) are Discrete Derivatives

Surprisingly, we can analyze Cover's algorithm with discrete Itô's formula
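The discrete formula is a telescoping identity, which can be checked numerically. The sketch below makes several assumptions that the slides do not fix: the closed form used for `R`, central differences for \(R_g\) and \(R_{gg}\), a backward difference in time for \(R_t\), and evaluation of the correction term at \((t, g_{t-1})\), which is the indexing under which the identity below is exact.

```python
import math
import random

def R(t, g, T):
    """Assumed potential: g * Q(t, g) - sqrt((T - t) / (2 pi)) * exp(-g^2 / (2 (T - t)))."""
    s = T - t
    if s <= 0:
        return min(g, 0.0)   # limit of the formula as t -> T
    q = 0.5 * math.erfc(g / math.sqrt(2 * s))   # Q(t, g) = P(N(0, s) > g)
    return g * q - math.sqrt(s / (2 * math.pi)) * math.exp(-g * g / (2 * s))

def ito_terms(path, T):
    """Return (sum of R_g * (g_t - g_{t-1}), sum of R_t + R_gg / 2) along a gap path."""
    regret, error = 0.0, 0.0
    for t in range(1, T + 1):
        g_prev, g = path[t - 1], path[t]
        up, mid, down = R(t, g_prev + 1, T), R(t, g_prev, T), R(t, g_prev - 1, T)
        R_g = 0.5 * (up - down)            # central difference in g
        R_gg = up - 2 * mid + down         # second difference in g
        R_t = mid - R(t - 1, g_prev, T)    # backward difference in t
        regret += R_g * (g - g_prev)
        error += R_t + 0.5 * R_gg
    return regret, error

# Gap path g_t = |S_t| of a random +-1 walk (illustrative data).
T = 200
rng = random.Random(0)
walk, path = 0, [0]
for _ in range(T):
    walk += rng.choice((-1, 1))
    path.append(abs(walk))

regret, error = ito_terms(path, T)
print(R(T, path[-1], T) - R(0, 0, T), regret + error)
```

The two printed numbers should agree up to floating-point rounding for any path with steps of size 1, since each summand telescopes to \(R(t, g_t) - R(t-1, g_{t-1})\).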

Discrete Algorithms

Cover's strategy (Not Efficient):

\displaystyle p^*(t,g) = V_g^*[t,g]

\displaystyle V_t^*[t, g] + \frac{1}{2} V_{gg}^*[t,g] = 0

\(V^*\) satisfies the "discrete" Backwards Heat Equation!

Discrete Itô \(\implies\) Regret of \(p^* \leq V^*[0,0]\)

BHE = Optimal?

Discretized player (Efficient):

\displaystyle R_g(t,g)

Hopefully, \(R\) satisfies the discrete BHE

We show the total discretization error is \(\leq 1\)

Bounding the Discretization Error

In the work of Harvey et al., they had negative discretization error!

In this fixed-time solution, we are not as lucky.

(Plot over \(g\) and \(t\) with \(T = 1000\).)

We show the total discretization error is always \(\leq 1\)

Our Results

An Efficient and Optimal Algorithm in Fixed-Time with Two Experts

Technique:

Solve an analogous continuous-time problem, and discretize it [HLPR '20]

How to exploit the knowledge of \(T\)?

Solution based on Cover's algorithm

Or inverting time in an ODE!

Discretization error needs to be analyzed carefully.

We show it is \(\leq 1\)

Insight:

Cover's algorithm has connections to stochastic calculus!

\(V^*\) and \(p^*\) satisfy the discrete BHE!

BHE seems to play a role in other problems in online learning as well!

Questions?

Known Results

Multiplicative Weights Update method:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

Optimal for \(n,T \to \infty\) !

If \(n\) is fixed, we can do better

Minmax regret in some cases (Player knows \(T\) !):

\(n = 2\): \(\sqrt{\frac{T}{2\pi}} + O(1)\)

\(n = 3\): \(\sqrt{\frac{8T}{9\pi}} + O(\ln T)\)

\(n = 4\): \(\sim \sqrt{\frac{T \pi}{8}}\)

What if \(T\) is not known?

Minmax regret for \(n = 2\): \(\frac{\gamma}{2} \sqrt{T}\) with \(\gamma \approx 1.307\) [Harvey, Liaw, Perkins, Randhawa FOCS 2020]

They give an efficient algorithm!

A Dynamic Programming View

Optimal regret (\(V^* = V_{p^*}\))

For \(g > 0\):

\displaystyle V^*[t,g] = \frac{1}{2}(V^*[t+1, g-1] + V^*[t+1, g + 1])

For \(g = 0\):

\displaystyle V^*[t,0] = \frac{1}{2} + V^*[t+1, 1]

(Figure: DP table of \(V^*\), with \(g = 0, \dotsc, 4\) on one axis and \(t = 0, \dotsc, 3\) on the other.)

Regret and Player in terms of the Gap

Path-independent player: at round \(t\), with gap \(g_{t-1}\) on round \(t-1\), put

\(p(t, g_{t-1})\) on the Lagging expert

\(1 - p(t, g_{t-1})\) on the Leading expert

Choice doesn't depend on the specific past costs

A formula for the regret

If \(p(t, 0) = 1/2\) for all \(t\), then

\displaystyle \mathrm{Regret}(T) = \sum_{t = 1}^{T} p(t, g_{t-1})(g_t - g_{t-1})

where \(g_t\) is the gap on round \(t\)

A discrete analogue of a Riemann-Stieltjes integral
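The gap formula can be checked against the definition of regret on a concrete game. The sketch below assumes \(\{0,1\}\) costs with exactly one expert paying 1 per round (rounds with equal costs are dropped) and uses the normal-tail rule \(\mathbb{P}(\mathcal{N}(0, T - t) > g)\) as an example of a path-independent player; the names `normal_tail_player` and `play` are hypothetical.

```python
import math
import random

def normal_tail_player(T):
    """Example gap-based player p(t, g) = P(N(0, T - t) > g), with p(t, 0) = 1/2."""
    def p(t, g):
        remaining = T - t
        if g == 0:
            return 0.5
        if remaining <= 0:
            return 0.0
        return 0.5 * math.erfc(g / math.sqrt(2 * remaining))
    return p

def play(costs, p):
    """Return (regret from the definition, regret from the gap formula)."""
    cum = [0, 0]          # cumulative losses of the two experts
    player_loss = 0.0
    gap_sum = 0.0
    for t, (c0, c1) in enumerate(costs, start=1):
        g_prev = abs(cum[0] - cum[1])
        lag = 0 if cum[0] > cum[1] else 1      # lagging = larger cumulative loss
        q = p(t, g_prev)                       # mass on the lagging expert
        x = [0.0, 0.0]
        x[lag], x[1 - lag] = q, 1.0 - q
        player_loss += x[0] * c0 + x[1] * c1
        cum[0] += c0
        cum[1] += c1
        gap_sum += q * (abs(cum[0] - cum[1]) - g_prev)
    return player_loss - min(cum), gap_sum

T = 100
rng = random.Random(1)
costs = [(1, 0) if rng.random() < 0.5 else (0, 1) for _ in range(T)]
regret, gap_sum = play(costs, normal_tail_player(T))
print(regret, gap_sum)
```

The two values should coincide up to rounding: when the gap grows the player pays \(p\) and the best expert pays nothing, and when it shrinks the player pays \(1 - p\) while the best expert pays 1.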

A Dynamic Programming View

\(V_p[t, g] =\) Maximum regret-to-be-suffered on rounds \(t+1, \dotsc, T\) when gap on round \(t\) is \(g\)

Path-independent player \(\implies\) \(V_p[t,g]\) depends only on \(\ell_{t+1}, \dotsc, \ell_T\) and \(g_t, \dotsc, g_{T}\)

For \(g > 0\):

\displaystyle V_p[t, g] = \max \Big\{ V_p[t+1, g+1] + p(t + 1,g), \; V_p[t+1, g-1] - p(t + 1,g) \Big\}

For \(g = 0\):

\displaystyle V_p[t, 0] = \max\{p(t+1,0), 1 - p(t+1,0)\} + V_p[t+1, 1]

In both cases the term involving \(p(t+1, \cdot)\) is the regret suffered on round \(t+1\)

A Dynamic Programming View

\(V_p[t, g] =\) Maximum regret-to-be-suffered on rounds \(t+1, \dotsc, T\) if gap at round \(t\) is \(g\)

\(V_p[0, 0] =\) Maximum regret of \(p\)

Path-independent player \(\implies\) \(V_p[t,g]\) depends only on \(\ell_{t+1}, \dotsc, \ell_T\) and \(g_t, \dotsc, g_{T}\)

We can compute \(V_p\) backwards in time!

We then choose \(p^*\) to minimize \(V_p[0,0]\), so that \(V^*[0,0] = V_{p^*}[0,0]\)

A Dynamic Programming View

Optimal player

For \(g > 0\):

\displaystyle p^*(t,g) = \frac{1}{2}(V_{p^*}[t, g-1] - V_{p^*}[t, g + 1])

For \(g = 0\):

\displaystyle p^*(t,0) = \frac{1}{2}

Optimal regret (\(V^* = V_{p^*}\))

For \(g > 0\):

\displaystyle V^*[t,g] = \frac{1}{2}(V^*[t+1, g-1] + V^*[t+1, g + 1])

For \(g = 0\):

\displaystyle V^*[t,0] = \frac{1}{2} + V^*[t+1, 1]
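The two recurrences for \(V^*\) and the formula for \(p^*\) translate directly into a backward pass over the table. This is a minimal sketch, assuming the terminal condition \(V^*[T, g] = 0\) (no regret left after the last round); the function names `cover_table` and `optimal_player` are hypothetical.

```python
def cover_table(T):
    """Fill V[t][g] for t = 0..T and g = 0..T+1, backwards in time."""
    # Assumed terminal condition: V[T][g] = 0 for every gap g.
    V = [[0.0] * (T + 2) for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        V[t][0] = 0.5 + V[t + 1][1]                        # case g = 0
        for g in range(1, T + 1):
            V[t][g] = 0.5 * (V[t + 1][g - 1] + V[t + 1][g + 1])   # case g > 0
    return V

def optimal_player(V, t, g):
    """Probability p*(t, g) placed on the lagging expert."""
    if g == 0:
        return 0.5
    return 0.5 * (V[t][g - 1] - V[t][g + 1])

V = cover_table(3)
print(V[0][0])                    # optimal regret for a 3-round game
print(optimal_player(V, 2, 1))    # mass on the lagging expert at round 2, gap 1
```

Filling the table takes \(O(T^2)\) time and memory, which matches the \(O(T)\) amortized cost per round quoted for Cover's algorithm.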

Discrete Derivatives

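\displaystyle p^*(t,g) = \frac{1}{2} \big( V^*[t, g-1] - V^*[t, g+1]\big) \eqqcolon V_g^*[t,g] \approx \partial_g V^*[t,g]

\displaystyle V_t^*[t,g] \coloneqq V^*[t, g] - V^*[t-1, g] \approx \partial_t V^*[t,g]

\displaystyle \frac{1}{2}V_{gg}^*[t,g] \coloneqq \frac{1}{2}( V^*[t, g-1] - V^*[t, g]) - \frac{1}{2}(V^*[t, g] - V^*[t,g+1]) \approx \frac{1}{2} \partial_{gg} V^*[t,g]

The two half-differences in \(\frac{1}{2}V_{gg}^*\) approximate \(\frac{1}{2}\partial_g V^*[t,g-1]\) and \(\frac{1}{2} \partial_g V^*[t,g]\)

By the recurrence \(V^*[t-1,g] = \frac{1}{2}(V^*[t, g-1] + V^*[t, g+1])\) for \(g > 0\), these satisfy \(V_t^*[t, g] + \frac{1}{2} V_{gg}^*[t,g] = 0\)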

Bounding the Discretization Error

Main idea

\(R\) satisfies the continuous BHE \(\implies\) \(R_t(t,g) + \frac{1}{2} R_{gg}(t,g) \approx\) Approximation error of the derivatives

Lemma

\displaystyle \partial_{gg}R(t,g) - R_{gg}(t,g) \quad \text{and} \quad \partial_t R(t,g) - R_t(t,g) \quad \in O\Big( \frac{1}{(T - t)^{3/2}}\Big)

\implies

\displaystyle \mathrm{Regret}(T) \leq \frac{1}{2} + \sqrt{\frac{T}{2\pi}} + \sum_{t = 1}^T O\Bigg( \frac{1}{(T - t)^{3/2}}\Bigg)

where the sum is \(\leq 0.74\)
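To see why per-round errors of order \((T - t)^{-3/2}\) add up to a constant, one can sum the rate directly. The sketch below stops at \(t = T - 1\), where \(T - t\) is still positive, and ignores the constants hidden in the \(O(\cdot)\); both are assumptions of this example, and `error_rate_sum` is a hypothetical name.

```python
def error_rate_sum(T, exponent=1.5):
    """Sum of (T - t)^(-exponent) over rounds t = 1, ..., T - 1."""
    return sum((T - t) ** -exponent for t in range(1, T))

for T in (10, 100, 1000, 10000):
    print(T, error_rate_sum(T))
```

The printed sums increase with \(T\) but stay below 3, as the integral test predicts for \(\sum_k k^{-3/2}\), so the total discretization error does not grow with the horizon.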

Known and New Results

Multiplicative Weights Update method:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

Optimal for \(n,T \to \infty\) !

If \(n\) is fixed, we can do better

Worst-case regret for 2 experts

Player knows \(T\) (fixed-time): \(\sqrt{\frac{T}{2\pi}} + O(1)\) [Cover '67], Dynamic Programming, \(\{0,1\}\) costs, \(O(T)\) time per round

Player doesn't know \(T\) (anytime): \(\approx 0.65 \sqrt{T}\) [Harvey, Liaw, Perkins, Randhawa FOCS 2020], Stochastic Calculus, \([0,1]\) costs, \(O(1)\) time per round

Question:

Is there an efficient algorithm for the fixed-time case?

Ideally an algorithm that works for general costs!

Efficient and Optimal Fixed-Time Regret with Two Experts
