When Online Learning meets Stochastic Calculus

Victor Sanches Portella

Computer Science

Prediction with Expert Advice

Player

$$n$$ Experts

\displaystyle \mathrm{Regret}(T) = \sum_{t = 1}^T \langle c_t, p_t \rangle - \min_{i = 1, \dotsc, n} \sum_{t = 1}^T c_t(i)

Probabilities: $$p_t = (0.5, 0.1, 0.3, 0.1)$$

Costs: $$c_t = (1, 0, 1, 0)$$

Expected loss: $$\langle c_t, p_t \rangle$$

Goal: $$\mathrm{Regret}(T) = o(T)$$

Player's Loss

Loss of Best Expert
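The regret definition can be made concrete with a minimal Python sketch. The function name `regret` and the list-of-lists representation are assumptions made for illustration; the single round uses the probabilities and costs from the example.

```python
def regret(costs, plays):
    """Regret(T) = sum_t <c_t, p_t> - min_i sum_t c_t(i).

    costs[t] and plays[t] are length-n lists: the cost vector c_t and the
    player's probability vector p_t on round t (hypothetical layout).
    """
    n = len(costs[0])
    # Player's (expected) loss: sum of inner products <c_t, p_t>.
    player_loss = sum(
        sum(c * p for c, p in zip(c_t, p_t)) for c_t, p_t in zip(costs, plays)
    )
    # Loss of the best expert in hindsight.
    best_expert_loss = min(sum(c_t[i] for c_t in costs) for i in range(n))
    return player_loss - best_expert_loss


# One round with the probabilities and costs of the example.
p_1 = [0.5, 0.1, 0.3, 0.1]
c_1 = [1, 0, 1, 0]
one_round = regret([c_1], [p_1])  # expected loss 0.5 + 0.3, best expert pays 0
```

On this single round the player's expected loss is 0.5 + 0.3 and two experts pay nothing, so the regret equals the expected loss.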

No IID assumption

Differential Privacy

RL & Control

Combinatorial Optimization

Optimal Regret with 2 Experts

Player knows $$T$$ [Cover '67]:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2\pi}} + O(1)

Player doesn't know $$T$$ [H,L,P,R '20]:

\displaystyle \mathrm{Regret}(T) \leq 0.654\sqrt{T}

Since $$\sqrt{1/(2\pi)} \approx 0.40 < 0.654$$, knowing $$T$$ gives a strictly smaller constant.

\displaystyle \mathrm{ContRegret}(T) = \int_{0}^{T} p(t, |B_t|)\mathrm{d}|B_t|

Brownian Motion

Stochastic Calculus

PDEs

[G,H,SP '22]

SLOW

FAST

FAST


Continuous Prediction with Experts' Advice

Question: how does the regret with known $$T$$ compare to the regret with unknown $$T$$ for large $$n$$?

Stochastic Calculus seems helpful

Previous approach needs $$n = 2$$

\displaystyle C_1(t) = B_1(t)
\displaystyle C_2(t) = B_2(t)
\displaystyle \vdots
\displaystyle C_n(t) = B_n(t)
\displaystyle \mathrm{ContRegret}(T) = \int_0^T \langle p(t), \mathrm{d}C(t) \rangle - \min_{i \in \{1, \dotsc, n\}} C_i(T)

Total cost of each expert

Player's Loss

Loss of Best Expert

Usually Worst-case

Regret

=

[SP,H,L '22]


\displaystyle \mathrm{d}C_1(t) = w_{1,1}(t)\, \mathrm{d} B_1(t) + w_{1,2}(t)\, \mathrm{d} B_2(t) + \dotsb + w_{1,n}(t)\, \mathrm{d} B_n(t)
\displaystyle \mathrm{d}C_2(t) = w_{2,1}(t)\, \mathrm{d} B_1(t) + w_{2,2}(t)\, \mathrm{d} B_2(t) + \dotsb + w_{2,n}(t)\, \mathrm{d} B_n(t)
\displaystyle \vdots
\displaystyle \mathrm{d}C_n(t) = w_{n,1}(t)\, \mathrm{d} B_1(t) + w_{n,2}(t)\, \mathrm{d} B_2(t) + \dotsb + w_{n,n}(t)\, \mathrm{d} B_n(t)
\displaystyle \mathrm{ContRegret}(T) = \int_0^T \langle p(t), \mathrm{d}C(t) \rangle - \min_{i \in \{1, \dotsc, n\}} C_i(T)

Player's Loss

Loss of Best Expert

Quantile Regret Bounds!

Questions?

[SP,H,L '22]

Why Online Learning?

No IID assumption

Spam filtering

Click prediction

Repeated Games

Parameter-free Optimization

Coin Betting

Applications & Connections

Boosting

Combinatorial Optimization

Differential Privacy

Non-stochastic Control

Reinforcement Learning

Simplifying Assumptions

We will look only at $$\{0,1\}$$ costs, e.g. cost pairs $$(1,0)$$, $$(0,1)$$, $$(0,0)$$, $$(1,1)$$

Equal costs do not affect the regret

Cover's algorithm relies on these assumptions by construction

Our alg. and analysis extends to fractional costs

Gap between experts

Thought experiment: how much probability mass to put on each expert?

Cumulative Loss on round $$t$$: both experts at 2, or both experts at 42.

$$\frac{1}{2}$$ in both cases seems reasonable!

Cumulative Loss on round $$t$$: Worst Expert at 42, Best Expert at 20, so Gap = |42 - 20| = 22

Takeaway: player's decision may depend only on the gap between the experts' losses (and maybe on $$t$$)

Optimal Regret with 2 Experts

Player knows $$T$$ [Cover '67]:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2\pi}} + O(1)

Player doesn't know $$T$$ [H,L,P,R '20]:

\displaystyle \mathrm{Regret}(T) \leq 0.654\sqrt{T}

Since $$\sqrt{1/(2\pi)} \approx 0.40 < 0.654$$, knowing $$T$$ gives a strictly smaller constant.

Discrete time, with $$\Delta g_t = \pm 1$$:

\displaystyle \mathrm{Regret}(T) = \sum_{t = 1}^{T} p(t, g_{t-1})(g_t - g_{t-1})

Continuous time:

\displaystyle \mathrm{ContRegret}(T) = \int_{0}^{T} p(t, |B_t|)\mathrm{d}|B_t|

Brownian Motion

Stochastic Calculus

PDEs

[G,H,SP '22]

Probability on Worst expert

Gap between the 2 Experts

SLOW

FAST

FAST

Cover's Dynamic Program

Player strategy based on gaps (choice doesn't depend on the specific past costs). At round $$t$$ with gap $$g$$:

$$p(t, g)$$ on the Worst expert

$$1 - p(t, g)$$ on the Best expert

Cover's DP Table (w/ player playing optimally):

$$V^*[t, g]$$ = max regret to be suffered at time $$t$$ with gap $$g$$

$$V^*[0, 0]$$ = max. regret for a game with $$T$$ rounds

We can compute $$V^*$$ backwards in time via DP!

$$O(T^2)$$ time to compute $$V^*$$

Computing the optimal strategy $$p^*$$ from $$V^*$$ is easy!

Cover's Dynamic Program

Player strategy based on gaps (choice doesn't depend on the specific past costs). At round $$t$$ with gap $$g$$:

$$p(t, g)$$ on the Lagging expert

$$1 - p(t, g)$$ on the Leading expert

$$V^*[t, g]$$ = max regret-to-be-suffered at round $$t$$ with gap $$g$$

$$V^*[0, 0]$$ = optimal regret for 2 experts

We can compute $$V^*$$ backwards in time via DP!

$$O(T^2)$$ time to compute the table, i.e. $$O(T)$$ amortized time per round

Getting an optimal player $$p^*$$ from $$V^*$$ is easy!

p^*(t,g) = \frac{1}{2} \big( V^*[t, g-1] - V^*[t, g+1]\big)

Connection to Random Walks

Theorem [Cover '67]

\displaystyle V^*[0,0] = \frac{1}{2} \mathbb{E}\Big[\text{\# of 0s of a Random Walk of len } T\Big] = \sqrt{\frac{T}{2\pi}} + O(1)

Player $$p^*$$ is also connected to RWs. For $$g_t$$ following a Random Walk:

\displaystyle p^*(t,g) = \mathbb{P} \Big(\text{Lagging expert finishes leading}\Big) \approx\mathbb{P}\Big(\mathcal{N}(0,T - t) > g\Big)

where the approximation comes from the Central Limit Theorem.

Not clear if the approximation error affects the regret

The DP is defined only for integer costs!

Let's design an algorithm that is efficient and works for all costs

Bonus: Connections of Cover's algorithm with stochastic calculus
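As a numerical illustration of the random-walk theorem, the sketch below evaluates half the expected number of zeros of a simple random walk and compares it with $$\sqrt{T/(2\pi)}$$. Two choices here are assumptions of the sketch, not statements from the slides: zeros are counted at times $$0, \dotsc, T-1$$, and the name `half_expected_zeros` is hypothetical.

```python
import math


def half_expected_zeros(T):
    """(1/2) * E[# of times t in {0, ..., T-1} with S_t = 0] for a +-1 random walk.

    Uses P(S_{2k} = 0) = C(2k, k) / 4^k, updated via the ratio
    P(S_{2k+2} = 0) / P(S_{2k} = 0) = (2k + 1) / (2k + 2).
    """
    total, prob_zero = 0.0, 1.0  # prob_zero = P(S_0 = 0) = 1
    for k in range((T + 1) // 2):  # even times 2k < T
        total += prob_zero
        prob_zero *= (2 * k + 1) / (2 * k + 2)
    return 0.5 * total


T = 10_000
exact = half_expected_zeros(T)
asymptotic = math.sqrt(T / (2 * math.pi))
```

Under the counting convention assumed here, `exact` and `asymptotic` differ by a bounded amount as $$T$$ grows, matching the $$O(1)$$ term in the theorem.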

A Probabilistic View of Regret Bounds

Formula for the regret based on the gaps

\displaystyle \mathrm{Regret(T)} = \sum_{t = 1}^{T} p(t, g_{t-1})(g_t - g_{t-1})

Discrete stochastic integral

Moving to continuous time:

Random walk $$\longrightarrow$$ Brownian Motion

$$g_0, \dotsc, g_t$$ are a realization of a random walk with $$\Delta g_t = \pm 1$$

Useful Perspective:

Deterministic bound = Bound with probability 1

A Probabilistic View of Regret Bounds

Formula for the regret based on the gaps

\displaystyle \mathrm{Regret(T)} = \sum_{t = 1}^{T} p(t, g_{t-1})(g_t - g_{t-1})

Random walk $$\longrightarrow$$ Brownian Motion

\displaystyle \mathrm{ContRegret(p, T)} = \int_{0}^{T} p(t, |B_t|)\mathrm{d}|B_t|

Reflected Brownian motion (gaps)

Conditions on the continuous player $$p$$

Continuous on $$[0,T) \times \mathbb{R}$$

$$p(t,0) = \frac{1}{2}$$ for all $$t \geq 0$$

Stochastic Integrals and Itô's Formula

How to work with stochastic integrals?

Itô's Formula:

\displaystyle R(T, |B_T|) - R(0, 0) = \mathrm{ContRegret}(\partial_g R, T) + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t

Different from classic FTC!

\displaystyle \overset{*}{\Delta} R = \partial_t R + \tfrac{1}{2}\partial_{gg} R

Backwards Heat Equation: $$\overset{*}{\Delta} R(t, g) = 0$$ everywhere $$\implies$$ ContRegret $$= R(T, |B_T|) - R(0,0)$$

Goal:

Find a "potential function" $$R$$ such that

(1) $$\partial_g R$$ is a valid continuous player

(2) $$R$$ satisfies the Backwards Heat Equation

Stochastic Integrals and Itô's Formula

\displaystyle R(T, |B_T|) - R(0, 0) = \mathrm{ContRegret}(\partial_g R, T)

Goal:

Find a "potential function" $$R$$ such that

(1) $$\partial_g R$$ is a valid continuous player

(2) $$R$$ satisfies the Backwards Heat Equation

How to find a good $$R$$?

Suffices to find a player $$p$$ satisfying the BHE

p(t,g) = \mathbb{P}(\mathcal{N}(0, T - t) > g)

$$\approx$$ Cover's solution! Also a solution to an ODE

Then setting

R(t,g) \approx \int p(t,g) \, \mathrm{d}g

preserves BHE and

p = \partial_g R

\displaystyle R(T, |B_T|) - R(0,0) \leq \sqrt{\frac{T}{2\pi}}

Stochastic Integrals and Itô's Formula

How to work with stochastic integrals?

Itô's Formula [C-BL 06]:

\displaystyle R(T, |B_T|) - R(0, |B_0|) = \int_{0}^T \partial_g R(t, |B_t|) \mathrm{d}|B_t| + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t

The first integral is $$\mathrm{ContRegret}(\partial_g R, T)$$. Different from classic FTC!

\displaystyle \overset{*}{\Delta} R = \partial_t R + \tfrac{1}{2}\partial_{gg} R

Backwards Heat Equation: $$\overset{*}{\Delta} R(t, g) = 0$$ everywhere $$\implies$$ ContRegret is given by $$R(T, |B_T|) - R(0, |B_0|)$$

Goal:

Find a "potential function" $$R$$ such that

(1) $$\partial_g R$$ is a valid continuous player

(2) $$R$$ satisfies the Backwards Heat Equation

A Solution Inspired by Cover's Algorithm

From Cover's algorithm, we have

\displaystyle p^*(t,g) \approx \mathbb{P}\Big(\mathcal{N}(0,T - t) > g\Big) = \text{player}~Q(t, g)

Potential $$R$$ satisfying BHE? Player $$Q$$ satisfies the BHE!

We can find $$R(t,g)$$ such that

$$\overset{*}{\Delta} R = 0$$ (BHE)

$$\partial_g R = Q$$

By Itô's Formula:

\displaystyle \mathrm{ContRegret}(Q, T) = R(T, |B_T|) - R(0,0) \leq \sqrt{\frac{T}{2\pi}}
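The two conditions on the potential can be checked numerically. In the sketch below, `Q` is the Gaussian-tail player from the slide, written with the complementary error function. The closed form used for `R` is an assumption of this sketch: it is one antiderivative of `Q` in $$g$$, and the slides do not state a closed form. Finite differences then test $$\partial_g R = Q$$ and $$\partial_t R + \tfrac{1}{2}\partial_{gg} R = 0$$ at a sample point.

```python
import math

T = 100.0  # horizon (arbitrary value chosen for the check)


def Q(t, g):
    """Player Q(t, g) = P(N(0, T - t) > g), valid for t < T."""
    return 0.5 * math.erfc(g / math.sqrt(2.0 * (T - t)))


def R(t, g):
    """Candidate potential (assumed form): g * Q(t, g) - sqrt(s / 2pi) * exp(-g^2 / 2s), s = T - t."""
    s = T - t
    return g * Q(t, g) - math.sqrt(s / (2.0 * math.pi)) * math.exp(-g * g / (2.0 * s))


def bhe_residual(f, t, g, h=1e-3):
    """Central-difference estimate of d_t f + (1/2) d_gg f at (t, g)."""
    d_t = (f(t + h, g) - f(t - h, g)) / (2.0 * h)
    d_gg = (f(t, g + h) - 2.0 * f(t, g) + f(t, g - h)) / (h * h)
    return d_t + 0.5 * d_gg


def d_g(f, t, g, h=1e-3):
    """Central-difference estimate of d_g f at (t, g)."""
    return (f(t, g + h) - f(t, g - h)) / (2.0 * h)


residual_R = bhe_residual(R, 40.0, 3.0)    # expected to be close to 0
residual_Q = bhe_residual(Q, 40.0, 3.0)    # the player itself also satisfies the BHE
slope_gap = d_g(R, 40.0, 3.0) - Q(40.0, 3.0)  # expected to be close to 0
```

With this candidate, `R(0, 0)` equals $$-\sqrt{T/(2\pi)}$$ and `R` is nonpositive near the horizon, which is consistent with the bound $$R(T, |B_T|) - R(0,0) \leq \sqrt{T/(2\pi)}$$.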

Discrete Itô's Formula

How to analyze a discrete algorithm coming from stochastic calculus?

Discrete Itô's Formula!

Itô's Formula

\displaystyle R(T, |B_T|) - R(0, 0) = \mathrm{ContRegret}(\partial_g R, T) + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t

Discrete Itô's Formula

\displaystyle R(T, g_T) - R(0, 0) = \mathrm{Regret}(R_g, T) + \sum_{t = 1}^T \Big(R_t(t, g_{t-1}) + \frac{1}{2} R_{gg}(t, g_{t-1})\Big)

Here $$R_g$$, $$R_t$$, and $$R_{gg}$$ are Discrete Derivatives, evaluated at the gap $$g_{t-1}$$ that the player sees on round $$t$$.

Surprisingly, we can analyze Cover's algorithm with discrete Itô's formula
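The discrete Itô formula is an algebraic identity for paths with $$\pm 1$$ steps, which a short sketch can confirm. The difference operators below (centered in $$g$$, backward in $$t$$) and the evaluation of the correction term at the previous gap $$g_{t-1}$$ are assumptions of this sketch. The test function `f` is arbitrary and hypothetical, and the path is an unreflected random walk, since the identity does not depend on reflection.

```python
import random


def d_g(f, t, g):
    """Centered difference in g."""
    return 0.5 * (f(t, g + 1) - f(t, g - 1))


def d_t(f, t, g):
    """Backward difference in t."""
    return f(t, g) - f(t - 1, g)


def d_gg(f, t, g):
    """Second difference in g."""
    return f(t, g + 1) - 2.0 * f(t, g) + f(t, g - 1)


def discrete_ito_sides(f, path):
    """Return both sides of the discrete Ito formula along a +-1 path g_0, ..., g_T."""
    T = len(path) - 1
    lhs = f(T, path[T]) - f(0, path[0])
    # Discrete stochastic integral: sum_t f_g(t, g_{t-1}) * (g_t - g_{t-1}).
    integral = sum(
        d_g(f, t, path[t - 1]) * (path[t] - path[t - 1]) for t in range(1, T + 1)
    )
    # Correction term: sum_t (f_t + (1/2) f_gg), evaluated at (t, g_{t-1}).
    correction = sum(
        d_t(f, t, path[t - 1]) + 0.5 * d_gg(f, t, path[t - 1]) for t in range(1, T + 1)
    )
    return lhs, integral + correction


rng = random.Random(1)
path = [0]
for _ in range(50):
    path.append(path[-1] + rng.choice((-1, 1)))


def f(t, g):
    """Arbitrary smooth test function (hypothetical)."""
    return 0.01 * t * g * g - 0.3 * g + 0.002 * g**3 + 0.5 * t


lhs, rhs = discrete_ito_sides(f, path)
```

Each step uses $$f(t, g \pm 1) - f(t, g) = \pm f_g(t,g) + \tfrac{1}{2} f_{gg}(t,g)$$, so the two sides agree up to floating-point error for any choice of `f`.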

Our Results

An Efficient and Optimal Algorithm in Fixed-Time with Two Experts

Technique:

Solve an analogous continuous-time problem, and discretize it [HLPR '20]

How to exploit the knowledge of $$T$$? Solution based on Cover's alg, or inverting time in an ODE!

Discretization error needs to be analyzed carefully. We show $$\leq 1$$

Insight:

Cover's algorithm has connections to stochastic calculus! $$V^*$$ and $$p^*$$ satisfy the discrete BHE!

BHE seems to play a role in other problems in OL as well!

Known Results

Multiplicative Weights Update method:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

Optimal for $$n,T \to \infty$$ !
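For reference, here is a minimal sketch of the Multiplicative Weights Update (Hedge) method with the standard fixed-horizon learning rate $$\eta = \sqrt{8 \ln n / T}$$, which yields the bound above for costs in $$[0,1]$$. The function names and the random cost sequence are assumptions made for illustration.

```python
import math
import random


def hedge_regret(costs):
    """Run Hedge with eta = sqrt(8 ln(n) / T) on costs in [0, 1] and return its regret."""
    T, n = len(costs), len(costs[0])
    eta = math.sqrt(8.0 * math.log(n) / T)
    cumulative = [0.0] * n  # cumulative cost of each expert
    player_loss = 0.0
    for c_t in costs:
        # Weights proportional to exp(-eta * cumulative cost); shifting by the
        # minimum avoids underflow without changing the distribution.
        low = min(cumulative)
        weights = [math.exp(-eta * (total - low)) for total in cumulative]
        norm = sum(weights)
        player_loss += sum(w * c for w, c in zip(weights, c_t)) / norm
        cumulative = [total + c for total, c in zip(cumulative, c_t)]
    return player_loss - min(cumulative)


def mwu_bound(T, n):
    """The bound sqrt((T / 2) * ln n)."""
    return math.sqrt(T / 2.0 * math.log(n))


rng = random.Random(0)
T, n = 1000, 8
costs = [[rng.random() for _ in range(n)] for _ in range(T)]
observed = hedge_regret(costs)
```

The bound is a worst-case guarantee, so `observed` stays below `mwu_bound(T, n)` for any cost sequence in $$[0,1]$$, not only for the random one used here.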

If $$n$$ is fixed, we can do better. Minmax regret in some cases (player knows $$T$$!):

$$n = 2$$: $$\sqrt{\frac{T}{2\pi}} + O(1)$$

$$n = 3$$: $$\sqrt{\frac{8T}{9\pi}} + O(\ln T)$$

$$n = 4$$: $$\sim \sqrt{\frac{T \pi}{8}}$$

What if $$T$$ is not known? Minmax regret for $$n = 2$$ [Harvey, Liaw, Perkins, Randhawa FOCS 2020]:

\displaystyle \frac{\gamma}{2} \sqrt{T}, \qquad \gamma \approx 1.307

They give an efficient algorithm!

A Dynamic Programming View

Optimal regret ($$V^* = V_{p^*}$$)

For $$g > 0$$:

\displaystyle V^*[t,g] = \frac{1}{2}(V^*[t+1, g-1] + V^*[t+1, g + 1])

For $$g = 0$$:

\displaystyle V^*[t,0] = \frac{1}{2} + V^*[t+1, 1]

[Figure: DP table of $$V^*$$ indexed by round $$t$$ and gap $$g$$, filled backwards in time]

Regret and Player in terms of the Gap

Path-independent player: on round $$t$$, with gap $$g_{t-1}$$ on round $$t-1$$, put

$$p(t, g_{t-1})$$ on the Lagging expert

$$1 - p(t, g_{t-1})$$ on the Leading expert

Choice doesn't depend on the specific past costs

A formula for the regret: if $$p(t, 0) = 1/2$$ for all $$t$$, then

\displaystyle \mathrm{Regret}(T) = \sum_{t = 1}^{T} p(t, g_{t-1})(g_t - g_{t-1})

where $$g_t$$ is the gap on round $$t$$.

A discrete analogue of a Riemann-Stieltjes integral
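The gap formula for the regret can be checked against the definition of regret on a simulated game. In this sketch each round gives cost 1 to exactly one expert, the player is the Gaussian-tail rule with $$p(t,0) = 1/2$$, and the names `player` and `regrets` are hypothetical. Any other path-independent player with $$p(t,0) = 1/2$$ would work the same way.

```python
import math
import random


def player(t, g, T):
    """Probability on the lagging expert: P(N(0, T - t) > g), with p(t, 0) = 1/2."""
    if g == 0:
        return 0.5
    if t >= T:
        return 0.0
    return 0.5 * math.erfc(g / math.sqrt(2.0 * (T - t)))


def regrets(lagging_hit, T):
    """Return (regret from its definition, regret from the gap formula).

    lagging_hit[t - 1] says whether the lagging expert pays cost 1 on round t;
    otherwise the leading expert pays. With gap 0, whoever pays becomes lagging.
    """
    loss_lead = loss_lag = 0
    player_loss = 0.0
    gap_sum = 0.0
    for t in range(1, T + 1):
        g_prev = loss_lag - loss_lead
        p = player(t, g_prev, T)
        if lagging_hit[t - 1] or g_prev == 0:
            player_loss += p        # mass p sits on the expert that pays
            loss_lag += 1
        else:
            player_loss += 1.0 - p  # mass 1 - p sits on the leading expert
            loss_lead += 1
        g = loss_lag - loss_lead
        gap_sum += p * (g - g_prev)
    return player_loss - min(loss_lead, loss_lag), gap_sum


rng = random.Random(0)
T = 200
hits = [rng.random() < 0.5 for _ in range(T)]
direct, via_gaps = regrets(hits, T)
```

When the lagging expert pays, the regret grows by $$p$$ and the gap by 1. When the leading expert pays, the player loses $$1 - p$$ while the best expert's loss grows by 1, so the regret changes by $$-p$$ and the gap by $$-1$$.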

A Dynamic Programming View

$$V_p[t, g]$$ = maximum regret-to-be-suffered on rounds $$t+1, \dotsc, T$$ when gap on round $$t$$ is $$g$$

Path-independent player $$\implies$$ $$V_p[t,g]$$ depends only on $$\ell_{t+1}, \dotsc, \ell_T$$ and $$g_t, \dotsc, g_{T}$$

\displaystyle V_p[t, 0] = \max\{p(t+1,0), 1 - p(t+1,0)\} + V_p[t+1, 1]

\displaystyle V_p[t, g] = \max \Big\{ V_p[t+1, g+1] + p(t + 1,g), \; V_p[t+1, g-1] - p(t + 1,g) \Big\}

In both recursions, the term involving $$p$$ is the regret suffered on round $$t+1$$.

A Dynamic Programming View

$$V_p[t, g]$$ = maximum regret-to-be-suffered on rounds $$t+1, \dotsc, T$$ if gap at round $$t$$ is $$g$$

$$V_p[0, 0]$$ = maximum regret of $$p$$

Path-independent player $$\implies$$ $$V_p[t,g]$$ depends only on $$\ell_{t+1}, \dotsc, \ell_T$$ and $$g_t, \dotsc, g_{T}$$

We can compute $$V_p$$ backwards in time!

We then choose $$p^*$$ that minimizes $$V_p[0,0]$$, so that $$V^*[0,0] = V_{p^*}[0,0]$$

A Dynamic Programming View

Optimal player

For $$g > 0$$:

\displaystyle p^*(t,g) = \frac{1}{2}(V_{p^*}[t, g-1] - V_{p^*}[t, g + 1])

For $$g = 0$$:

\displaystyle p^*(t,0) = \frac{1}{2}

Optimal regret ($$V^* = V_{p^*}$$)

For $$g > 0$$:

\displaystyle V^*[t,g] = \frac{1}{2}(V^*[t+1, g-1] + V^*[t+1, g + 1])

For $$g = 0$$:

\displaystyle V^*[t,0] = \frac{1}{2} + V^*[t+1, 1]
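The two recursions for $$V^*$$ and the formula for $$p^*$$ translate directly into a backward dynamic program. The sketch below assumes the terminal condition $$V^*[T, g] = 0$$ (no regret left to suffer after the last round), which the slides do not state explicitly. The names `cover_table` and `optimal_player` are hypothetical.

```python
import math


def cover_table(T):
    """Fill V[t][g] backwards in time for t = 0..T and g = 0..T+1.

    Assumes V[T][g] = 0 for every g; the extra column g = T + 1 is never
    reachable and only keeps the index g + 1 in range.
    """
    V = [[0.0] * (T + 2) for _ in range(T + 1)]
    for t in range(T - 1, -1, -1):
        V[t][0] = 0.5 + V[t + 1][1]                            # case g = 0
        for g in range(1, T + 1):
            V[t][g] = 0.5 * (V[t + 1][g - 1] + V[t + 1][g + 1])  # case g > 0
    return V


def optimal_player(V, t, g):
    """p*(t, g): probability on the lagging expert at round t with gap g."""
    if g == 0:
        return 0.5
    return 0.5 * (V[t][g - 1] - V[t][g + 1])


T = 200
V = cover_table(T)
optimal_regret = V[0][0]                      # compare with sqrt(T / (2 pi)) + O(1)
gaussian_scale = math.sqrt(T / (2.0 * math.pi))
p_example = optimal_player(V, 1, 1)
```

Filling the table takes $$O(T^2)$$ time and memory, which is the cost the slides aim to avoid. Once the table is built, each call to `optimal_player` is a constant-time lookup.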

Discrete Derivatives

\displaystyle p^*(t,g) = \frac{1}{2} \big( V^*[t, g-1] - V^*[t, g+1]\big) \eqqcolon V_g^*[t,g]

a centered difference in $$g$$, the discrete counterpart of $$\partial_g V^*[t,g]$$ (up to the sign convention).

\displaystyle V_t^*[t,g] \coloneqq V^*[t, g] - V^*[t-1, g] \approx \partial_t V^*[t,g]

\displaystyle \frac{1}{2}V_{gg}^*[t,g] \coloneqq \frac{1}{2}( V^*[t, g-1] - V^*[t, g]) - \frac{1}{2}(V^*[t, g] - V^*[t,g+1]) \approx \frac{1}{2} \partial_{gg} V^*[t,g]

Since $$V^*[t-1,g] = \frac{1}{2}(V^*[t, g-1] + V^*[t, g+1])$$ for $$g > 0$$, these satisfy $$V_t^*[t,g] + \frac{1}{2}V_{gg}^*[t,g] = 0$$, the discrete BHE.

Bounding the Discretization Error

Main idea

$$R$$ satisfies the continuous BHE $$\implies$$ $$R_t(t,g) + \frac{1}{2} R_{gg}(t,g) \approx$$ approximation error of the derivatives

\implies
\leq 0.74
\displaystyle \mathrm{Regret(T)} \leq \frac{1}{2} + \sqrt{\frac{T}{2\pi}} + \sum_{t = 1}^T O\Bigg( \frac{1}{(T - t)^{3/2}}\Bigg)

Lemma

\displaystyle \partial_{gg}R(t,g) - R_{gg}(t,g) \in O\Big( \frac{1}{(T - t)^{3/2}}\Big) \quad \text{and} \quad \partial_t R(t,g) - R_t(t,g) \in O\Big( \frac{1}{(T - t)^{3/2}}\Big)

Known and New Results

Multiplicative Weights Update method:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

Optimal for $$n,T \to \infty$$ !

If $$n$$ is fixed, we can do better

Worst-case regret for 2 experts

Player knows $$T$$ (fixed-time) [Cover '67]: $$\sqrt{\frac{T}{2\pi}} + O(1)$$, via Dynamic Programming, $$O(T)$$ time per round, $$\{0,1\}$$ costs

Player doesn't know $$T$$ (anytime) [Harvey, Liaw, Perkins, Randhawa FOCS 2020]: $$\approx 0.65 \sqrt{T}$$, via Stochastic Calculus, $$O(1)$$ time per round, $$[0,1]$$ costs

Question:

Is there an efficient algorithm for the fixed-time case?

Ideally an algorithm that works for general costs!