## When **Online Learning** meets **Stochastic Calculus**

Victor Sanches Portella

**Computer Science**

### Prediction with Expert Advice

Player

Adversary

**\(n\)** Experts

Probabilities: 0.5, 0.1, 0.3, 0.1

Costs: 1, 0, 1, 0

**Expected loss:**

Player's Loss

Loss of Best Expert
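As a small illustration of the quantities on this slide, the sketch below computes the expected loss for one round from the probabilities and costs shown above, and the regret as the player's loss minus the loss of the best expert. The function names `expected_loss` and `regret` are hypothetical, chosen only for this example.

```python
# Numbers taken from the slide: the player's distribution over 4 experts
# and the {0,1} costs chosen by the adversary in this round.
probabilities = [0.5, 0.1, 0.3, 0.1]
costs = [1, 0, 1, 0]


def expected_loss(p, c):
    """Expected cost when an expert is sampled from p and the costs are c."""
    return sum(p_i * c_i for p_i, c_i in zip(p, c))


def regret(player_losses, cost_rounds):
    """Player's total loss minus the total cost of the best expert in hindsight."""
    n = len(cost_rounds[0])
    totals = [sum(round_costs[i] for round_costs in cost_rounds) for i in range(n)]
    return sum(player_losses) - min(totals)


round_loss = expected_loss(probabilities, costs)
one_round_regret = regret([round_loss], [costs])
```

Here the expected loss is 0.5 + 0.3 = 0.8, and since the best expert has cost 0 in this round, the one-round regret is also 0.8.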

**No IID assumption**

Differential Privacy

RL & Control

Comb.

Optimization

### Optimal Regret with 2 Experts

[Cover '67]

[H,L,P,R '20]

Player **knows** \(T\)

Player **doesn't know** \(T\)

**<**

Brownian Motion

Stochastic Calculus

PDEs

[G,H,**SP** '22]

**SLOW**

**FAST**

**FAST**

**Question:**

Regret with **known** \(T\)

Regret with **unknown** \(T\)

?

### Continuous Prediction with Experts' Advice

**Question:**

Regret with **known** \(T\)

Regret with **unknown** \(T\)

?

**for large \(n\)**

**Stochastic Calculus** seems helpful

Previous approach needs \(n = 2\)

Total cost of each expert

Player's Loss

Loss of Best Expert

Usually **Worst-case**

Regret

=

[**SP**,H,L '22]

### Continuous Prediction with Experts' Advice

**Question:**

Regret with **known** \(T\)

Regret with **unknown** \(T\)

?

**for large \(n\)**

**Stochastic Calculus** seems helpful

Previous approach needs \(n = 2\)

Player's Loss

Loss of Best Expert

**Quantile Regret Bounds!**

**Questions?**

[**SP**,H,L '22]

## **Gaps** and **Cover's** Algorithm

## Why Online Learning?

**No IID assumption**

Spam filtering

Click prediction

Repeated Games

**Parameter-free Optimization**

AdaGrad

Coin Betting

**Applications**

**&**

**Connections**

Boosting

Combinatorial Optimization

Differential Privacy

Non-stochastic Control

Reinforcement Learning

### Simplifying Assumptions

We will look only at \(\{0,1\}\) costs

1

0

0

1

0

0

1

1

Equal costs do not affect the regret

Cover's algorithm **relies** on these assumptions **by construction**

Our algorithm and analysis extend to **fractional costs**

### Gap between experts

**Thought experiment:** how much probability mass to put on each expert?

**Cumulative Loss** on round \(t\)

\(\frac{1}{2}\) in both cases seems **reasonable**!

**Takeaway:** player's decision may depend **only** on the **gap** between experts' losses

**Gap =** |42 - 20| = 22

**Worst** Expert

**Best** Expert

42

20

2

2

42

42

(and maybe on \(t\))

### Optimal Regret with 2 Experts

[Cover '67]

[H,L,P,R '20]

Player **knows** \(T\)

Player **doesn't know** \(T\)

**<**

Brownian Motion

Stochastic Calculus

PDEs

[G,H,**SP** '22]

Probability on Worst expert

Gap between the 2 Experts

**SLOW**

**FAST**

**FAST**

### Cover's Dynamic Program

**Player strategy based on gaps:**

Choice doesn't depend on the specific past costs

on the **Worst** expert

on the **Best** expert

We can compute \(V^*\) backwards in time via **DP**!

**Max regret to be suffered** at time \(t\) with gap \(g\)

\(O(T^2)\) time to compute \(V^*\)

At **round** \(t\) with **gap** \(g\)

**Max. regret for a game with \(T\) rounds**

Computing the optimal strategy \(p^*\) from \(V^*\) is easy!

Cover's DP Table

(w/ player playing optimally)

### Cover's Dynamic Program

**Player strategy based on gaps:**

Choice doesn't depend on the specific past costs

on the **Lagging** expert

on the **Leading** expert

We can compute \(V^*\) backwards in time via **DP**!

Getting an **optimal player** \(p^*\) from \(V^*\) is easy!

Max **regret-to-be-suffered** at round \(t\) with gap \(g\)

\(O(T^2)\) time to compute the table — \(O(T)\) amortized time per round

At **round** \(t\) with **gap** \(g\)

Optimal regret for 2 experts
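A minimal sketch of the backwards computation follows. The recurrence is a reconstruction rather than the deck's own formula: it assumes \(\{0,1\}\) costs where exactly one expert pays per round, probability \(\frac{1}{2}\) on each expert at gap 0, and a regret change of \(+p\) when the gap grows and \(-p\) when it shrinks. The name `cover_dp` and the indexing convention (`p[t][g]` is used in round \(t+1\)) are hypothetical.

```python
def cover_dp(T):
    """Return (V, p) for a game with T rounds.

    V[t][g]: max regret-to-be-suffered on rounds t+1..T when the gap after
             round t is g (with the player playing optimally).
    p[t][g]: probability placed on the lagging expert in round t+1.
    """
    # One extra gap column so that V[t + 1][g + 1] is always in range.
    V = [[0.0] * (T + 2) for _ in range(T + 1)]
    p = [[0.0] * (T + 1) for _ in range(T)]
    for t in range(T - 1, -1, -1):
        # Gap 0: both experts are tied, the player splits evenly and pays 1/2.
        p[t][0] = 0.5
        V[t][0] = 0.5 + V[t + 1][1]
        for g in range(1, T + 1):
            up = V[t + 1][g + 1]    # lagging expert pays, gap grows
            down = V[t + 1][g - 1]  # leading expert pays, gap shrinks
            # Minimize max(q + up, -q + down) over q in [0, 1]: the
            # equalizing value, clamped to the valid range.
            q = min(1.0, max(0.0, (down - up) / 2.0))
            p[t][g] = q
            V[t][g] = max(q + up, -q + down)
    return V, p


V, p = cover_dp(3)
optimal_regret_T3 = V[0][0]
```

The two nested loops fill a table with \(O(T^2)\) entries, which matches the running time quoted on the slide.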

### Connection to Random Walks

Optimal player \(p^*\) is related to Random Walks

For \(g_t\) following a Random Walk

Central Limit Theorem

Not clear if the **approximation error** affects the regret

The DP is defined only for **integer costs**!

**Lagging** expert finishes **leading**

Let's design an algorithm that is efficient and works for all costs

**Bonus:** Connections of Cover's algorithm with stochastic calculus

### Connection to Random Walks

**Theorem**

Player \(p^*\) is also connected to RWs

For \(g_t\) following a Random Walk

Central Limit Theorem

Not clear if the approximation error affects the regret

The DP is defined only for integer costs!

**Lagging** expert finishes **leading**

[Cover '67]

# of 0s of a Random Walk of len \(T\)

Let's design an algorithm that is efficient and works for all costs

**Bonus:** Connections of Cover's algorithm with stochastic calculus
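The theorem's formula is not reproduced in this text. One reading, stated here as an assumption, is that the optimal regret equals half the expected number of times a symmetric random walk of length \(T\) sits at 0 (counting times \(0, \dotsc, T-1\)). The sketch below evaluates that quantity exactly and compares it with \(\sqrt{T/(2\pi)}\), the value suggested by the Central Limit Theorem. The function name is hypothetical.

```python
import math


def half_expected_zeros(T):
    """0.5 * E[#{t in 0..T-1 : S_t = 0}] for a symmetric +-1 walk with S_0 = 0."""
    total = 0.0
    prob_zero = 1.0  # P(S_0 = 0)
    k = 0            # the walk can only be at 0 at even times t = 2k
    while 2 * k < T:
        total += prob_zero
        # P(S_{2k+2} = 0) = P(S_{2k} = 0) * (2k + 1) / (2k + 2)
        prob_zero *= (2 * k + 1) / (2 * k + 2)
        k += 1
    return 0.5 * total


T = 10_000
exact = half_expected_zeros(T)
gaussian_approx = math.sqrt(T / (2 * math.pi))
```

For large \(T\) the two values agree closely, which is consistent with the slide's remark that the Gaussian approximation is natural, while leaving open how the approximation error affects the regret.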

### A Probabilistic View of Regret Bounds

Formula for the regret based on the **gaps**

Discrete stochastic integral

Moving to **continuous time**:

Random walk \(\longrightarrow\) Brownian Motion

\(g_0, \dotsc, g_t\) are a realization of a **random walk**

**Useful Perspective:**

**Deterministic bound = Bound with probability 1**

### A Probabilistic View of Regret Bounds

Formula for the regret based on the **gaps**

Random walk \(\longrightarrow\) Brownian Motion

Reflected Brownian motion **(gaps)**

Conditions on the *continuous player* **\(p\)**

Continuous on \([0,T) \times \mathbb{R}\)

for all \(t \geq 0\)

### Stochastic Integrals and Itô's Formula

How to work with stochastic integrals?

**Itô's Formula:**

\(\overset{*}{\Delta} R(t, g) = 0\) everywhere

ContRegret \( = R(T, |B_T|) - R(0,0)\)

**Goal:**

Find a "**potential function**" \(R\) such that

(1) **\(\partial_g R\)** is a valid continuous player

(2) \(R\) satisfies the **Backwards Heat Equation**

Different from classic FTC!

**B**ackwards **H**eat **E**quation
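For reference, Itô's formula applied to \(R(t, |B_t|)\) can be written as below. Two points are assumptions, since the deck's equations are not reproduced here: that \(\overset{*}{\Delta}\) denotes the operator \(\partial_t + \frac{1}{2}\partial_{gg}\), and that the continuous regret of a player \(p\) is the stochastic integral \(\int_0^T p(t, |B_t|)\,\mathrm{d}|B_t|\).

```latex
\[
R(T, |B_T|) - R(0, 0)
  = \int_0^T \partial_g R(t, |B_t|)\, \mathrm{d}|B_t|
  + \int_0^T \overset{*}{\Delta} R(t, |B_t|)\, \mathrm{d}t,
\qquad
\overset{*}{\Delta} R := \partial_t R + \tfrac{1}{2}\, \partial_{gg} R .
\]
```

With \(p = \partial_g R\) and \(\overset{*}{\Delta} R = 0\) everywhere, the second integral vanishes and the first is the continuous regret, which gives ContRegret \( = R(T, |B_T|) - R(0,0)\). The extra \(\mathrm{d}t\) term is what distinguishes this from the classic fundamental theorem of calculus.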

### Stochastic Integrals and Itô's Formula

**Goal:**

Find a "**potential function**" \(R\) such that

(1) **\(\partial_g R\)** is a valid continuous player

(2) \(R\) satisfies the **Backwards Heat Equation**

How to find a good \(R\)?

?

Suffices to find a player \(p\) satisfying the **BHE**

\(\approx\) Cover's solution!

**Also a solution to an ODE**

Then setting

preserves **BHE** and

### Stochastic Integrals and Itô's Formula

How to work with stochastic integrals?

**Itô's Formula:**

\(\overset{*}{\Delta} R(t, g) = 0\) everywhere

ContRegret is given by \(R(T, |B_T|)\)

**Goal:**

Find a "**potential function**" \(R\) such that

(1) **\(\partial_g R\)** is a valid continuous player

(2) \(R\) satisfies the **Backwards Heat Equation**

Different from classic FTC!

**Backwards Heat Equation**

[C-BL 06]

### A Solution Inspired by Cover's Algorithm

From **Cover's algorithm**, we have

We can find \(R(t,g)\) such that

\(\overset{*}{\Delta} R = 0\)

\(\partial_g R = Q\)

Potential \(R\) satisfying **BHE?**

Player \(Q\) satisfies the **BHE**!

**By Itô's Formula:**

(**BHE**)
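The formula for \(Q\) is not shown in this text. As a hedged illustration, the sketch below assumes \(Q(t, g) = \Pr[N(0, T - t) > g]\), the Gaussian tail suggested by the Central Limit Theorem approximation of Cover's player, and checks numerically that it satisfies the backwards heat equation \(\partial_t Q + \frac{1}{2}\partial_{gg} Q = 0\). The horizon value and function names are illustrative only.

```python
import math

T = 100.0  # horizon, an arbitrary value for illustration


def Q(t, g):
    """Assumed candidate player: P(N(0, T - t) > g), a Gaussian tail probability."""
    return 0.5 * math.erfc(g / math.sqrt(2.0 * (T - t)))


def bhe_residual(f, t, g, h=1e-3):
    """Central finite-difference estimate of d/dt f + 0.5 * d^2/dg^2 f at (t, g)."""
    df_dt = (f(t + h, g) - f(t - h, g)) / (2.0 * h)
    d2f_dg2 = (f(t, g + h) - 2.0 * f(t, g) + f(t, g - h)) / (h * h)
    return df_dt + 0.5 * d2f_dg2


residuals = [bhe_residual(Q, t, g) for (t, g) in [(10.0, 1.0), (50.0, 3.0), (90.0, 0.5)]]
```

All residuals come out at the level of finite-difference error, and \(Q(t, 0) = \frac{1}{2}\) matches the even split at gap 0.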

### Discrete Itô's Formula

How to analyze a **discrete** algorithm coming from **stochastic calculus**?

Discrete Itô's Formula!

Discrete Derivatives

Surprisingly, we can analyze **Cover's algorithm** with discrete **Itô's formula**

**Itô's Formula**

**Discrete Itô's Formula**
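The discrete formula can be checked numerically. The sketch below uses one common choice of discrete derivatives (a central difference in \(g\), a backward difference in \(t\)), which is an assumption since the deck's definitions are not reproduced here. It verifies that \(f(T, g_T) - f(0, g_0)\) equals a discrete stochastic integral plus a sum of discrete heat-operator terms along any path with \(\pm 1\) steps. All names are hypothetical.

```python
import random


def d_g(f, t, g):
    """Central difference in the gap variable."""
    return (f(t, g + 1) - f(t, g - 1)) / 2.0


def d_t(f, t, g):
    """Backward difference in time."""
    return f(t, g) - f(t - 1, g)


def d_gg(f, t, g):
    """Second difference in the gap variable."""
    return f(t, g + 1) - 2.0 * f(t, g) + f(t, g - 1)


def reflected_walk(T, rng=random):
    """Gaps g_0, ..., g_T of a random walk reflected at 0."""
    gaps = [0]
    for _ in range(T):
        step = 1 if gaps[-1] == 0 else rng.choice([-1, 1])
        gaps.append(gaps[-1] + step)
    return gaps


def discrete_ito_sides(f, gaps):
    """Return (lhs, rhs) of the discrete Ito identity along the path `gaps`."""
    T = len(gaps) - 1
    lhs = f(T, gaps[T]) - f(0, gaps[0])
    rhs = 0.0
    for t in range(1, T + 1):
        g_prev = gaps[t - 1]
        rhs += d_g(f, t, g_prev) * (gaps[t] - g_prev)         # discrete integral
        rhs += d_t(f, t, g_prev) + 0.5 * d_gg(f, t, g_prev)   # discrete BHE term
    return lhs, rhs
```

If \(f\) satisfies the discrete backwards heat equation, the second sum vanishes and only the discrete stochastic integral remains, mirroring the continuous argument.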

### Our Results

An **Efficient** and **Optimal** Algorithm in Fixed-Time with Two Experts

**Technique**:

Solve an analogous continuous-time problem, and discretize it

[HLPR '20]

How to exploit the knowledge of \(T\)?

Discretization error needs to be analyzed carefully.

BHE seems to play a role in other problems in OL as well!

Solution based on Cover's alg

**Or inverting time in an ODE!**

We show \(\leq 1\)

\(V^*\) and \(p^*\) satisfy the **discrete BHE**!

**Insight**:

Cover's algorithm has connections to **stochastic calculus**!

### Questions?

### Known Results

**M**ultiplicative **W**eights **U**pdate method:

**Optimal** for \(n,T \to \infty\) !
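A minimal sketch of the multiplicative weights update is given below. The step size \(\eta = \sqrt{8 \ln n / T}\) and the bound \(\sqrt{(T/2) \ln n}\) are the textbook tuning for costs in \([0,1]\), stated here as an assumption because the slide's formula is not reproduced in this text. The function name is hypothetical.

```python
import math
import random


def mwu_regret(cost_rounds, eta):
    """Run multiplicative weights on cost vectors in [0,1]^n.

    Returns the regret: expected player loss minus the best expert's total cost.
    """
    n = len(cost_rounds[0])
    weights = [1.0] * n
    totals = [0.0] * n
    player_loss = 0.0
    for costs in cost_rounds:
        z = sum(weights)
        # Play the normalized weights as a probability distribution.
        player_loss += sum(w / z * c for w, c in zip(weights, costs))
        for i, c in enumerate(costs):
            weights[i] *= math.exp(-eta * c)  # penalize costly experts
            totals[i] += c
    return player_loss - min(totals)


n, T = 10, 1000
rng = random.Random(0)
cost_rounds = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8.0 * math.log(n) / T)
observed = mwu_regret(cost_rounds, eta)
bound = math.sqrt(T / 2.0 * math.log(n))
```

Note that this tuning needs \(T\) in advance, which is the fixed-time setting discussed on this slide.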

If \(n\) is fixed, we **can do better**

**\(n = 2\)**

**\(n = 3\)**

**\(n = 4\)**

Player **knows** \(T\)!

**Minmax** regret in some cases:

What if \(T\) is **not known?**

Minmax regret

**\(n = 2\)**

[Harvey, Liaw, Perkins, Randhawa FOCS 2020]

They give an **efficient** algorithm!

### A Dynamic Programming View

Optimal regret (\(V^* = V_{p^*}\))

For \(g > 0\)

For \(g = 0\)

### Regret and Player in terms of the Gap

**Path-independent player:**

If

round \(t\) and gap \(g_{t-1}\) on round \(t-1\)

on the **Lagging** expert

on the **Leading** expert

Choice doesn't depend on the specific past costs

for all \(t\), then

gap on round \(t\)

A discrete analogue of a Riemann-Stieltjes integral

**A formula for the regret**
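The regret formula in terms of gaps can be sanity-checked in code. The sketch below assumes the formula has the form \(\sum_{t=1}^{T} p(t, g_{t-1})\,(g_t - g_{t-1})\) with \(p(t, 0) = \frac{1}{2}\), under the simplifying assumption that exactly one expert pays cost 1 per round. It compares this against the regret computed directly from losses. The player used here is an arbitrary hypothetical choice, not the optimal one.

```python
import random


def player(t, g):
    """Hypothetical path-independent player: probability on the lagging expert."""
    return 0.5 if g == 0 else 0.5 / (g + 1)


def regret_direct(costly_expert):
    """costly_expert[t-1] in {0, 1} is the expert paying cost 1 on round t."""
    losses = [0, 0]
    player_loss = 0.0
    for t, i in enumerate(costly_expert, start=1):
        g = abs(losses[0] - losses[1])
        lagging = 0 if losses[0] > losses[1] else 1
        prob = [0.0, 0.0]
        prob[lagging] = player(t, g)
        prob[1 - lagging] = 1.0 - player(t, g)
        player_loss += prob[i]  # expected cost paid by the player this round
        losses[i] += 1
    return player_loss - min(losses)


def regret_from_gaps(costly_expert):
    """Discrete stochastic integral of the player against the gap process."""
    losses = [0, 0]
    total = 0.0
    for t, i in enumerate(costly_expert, start=1):
        g_prev = abs(losses[0] - losses[1])
        losses[i] += 1
        g_new = abs(losses[0] - losses[1])
        total += player(t, g_prev) * (g_new - g_prev)
    return total


rng = random.Random(0)
sequence = [rng.randrange(2) for _ in range(50)]
difference = regret_direct(sequence) - regret_from_gaps(sequence)
```

The two computations agree up to floating-point error, which is the sense in which the regret is a discrete analogue of a Riemann-Stieltjes integral.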

### A Dynamic Programming View

Maximum **regret-to-be-suffered** on rounds \(t+1, \dotsc, T\) when **gap on round \(t\) is \(g\)**

Path-independent player \(\implies\) \(V_p[t,g]\) depends **only** on \(\ell_{t+1}, \dotsc, \ell_T\) and \(g_t, \dotsc, g_{T}\)

Regret suffered on round \(t+1\)

Regret suffered on round \(t + 1\)

### A Dynamic Programming View

Maximum **regret-to-be-suffered** on rounds \(t+1, \dotsc, T\) if **gap at round \(t\) is \(g\)**

We can compute \(V_p\) backwards in time!

Path-independent player \(\implies\)

\(V_p[t,g]\) depends **only** on \(\ell_{t+1}, \dotsc, \ell_T\) and \(g_t, \dotsc, g_{T}\)

We then choose \(p^*\) that minimizes \(V^*[0,0] = V_{p^*}[0,0]\)

Maximum regret of \(p\)

### A Dynamic Programming View

For \(g > 0\)

**Optimal player**

**Optimal regret** (\(V^* = V_{p^*}\))

For \(g = 0\)

For \(g > 0\)

For \(g = 0\)

### Discrete Derivatives

### Bounding the Discretization Error

**Main idea**

\(R\) satisfies the **continuous BHE**

Approximation error of the derivatives

**Lemma**

### Known and New Results

**M**ultiplicative **W**eights **U**pdate method:

**Optimal** for \(n,T \to \infty\) !

If \(n\) is fixed, we **can do better**

**Worst-case** regret for **2 experts**

Player **knows** \(T\) (**fixed-time**)

Player **doesn't** know \(T\) (**anytime**)

**Question:**

Is there an **efficient** algorithm for the **fixed-time** case?

Ideally an algorithm that works for **general costs**!

\(O(T)\) time per round

Dynamic Programming

\(\{0,1\}\) costs

\(O(1)\) time per round

Stochastic Calculus

\([0,1]\) costs

[Harvey, Liaw, Perkins, Randhawa FOCS 2020]

[Cover '67]

#### IAM - 5 Min Presentation

By Victor Sanches Portella
