## Online Learning via Prediction with Experts' Advice

Victor Sanches Portella

UBC - CS

June 2022

## Experts' Problem and Online Learning

Player

$$n$$ Experts

0.5

0.1

0.3

0.1

Probabilities

x_t

1

0

0.5

0.3

Costs

\ell_t

Player's loss:

\langle \ell_t, x_t \rangle

Adversary knows the strategy of the player

Picking a random expert

vs

Picking a probability vector

### Measuring Player's Perfomance

\displaystyle \mathrm{Regret}(T) = \sum_{t = 1}^T \langle \ell_t, x_t \rangle - \min_{i = 1, \dotsc, n} \sum_{t = 1}^T \ell_t(i)
\displaystyle \sum_{t = 1}^T \langle \ell_t, x_t \rangle

Total player's loss

Can be = $$T$$ always

\displaystyle \sum_{t = 1}^T \langle \ell_t, x_t \rangle - \sum_{t = 1}^T \min_{i = 1, \dotsc, n} \ell_t(i)

Compare with offline optimum

Almost the same as Attempt #1

Restrict the offline optimum

\displaystyle \frac{\mathrm{Regret}(T) }{T} \to 0

Attempt #1

Attempt #2

Attempt #3

Loss of Best Expert

Player's Loss

Goal:

0
1
0.5
1
t = 1
1
1.5
0.5
1
t = 2
1.5
2
1
1.5
t = 3
2.5
3
2
1.5
t = 4

### General Online Learning

Player

Player's loss:

x_t \in \mathcal{X}
f_t(\cdot)

Convex

f_t(x_t)

The player sees $$f_t$$

\mathcal{X} = \Delta_n

Simplex

f_t(x) = \langle \ell_t, x \rangle

Linear functions

Some usual settings:

\mathcal{X} = \mathbb{R}^n
\mathcal{X} = \mathrm{Ball}
f_t(x) = \lVert Ax - b \rVert^2
f_t(x) = - \log \langle a, x \rangle

Experts' problem

### Why Online Learning?

Traditional ML optimization makes stochastic assumptions on the data

OL Â strips awayÂ  the stochastic layer

Traditional ML optimization makes stochastic assumptions on the data

Less assumptions $$\implies$$ Weaker guarantees

Less assumptions $$\implies$$ More robust

Parameter Free algorithms

Coin Betting

Meta-optimization

## Algorithms

Idea: Pick the best expert at each round

x_t = e_i = \begin{pmatrix}0\\ 0 \\ 1 \\ 0 \end{pmatrix}

where $$i$$ minimizes

\displaystyle \sum_{s = 1}^t \ell_t(i)

Player loses $$T -1$$

Best expert loses $$T/2$$

Works very well for quadratic losses

0
1
1
0
0
1
1
0
0
1
1
0
0
1
1
0

* picking distributions instead of best expert

\displaystyle x_{t+1} = \mathrm{Proj}_{\Delta_n} (x_t - \eta_t \ell_t)

$$\eta_t$$: step-size at round $$t$$

$$\ell_t$$: loss vector at round $$t$$

= \nabla f_t(x)
\displaystyle \mathrm{Regret}(T) \leq \sqrt{2 T n}

Sublinear Regret!

Optimal dependency on $$T$$

Can we improve the dependency on $$n$$?

Yes, and by a lot

### Multiplicative Weights Update Method

\displaystyle x_{t+1}(i) \propto x_t(i) \cdot e^{- \eta \ell_t(i)}
\displaystyle x_{t+1}(i) \propto x_t(i)\cdot (1 - \eta \ell_t(i))

Normalization

\displaystyle \mathrm{Regret}(T) \leq \sqrt{2 T \ln n}

Exponential improvement on $$n$$

Optimal

Other methods had clearer "optimization views"

Rediscovered many times in different fields

This one has an optimization view as well!

p_{t+1} = \displaystyle \mathrm{arg}\,\mathrm{min} \sum_{i = 1}^t \langle\ell_t, p \rangle + R(x)
\displaystyle p \in \Delta_n

Regularizer

"Stabilizes the algorithm"

R(x) = \frac{1}{\eta 2} \lVert x \rVert_2^2
\implies

R(x) = \sum_i x_i \ln x_i
\implies

Multiplicative Weights Update

Good choice of $$R$$ depends of the functions and the feasible set

### Online Mirror Descent

- \nabla f_t(x_t)
X
x_t
x_{t+1}

projection

x_t
\nabla R(x_t)

### Primal

X
\nabla R
- \nabla f_t(x_t)
\nabla R^{-1}
\Pi_X^R

Bregman

Projection

x_{t+1}

Regularizer

R(x) = \frac{1}{\eta 2} \lVert x \rVert_2^2

GD:

R(x) = \sum_i x_i \ln x_i

MWU:

## Applications

### Approximately Solving Zero-Sum Games

\begin{pmatrix} -1 & 0.7 \\ 1 & -0.5 \end{pmatrix}

Payoff matrix $$A$$ of row player

Row player

Column player

Strategy $$pÂ = (0.1~~0.9)$$

Strategy $$qÂ = (0.3~~0.7)$$

Von Neumman min-max Theorem:

\mathbb{E}[A_{ij}] = p^{T} A q
\displaystyle \max_p \min_q p^T A q = \min_q \max_p p^T A q = \mathrm{OPT}

Row player

picks row $$i$$ with probability $$p_i$$

Column player

picks column $$j$$ with probability $$q_j$$

Row player

gets $$A_{ij}$$

Column player

gets $$-A_{ij}$$

### Approximately Solving Zero-Sum Games

Main idea: make each row of $$A$$ be an expert

For $$t = 1, \dotsc, T$$

$$p_1 =$$ uniform distribution

Loss vector $$\ell_t$$ is the $$j$$-th col. of $$-A$$

Get $$p_{t+1}$$ via Multiplicative Weights

Thm:

\displaystyle \mathrm{OPT} - \tfrac{1}{T}\mathrm{Regret}(T) \leq \bar{p}^T A \bar{q} \leq \mathrm{OPT} + \tfrac{1}{T}\mathrm{Regret}(T)

Cor:

\displaystyle T \geq \frac{2 \ln n}{\varepsilon^2} \implies
\displaystyle \mathrm{OPT} - \varepsilon \leq \bar{p}^T A \bar{q} \leq \mathrm{OPT} + \epsilon

where $$j$$ maximizes $$p_t^T A e_j$$

$$q_t = e_j$$

$$\bar{p} = \tfrac{1}{T} \sum_{t} p_t$$

$$\bar{q} = \tfrac{1}{T} \sum_{t} q_t$$

and

### Boosting

Training set

\displaystyle S = \{(x_1, y_1), (x_2, y_2), \dotsc, (x_n, y_n)\}

Hypothesis class

\displaystyle \mathcal{H}

of functions

\displaystyle \mathcal{X} \to \{0,1\}
\displaystyle x_i \in \mathcal{X}
\displaystyle y_i \in \{0,1\}

Weak learner:

\displaystyle \mathrm{WL}(p, S) = h \in \mathcal{H}

such that

\displaystyle \mathbb{P}_{i \sim p}[h(x_i) = y_i] \geq \frac{1}{2} + \gamma

Question: Can we get with high probability a hypothesis* $$h^*$$ such that

\displaystyle h(x_i) \neq y_i

only on a $$\varepsilon$$-fraction of $$S$$?

Generalization follows if $$\mathcal{H}$$ is simple (and other conditions)

### Boosting

For $$t = 1, \dotsc, T$$

$$p_1 =$$ uniform distribution

Â $$\ell_t(i) = 1 - 2|h_t(x_i) - y_i|$$

Get $$p_{t+1}$$ via Multiplicative Weights (with right step-size)

$$h_t = \mathrm{WL}(p_t, S)$$

$$\bar{h} = \mathrm{Majority}(h_1, \dotsc, h_T)$$

Theorem

If $$T \geq (2/\gamma^2) \ln(1/\varepsilon)$$, then $$\bar{h}$$ makes at most $$\varepsilon$$ mistakes in $$S$$

Main ideas:

Regret only against distrb. on examples that $$\bar{h}$$ errs

Due to WL property, loss of the player is $$\geq 2 T \gamma$$

$$\ln n$$ becomes $$\ln (n/\mathrm{\# mistakes})$$

Cost of any distribution of this type is $$\leq 0$$

### From Electrical Flows to Maximum Flow

\displaystyle s
\displaystyle t

Goal:

Route as much flow as possible from $$s$$ to $$t$$ while respecting the edges' capacities

We can compute in time $$O(|V| \cdot |E|)$$

This year there was a paper with a $$O(|E|^{1 + o(1)})$$ alg...

What if we want something faster even if approx.?

### From Electrical Flows to Maximum Flow

Fast Laplacian system solvers (Spielman & Teng' 13)

We can compute electrical flows by solving this system

Electrical flows may not respect edge capacities!

Solves

\displaystyle L x = b

in $$\tilde{O}(|E|)$$ time

Laplacian matrix of $$G$$

Main idea: Use electrical flows as a "weak learner", and boost it using MWU!

Edges = Experts

Cost = flow/capacity

### Other applications (beyond experts)

Solving Packing linear systems with oracle access

Approximating multicommodity flow

Approximately solve some semidefinite programs

Spectral graph sparsification

Approximating multicommodity flow

Computational complexity (QIP = PSPACE)

## Anytime Regret and Stochastic Calculus

### Fixed-time vs Anytime

MWU regret

\displaystyle \mathrm{Regret}(T) \leq \sqrt{2 T \ln n}

when $$T$$ is known

\displaystyle \mathrm{Regret}(T) \leq 2\sqrt{T \ln n}

when $$T$$ is not known

\displaystyle \Bigg\{

Â anytimeÂ

fixed-time

Does knowing $$T$$ gives the player an advantage?

Mirror descent in anytime can fail terribly

[Huang, Harvey, VSP, Friedlander]

Optimum for 2 experts anytime is worse

[Harvey, Liaw, Perkins, Randhawa]

Similar techniques also work for fixed-time

[Greenstreet, Harvey, VSP]

Continuous-time experts show interesting behaviour

[VSP, Liaw, Harvey]

### Random Experts' Costs

What happens when we randomize experts' losses?

\displaystyle \ell_t(i) = \begin{cases} 1 &\text{w.p.}~1/2\\ -1 &\text{w.p.}~1/2 \end{cases}
\displaystyle L_t(i)

Culmulative loss

of expert $$i$$ is a random walk

independently

\displaystyle \mathbb{E}[\mathrm{Regret}(T) ] \geq \sqrt{2 T \ln n} - o(1)

Regret bound hold with probability 1

\displaystyle \implies

Idea: Move to continuous time to use powerfulÂ  stochastic calculus tools

Worked very well with 2 experts

### Moving to Continuous Time

Moving to continuous time:

Random walk $$\longrightarrow$$ Brownian Motion

\displaystyle L_t(i) =B_i(t)
\displaystyle B_1, \dotsc, B_n

are Independent Brownian motions

where

### Moving to Continuous Time

Discrete time

Continuous time

\displaystyle \mathrm{Regret}(t) = A(t) - \min_{i} L_t(i)
\displaystyle R_i(t) = A(t) - L_t(i)
\displaystyle L_t(i) =B_i(t)
\displaystyle L_t(i) =S_i(t)

Cummulative loss

Player's cummulative loss

\displaystyle \sum_{t = 1}^T\langle p(t), \Delta L(t)\rangle
\displaystyle \int_0^T\langle p(t), \mathrm{d} L(t) \rangle
\displaystyle A(t)
\displaystyle A(t)

Player's loss per roundÂ

\displaystyle \langle p(t), \Delta L(t)\rangle
\displaystyle \langle p(t), \ell(t)\rangle
\displaystyle \langle p(t), \mathrm{d} L(t) \rangle

### MWU in Continuous Time

Potential based players

\displaystyle p(t) \propto \nabla \Phi(t, R(t))
\displaystyle \Phi(t, R(t)) = \ln \left( \sum_{i} e^{-\eta_t R_i(t)} \right)
\displaystyle p(t) \propto e^{-\eta_t L_t(i)}
\displaystyle \implies

Regret bounds

\displaystyle \mathrm{Regret}(T) \leq \sqrt{2 T \ln n}

when $$T$$ is known

\displaystyle \mathrm{Regret}(T) \leq 2\sqrt{T \ln n}

when $$T$$ is not known

Â anytimeÂ

fixed-time

\displaystyle \Bigg\{

MWU!

Same as discrete time!

Idea: Use stochastic calculus to guide the algorithm design

with prob. 1

### New Algorithms in Continuous Time

Potential based players

\displaystyle p(t) \propto \nabla \Phi(t, R(t))
\displaystyle \mathrm{Regret}(T) \leq \sqrt{2 T \ln n}(1 + o(1))

Matches fixed-time!

Stochastic calculus suggests pontential that satisfy the Backwards Heat Equation

\;\;\;\;\Phi + \;\;\;\;\;\;\; \Phi = 0
\displaystyle \partial_t
\displaystyle \tfrac{1}{2}\partial_{gg}

This new algorithm has good regret!

Does not translate easily to discrete time

Â need to add correlation between experts

New algorithms with good quantile regret bounds!

Fixed-time vs Anytime is still open!

## Online Learning via Prediction with Experts' Advice

Victor Sanches Portella

UBC - CS

June 2022

## Kitchen Sink

### Simplifying Assumptions

We will look only at $$\{0,1\}$$ costs

1

0

0

1

0

0

1

1

Equal costs do not affect the regret

Cover's algorithm relies on these assumptions by construction

Our alg. and analysis extends to fractional costs

### Gap between experts

Thought experiment: how much probability mass to put on each expert?

Cumulative Loss on round $$t$$

$$\frac{1}{2}$$ is both cases seems reasonable!

Takeaway: player's decision may depend only on the gap between experts's losses Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â  Â

Gap = |42 - 20| = 22

Worst Expert

Best Expert

42

20

2

2

42

42

(and maybe on $$t$$)

### Cover's Dynamic Program

Player strategy based on gaps:

Choice doesn't depend on the specific past costs

p(t, g)

on the Worst expert

1 - p(t, g)

on the Best expert

Â We can compute $$V^*$$ backwards in time via DP!

\displaystyle V^*[t, g] =

Max regret to be suffered at time $$t$$ with gap $$g$$

$$O(T^2)$$ time to compute $$V^*$$

At round $$t$$ with gap $$g$$

\displaystyle V^*[0, 0] =

Max. regret for a game with $$T$$ rounds

Computing the optimal strategy $$p^*$$ from $$V^*$$ is easy!

Cover's DP Table

(w/ player playing optimally)

### Cover's Dynamic Program

Player strategy based on gaps:

Choice doesn't depend on the specific past costs

p(t, g)

on the Lagging expert

1 - p(t, g)

Â We can compute $$V^*$$ backwards in time via DP!

Getting an optimal player $$p^*$$ from $$V^*$$ is easy!

\displaystyle V^*[t, g] =

Max regret-to-be-suffered at round $$t$$ with gap $$g$$

$$O(T^2)$$ time to compute the table â€” $$O(T)$$ amortized time per round

p^*(t,g) = \frac{1}{2} \big( V^*[t, g-1] - V^*[t, g+1]\big)

At round $$t$$ with gap $$g$$

\displaystyle V^*[0, 0] =

Optimal regret for 2 experts

### Connection to Random Walks

Optimal player $$p^*$$ is related to Random Walks

\displaystyle p^*(t,g)

For $$g_t$$ following a Random Walk

\displaystyle \approx\mathbb{P}\Big(\mathcal{N}(0,T - t) > g\Big)

Central Limit Theorem

Not clear if the approximation error affects the regret

The DP is defined only for integer costs!

= \mathbb{P} \Big(
\Big)

Let's design an algorithm that is efficient and works for all costs

Bonus: Connections of Cover's algorithm with stochastic calculus

### Connection to Random Walks

Theorem

\displaystyle \Big]
\displaystyle = \sqrt{\frac{T}{2\pi}} + O(1)

Player $$p^*$$ is also connected to RWs

\displaystyle p^*(t,g)

For $$g_t$$ following a Random Walk

\displaystyle \approx\mathbb{P}\Big(\mathcal{N}(0,T - t) > g\Big)

Central Limit Theorem

Not clear if the approximation error affects the regret

The DP is defined only for integer costs!

\displaystyle V^*[0,0] =
\displaystyle \frac{1}{2}
\displaystyle \mathbb{E}\Big[

= \mathbb{P} \Big(
\Big)

[Cover '67]

# of 0s of a Random Walk of len $$T$$

Let's design an algorithm that is efficient and works for all costs

Bonus: Connections of Cover's algorithm with stochastic calculus

### Discrete ItÃ´'s Formula

\displaystyle R(T, |B_T|) - R(0, 0) = \displaystyle \mathrm{ContRegret}(\partial_g R, T) + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t
\displaystyle R(T, g_T) - R(0, 0) =\;\;\; \mathrm{Regret}(R_g, T)

How to analyze a discrete algorithm coming from stochastic calculus?

Discrete ItÃ´'s Formula!

\displaystyle + \sum_{t = 1}^T \Big(R_t(t, g_t) + \frac{1}{2} R_{gg}(t, g_t)\Big)

Discrete Derivatives

Surprisingly, we can analyze Cover's algorithm with discrete ItÃ´'s formula

ItÃ´'s Formula

Discrete ItÃ´'s Formula

### Discrete Algorithms

p^*(t,g)
= V_g^*[t,g]
V_t^*[t, g] + \frac{1}{2} V_{gg}^*[t,g] = 0

$$V^*$$ satisfies the "discrete" Backwards Heat Equation!

Not Efficient

Efficient

Discrete ItÃ´ $$\implies$$

Regret of $$p^* \leq V^*[0,0]$$

BHE = Optimal?

Hopefully, $$R$$ satisfies the discrete BHE

Discretized player:

R_g(t,g)

We show the total is $$\leq 1$$

Cover's strategy

### Bounding the Discretization Error

In the work of Harvey et al., they had

In this fixed-time solution, we are not as lucky.

Negative discretization error!

g
t
T = 1000

We show the total discretization error is always $$\leq 1$$

### A Probabilistic View of Regret Bounds

Formula for the regret based on the gaps

\displaystyle \mathrm{Regret(T)} = \sum_{t = 1}^{T} p(t, g_{t-1})(g_t - g_{t-1})

Random walk $$\longrightarrow$$ Brownian Motion

\displaystyle \mathrm{ContRegret(p, T)} = \int_{0}^{T} p(t, |B_t|)\mathrm{d}|B_t|

Reflected Brownian motion (gaps)

Conditions on the continuous player $$p$$

Continuous on $$[0,T) \times \mathbb{R}$$

p(t,0) = \frac{1}{2}

for all $$t \geq 0$$

### Stochastic Integrals and ItÃ´'s Formula

How to work with stochastic integrals?

\displaystyle R(T, |B_T|) - R(0, 0) =

ItÃ´'s Formula:

$$\overset{*}{\Delta} R(t, g) = 0$$ everywhere

Â ContRegret $$= R(T, |B_T|) - R(0,0)$$

\displaystyle \implies

Goal:

Find a "potential function" $$R$$ such that

(1) $$\partial_g R$$ is a valid continuous player

(2) $$R$$ satisfies the Backwards Heat Equation

\displaystyle + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t

Different from classic FTC!

\displaystyle \mathrm{ContRegret}(\;\;\;\;\;\;, T)
\displaystyle \partial_g R
\;\;\; \vphantom{\overset{*}{\Delta}} R = \;\;\;R + \;\;\;\;\;\;\; R
\displaystyle \partial_t
\displaystyle \tfrac{1}{2}\partial_{gg}
\displaystyle \overset{*}{\Delta}

Backwards Heat Equation

### Stochastic Integrals and ItÃ´'s Formula

\displaystyle R(T, |B_T|) - R(0, 0) =

Goal:

Find a "potential function" $$R$$ such that

(1) $$\partial_g R$$ is a valid continuous player

(2) $$R$$ satisfies the Backwards Heat Equation

\displaystyle \mathrm{ContRegret}(\;\;\;\;\;\;, T)
\displaystyle \partial_g R

How to find a good $$R$$?

?

Suffices to find a player $$p$$ satisfying the BHE

p(t,g) = \mathbb{P}(\mathcal{N}(0, T - t) > g)

$$\approx$$ Cover's solution!

Also a solution to an ODE

\displaystyle R(T, |B_T|) - R(0,0) \leq \sqrt{\frac{T}{2\pi}}
R(t,g) \approx \int p(t,g)

Then setting

preserves BHE and

p = \partial_g R

### Stochastic Integrals and ItÃ´'s Formula

How to work with stochastic integrals?

\displaystyle R(T, |B_T|) - R(0, |B_0|) = \int_{0}^T \partial_g R(t, |B_t|) \mathrm{d}|B_t|

ItÃ´'s Formula:

$$\overset{*}{\Delta} R(t, g) = 0$$ everywhere

Â ContRegret is given by $$R(T, |B_T|)$$

\displaystyle \implies

Goal:

Find a "potential function" $$R$$ such that

(1) $$\partial_g R$$ is a valid continuous player

(2) $$R$$ satisfies the Backwards Heat Equation

\displaystyle + \int_{0}^T \overset{*}{\Delta} R(t, |B_t|) \mathrm{d}t

Different from classic FTC!

\displaystyle \mathrm{ContRegret}(\;\;\;\;\;\;, T)
\displaystyle \partial_g R
\;\;\; \vphantom{\overset{*}{\Delta}} R = \;\;\;R + \;\;\;\;\;\;\; R
\displaystyle \partial_t
\displaystyle \tfrac{1}{2}\partial_{gg}
\displaystyle \overset{*}{\Delta}

Backwards Heat Equation

[C-BL 06]

### Our Results

An Efficient and Optimal Algorithm in Fixed-Time with Two Experts

Technique:

Solve an analogous continuous-time problem, and discretize it

[HLPR '20]

How to exploit the knowledge of $$T$$?

Discretization error needs to be analyzed carefully.

BHE seems to play a role in other problems in OL as well!

Solution based on Cover's alg

Or inverting time in an ODE!

We show $$\leq 1$$

$$V^*$$ and $$p^*$$ satisfy the discrete BHE!

Insight:

Cover's algorithm has connections to stochastic calculus!

### Known Results

Multiplicative Weights Update method:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

Optimal for $$n,T \to \infty$$ !

If $$n$$ is fixed, we can do better

$$n = 2$$

$$n = 3$$

$$n = 4$$

\sqrt{\frac{T}{2\pi}} + O(1)
\sqrt{\frac{8T}{9\pi}} + O(\ln T)
\sim \sqrt{\frac{T \pi}{8}}

Player knows $$T$$ !

Minmax regret in some cases:

What if $$T$$ is not known?

\displaystyle \frac{\gamma}{2} \sqrt{T}

Minmax regret

$$n = 2$$

[Harvey, Liaw, Perkins, Randhawa FOCS 2020]

They give an efficient algorithm!

\displaystyle \gamma \approx 1.307

### A Dynamic Programming View

Optimal regret ($$V^* = V_{p^*}$$)

\displaystyle V^*[t,g] = \frac{1}{2}(V^*[t+1, g-1] + V^*[t+1, g + 1])
\displaystyle V^*[t,0] = \frac{1}{2} + V^*[t+1, 1]

For $$g > 0$$

For $$g = 0$$

g
t
4
3
2
1
0
0
0
0
0
0
0
0
\frac{1}{2}
0
0
0
0
0
0
0
0
0
0
0
0
0
1
2
3

### Regret and Player in terms of the Gap

Path-independent player:

If

round $$t$$ and gap $$g_{t-1}$$ on round $$t-1$$

p(t, g_{t-1})
1 - p(t, g_{t-1})

on the Lagging expert

Choice doesn't depend on the specific past costs

p(t, 0) = 1/2

for all $$t$$, then

\displaystyle \mathrm{Regret(T)} = \sum_{t = 1}^{T} p(t, g_{t-1})(g_t - g_{t-1})

gap on round $$t$$

A discrete analogue of a Riemann-Stieltjes integral

A formula for the regret

### A Dynamic Programming View

\displaystyle V_p[t, g] =

Maximum regret-to-be-suffered on rounds $$t+1, \dotsc, T$$ when gap on round $$t$$ is $$g$$

Path-independent player $$\implies$$ $$V_p[t,g]$$ depends only on $$\ell_{t+1}, \dotsc, \ell_T$$ and $$g_t, \dotsc, g_{T}$$

\displaystyle V_p[t, 0] = \max\{p(t+1,0), 1 - p(t+1,0)\} + V_p[t+1, 1]

Regret suffered on round $$t+1$$

Regret suffered on round $$t + 1$$

\displaystyle V_p[t, g] = \max \Bigg\{
\displaystyle V_p[t+1, g+1] + p(t + 1,g)
\displaystyle V_p[t+1, g-1] - p(t + 1,g)

### A Dynamic Programming View

\displaystyle V_p[t, g] =

Maximum regret-to-be-suffered on rounds $$t+1, \dotsc, T$$ if gap at round $$t$$ is $$g$$

Â We can compute $$V_p$$ backwards in time!

Path-independent player $$\implies$$

Â $$V_p[t,g]$$ depends only on $$\ell_{t+1}, \dotsc, \ell_T$$ and $$g_t, \dotsc, g_{T}$$

We then choose $$p^*$$ that minimizes $$V^*[0,0] = V_{p^*}[0,0]$$

\displaystyle V_p[0, 0] =

Maximum regret of $$p$$

### A Dynamic Programming View

For $$g > 0$$

\displaystyle p^*(t,g) = \frac{1}{2}(V_{p^*}[t, g-1] - V_{p^*}[t, g + 1])

Optimal player

\displaystyle p^*(t,0) = \frac{1}{2}

Optimal regret ($$V^* = V_{p^*}$$)

\displaystyle V^*[t,g] = \frac{1}{2}(V^*[t+1, g-1] + V^*[t+1, g + 1])

For $$g = 0$$

\displaystyle V^*[t,0] = \frac{1}{2} + V^*[t+1, 1]

For $$g > 0$$

For $$g = 0$$

### Discrete Derivatives

p^*(t,g) = \frac{1}{2} \big( V^*[t, g-1] - V^*[t, g+1]\big)
\approx \partial_g V^*[t,g]
\eqqcolon V_g^*[t,g]
V^*[t, g] - V^*[t-1, g]
= \frac{1}{2}( V^*[t, g-1] - V^*[t, g]) - \frac{1}{2}(V^*[t, g] - V^*[t,g+1])
\coloneqq V_t^*[t,g]
\coloneqq \frac{1}{2}V_{gg}^*[t,g]
\partial_t V^*[t,g] \approx
\approx \frac{1}{2}\partial_g V^*[t,g-1]
\approx \frac{1}{2} \partial_g V^*[t,g]
\approx \frac{1}{2} \partial_{gg} V^*[t,g]

### Bounding the Discretization Error

Main idea

$$R$$ satisfies the continuous BHE

\implies
R_t(t,g) + \frac{1}{2} R_{gg}(t,g) \approx

Approximation error of the derivatives

\implies
\leq 0.74
\displaystyle \mathrm{Regret(T)} \leq \frac{1}{2} + \sqrt{\frac{T}{2\pi}} + \sum_{t = 1}^T O\Bigg( \frac{1}{(T - t)^{3/2}}\Bigg)

Lemma

\partial_{gg}R(t,g) - R_{gg}(t,g)
\partial_t R(t,g) - R_t(t,g)
\in O\Big( \frac{1}{(T - t)^{3/2}}\Big)
\Bigg\}

### Known and New Results

Multiplicative Weights Update method:

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2} \ln n}

Optimal for $$n,T \to \infty$$ !

If $$n$$ is fixed, we can do better

Worst-case regret for 2 experts

Player knows $$T$$ (fixed-time)

Player doesn't know $$T$$ (anytime)

Question:

Is there an efficient algorithm for the fixed-time case?

Ideally an algorithm that works for general costs!

$$O(T)$$ time per round

Dynamic Programming

$$\{0,1\}$$ costs

$$O(1)$$ time per round

Stochastic Calculus

$$[0,1]$$ costs

[Harvey, Liaw, Perkins, Randhawa FOCS 2020]

\displaystyle \approx 0.65 \sqrt{T}
\displaystyle \sqrt{\frac{T}{2\pi}} + O(1)

[Cover '67]

### Our Results

Result:

An Efficient and Optimal Algorithm in Fixed-Time with Two Experts

$$O(1)$$ time per round

was $$O(T)$$ before

\displaystyle \mathrm{Regret}(T) \leq \sqrt{\frac{T}{2 \pi}} + 1.3

Holds for general costs!

Technique:

Discretize a solution to a stochastic calculus problem

[HLPR '20]

How to exploit the knowledge of $$T$$?

Non-zero discretization error!

Insight:

Cover's algorithm has connections to stochastic calculus!

This connection seems to extend to more experts and other problems in online learning in general!

#### IAM Student seminar - OL via the Expert's problem

By Victor Sanches Portella

• 137