Research Proficiency Exam
Victor Sanches Portella
PhD Student in Computer Science - UBC
October, 2020
Player
Adversary
\(n\) Experts
Probabilities: 0.5, 0.1, 0.3, 0.1
Costs: 1, 0, 0.5, 0.3
Player's loss:
Goal: sublinear regret in the worst-case
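The loss formula did not survive on this slide. As a hedged sketch, assuming the usual setup where the player's loss in a round is the expected cost under the chosen probabilities and regret is measured against the best single expert, the pictured numbers can be worked through as follows. The function names are illustrative, and pairing the listed probabilities with the listed costs by position is an assumption.

```python
import math

def expected_loss(probs, costs):
    """Expected cost of sampling an expert from `probs` (assumed loss definition)."""
    return sum(p * c for p, c in zip(probs, costs))

def regret(prob_rounds, cost_rounds):
    """Player's total expected loss minus the total cost of the best single expert."""
    n = len(cost_rounds[0])
    player = sum(expected_loss(p, c) for p, c in zip(prob_rounds, cost_rounds))
    best = min(sum(c[i] for c in cost_rounds) for i in range(n))
    return player - best

# Numbers taken from the slide; their pairing by position is an assumption.
probs = [0.5, 0.1, 0.3, 0.1]
costs = [1.0, 0.0, 0.5, 0.3]

one_round_loss = expected_loss(probs, costs)  # 0.5*1 + 0.1*0 + 0.3*0.5 + 0.1*0.3
one_round_regret = regret([probs], [costs])   # best expert in this round has cost 0
```

"Sublinear regret" then means that this difference, accumulated over \(T\) rounds, grows more slowly than \(T\) for every cost sequence.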
Multiplicative Weights Update method:
Optimal for \(n, T \to \infty\)!
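The update rule itself is missing from the extracted slide. A minimal sketch of one standard variant (exponential weights, often called Hedge) is given below, assuming costs in \([0, 1]\) and a known horizon \(T\); the step size \(\eta = \sqrt{8 \ln n / T}\) and the bound \(\sqrt{T \ln n / 2}\) are the textbook choices for that variant, not necessarily the ones shown in the talk. The cost sequence is an arbitrary deterministic example.

```python
import math

def hedge_regret(cost_rounds, eta):
    """Run exponential weights on the given costs and return the player's regret."""
    n = len(cost_rounds[0])
    weights = [1.0] * n
    cumulative = [0.0] * n
    player_loss = 0.0
    for costs in cost_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]          # play proportionally to weights
        player_loss += sum(p * c for p, c in zip(probs, costs))
        for i, c in enumerate(costs):
            cumulative[i] += c
            weights[i] *= math.exp(-eta * c)          # multiplicative update
    return player_loss - min(cumulative)

T, n = 200, 4
eta = math.sqrt(8.0 * math.log(n) / T)
# Arbitrary deterministic costs in {0, 0.25, 0.5, 0.75, 1} (illustrative only).
cost_rounds = [[((t * (i + 2)) % 5) / 4.0 for i in range(n)] for t in range(T)]
mwu_regret = hedge_regret(cost_rounds, eta)
mwu_bound = math.sqrt(T * math.log(n) / 2.0)
```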
If \(n\) is fixed, we can do better
\(n = 2\)
\(n = 3\)
\(n = 4\)
Player knows \(T\)!
Minmax regret in some cases:
What if \(T\) is not known?
Minmax regret
\(n = 2\)
[Harvey, Liaw, Perkins, Randhawa FOCS 2020]
They give an efficient algorithm!
Minmax regret for 2 experts:
Player knows \(T\) (fixed-time): [Cover '67], dynamic programming, \(O(T)\) time per round [Greenstreet]
Player doesn't know \(T\) (anytime): [Harvey, Liaw, Perkins, Randhawa FOCS 2020], stochastic calculus, \(O(1)\) time per round
Our results:
A complete theoretical analysis of the fixed-time algorithm
Player knows \(T\) (fixed-time): minmax regret [Cover '67]; dynamic programming, \(O(T)\) time per round [Greenstreet]; stochastic calculus and discretization techniques, \(O(1)\) time per round
Player doesn't know \(T\) (anytime): minmax regret, \(O(1)\) time per round [Harvey et al. '20]
Our results:
An efficient and optimal algorithm for two experts
regret
We consider only costs of 0 or 1 (no fractional costs!), which is enough for the worst case
Cost vectors: (1, 0), (0, 1), (0, 0), (1, 1)
Equal costs are a "waste of time", so we do not consider those
Cover's algorithm strongly relies on these assumptions
Thought experiment: how much probability mass to put on each expert?
Cumulative Loss on round \(t\)
\(\frac{1}{2}\) in both cases seems reasonable!
Takeaway: player's decision could depend on the gap between experts
Gap = |42 - 20| = 22 (Lagging Expert: cumulative loss 42; Leading Expert: cumulative loss 20)
Cumulative losses 2 and 2, or 42 and 42: gap 0
Path-independent player:
Choice on round \(t\) depends only on the gap \(g_{t-1}\) of round \(t-1\)
Choice doesn't depend on the specific past costs
Path-independent player \(\implies\)
\(V_p[t,g]\) depends only on \(\ell_{t+1}, \dotsc, \ell_T\) and \(g_t, \dotsc, g_{T}\)
Maximum regret of \(p\)
on the Lagging expert
on the Leading expert
We can compute \(V_p\) backwards in time!
We then choose \(p^*\) that minimizes \(V^*[0,0] = V_{p^*}[0,0]\)
Maximum regret-to-be-suffered on rounds \(t+1, \dotsc, T\) if gap at round \(t\) is \(g\)
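A formula for the regret
Path-independent player: the choice doesn't depend on the specific past costs, only on round \(t\) and gap \(g_{t-1}\) on round \(t-1\)
If the player puts probability \(p(t, g_{t-1})\) on the Lagging expert and \(1 - p(t, g_{t-1})\) on the Leading expert for all \(t\), with \(p(t, 0) = 1/2\), then
\(\mathrm{Regret}(T) = \sum_{t=1}^{T} p(t, g_{t-1}) \, (g_t - g_{t-1})\), where \(g_t\) is the gap on round \(t\)
A discrete analogue of a Riemann-Stieltjes integral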
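As a sanity check of the gap-based view, the sketch below computes the regret of a path-independent player in two ways: directly from the costs, and as the sum \(\sum_t p(t, g_{t-1})(g_t - g_{t-1})\). It assumes two experts, exactly one of which incurs cost 1 per round, probability \(p(t, g)\) on the lagging expert when the gap is \(g > 0\), and probability \(1/2\) when the gap is 0. The player `p` and the cost sequence `hits` are made up for illustration.

```python
def regret_two_ways(hits, p):
    """hits[t-1] in {0, 1} is the expert that incurs cost 1 on round t."""
    cum = [0, 0]          # cumulative losses of the two experts
    player_loss = 0.0
    gap_sum = 0.0
    for t, hit in enumerate(hits, start=1):
        gap = abs(cum[0] - cum[1])
        lagging = 0 if cum[0] > cum[1] else 1   # irrelevant when gap == 0
        q = 0.5 if gap == 0 else p(t, gap)      # probability on the lagging expert
        player_loss += q if hit == lagging else 1.0 - q
        cum[hit] += 1
        new_gap = abs(cum[0] - cum[1])
        gap_sum += q * (new_gap - gap)          # one term of the discrete integral
    return player_loss - min(cum), gap_sum

def p(t, g):
    """Hypothetical path-independent player: less mass on the lagging expert as the gap grows."""
    return 1.0 / (2.0 * (g + 1))

hits = [0, 0, 1, 0, 1, 1, 1, 0, 0, 0]
direct, via_gaps = regret_two_ways(hits, p)
```

Under these assumptions the two numbers agree for every cost sequence, which is what lets the regret be studied through the gap process alone.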
Maximum regret-to-be-suffered on rounds \(t+1, \dotsc, T\) when gap on round \(t\) is \(g\)
Path-independent player \(\implies\) \(V_p[t,g]\) depends only on \(\ell_{t+1}, \dotsc, \ell_T\) and \(g_t, \dotsc, g_{T}\)
Regret suffered on round \(t+1\)
Optimal player: one case for \(g > 0\), one for \(g = 0\)
Optimal regret (\(V^* = V_{p^*}\)): one case for \(g > 0\), one for \(g = 0\)
Maximum regret of \(p^*\)
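The case formulas are not recoverable from the extracted text, so the following is only a sketch of how the backward computation could look. It assumes the per-round regret is \(+p\) when the gap grows and \(-p\) when it shrinks (with \(1/2\) at gap 0), and that the optimal player equalizes the two branches, giving \(V^*[t,g] = \tfrac{1}{2}(V^*[t+1,g-1] + V^*[t+1,g+1])\) for \(g > 0\) and \(V^*[t,0] = \tfrac{1}{2} + V^*[t+1,1]\). The table layout and the name `optimal_tables` are illustrative.

```python
def optimal_tables(T):
    """Backward recursion for V*[t][g] and the equalizing player P[t][g].

    P[t][g] is the probability put on the lagging expert on round t
    when the gap after round t-1 is g (row P[0] is unused).
    """
    V = [[0.0] * (T + 2) for _ in range(T + 1)]   # V[T][g] = 0: no rounds left
    P = [[0.5] * (T + 1) for _ in range(T + 1)]   # gap 0: split evenly
    for t in range(T - 1, -1, -1):
        V[t][0] = 0.5 + V[t + 1][1]
        for g in range(1, T + 1):
            up, down = V[t + 1][g + 1], V[t + 1][g - 1]
            V[t][g] = 0.5 * (down + up)           # value when both branches are equal
            P[t + 1][g] = 0.5 * (down - up)       # equalizes p + up and -p + down
    return V, P

V, P = optimal_tables(3)
minmax_regret_T3 = V[0][0]
```

Filling the whole table costs time quadratic in \(T\), which is the kind of overhead a closed-form player avoids.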
Theorem
Expected # of 0's of a Sym. Random Walk of Length \(T\)
Theorem
For any player, if the gaps are random and distributed like a reflected symmetric random walk,
Expected # of 0's of a SRW of Length \(T - 1\)
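The expected number of zeros of a symmetric random walk can be computed exactly, since \(P(S_{2k} = 0) = \binom{2k}{k} / 4^k\) and odd times never hit 0. The sketch below counts visits at times \(0, \dotsc, \texttt{last\_time}\); which time range the theorems use, and the factor \(\tfrac{1}{2}\) applied afterwards, are assumptions made here for illustration.

```python
import math

def expected_zeros(last_time):
    """E[#{s in 0..last_time : S_s = 0}] for a symmetric random walk with S_0 = 0."""
    total = 0.0
    prob = 1.0   # P(S_0 = 0) = 1
    k = 0
    while 2 * k <= last_time:
        total += prob
        prob *= (2 * k + 1) / (2 * k + 2)   # ratio P(S_{2k+2} = 0) / P(S_{2k} = 0)
        k += 1
    return total

T = 10_000
half_zeros = 0.5 * expected_zeros(T - 1)
reference = math.sqrt(T / (2.0 * math.pi))
```

For large \(T\), half the expected number of zeros is close to \(\sqrt{T / (2\pi)}\), so `half_zeros` and `reference` should nearly coincide.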
Formula for the regret based on the gaps
Discrete stochastic integral of \(p\) with respect to the reflected RW \(g\)
Moving to continuous time:
Random walk \(\longrightarrow\) Brownian Motion
Insight:
Regret bound \(\equiv\) almost sure bound on the integral
Gaps are on the support of a reflected random walk
Reflected Brownian motion
Conditions on the continuous player \(p\)
Continuous on \([0,T) \times \mathbb{R}\)
for all \(t \geq 0\)
How to work with stochastic integrals?
Itô's Formula:
Different from classic FTC!
\(\overset{*}{\Delta} f(t, g) = 0\) everywhere
ContRegret doesn't depend on the path of \(B_t\)
Backwards Heat Equation
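For reference, the textbook form of Itô's formula for a function \(f(t, g)\) that is once continuously differentiable in \(t\) and twice in \(g\), applied to a standard Brownian motion \(B_t\), is sketched below. Reading \(\overset{*}{\Delta} f\) as \(\partial_t f + \tfrac{1}{2} \partial_{gg} f\) is an assumption about the deck's notation, and the reflected case \(|B_t|\) used for the gaps carries an extra boundary term at \(g = 0\) that this sketch omits.

```latex
\[
  f(T, B_T) - f(0, B_0)
  = \int_0^T \partial_g f(t, B_t) \, \mathrm{d}B_t
  + \int_0^T \Bigl( \partial_t f(t, B_t)
      + \tfrac{1}{2} \, \partial_{gg} f(t, B_t) \Bigr) \, \mathrm{d}t
\]
```

The second integral has no counterpart in the classic fundamental theorem of calculus. It vanishes when \(\partial_t f + \tfrac{1}{2} \partial_{gg} f = 0\) everywhere, which is the backwards heat equation, and the stochastic integral of \(\partial_g f\) is then determined by the endpoints alone.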
Goal:
Find a "potential function" \(R\) such that
(1) \(p = \partial_g R\) is a valid continuous player
(2) \(R\) satisfies the Backwards Heat Equation
For Cover's algorithm, we can show
Lagging expert finishes leading
Gaps ~ Reflected RW
Law of Large Numbers:
Itô's Formula \(\implies\)
\(Q\) satisfies BHE
\(R(t,g)\) such that
Calculus trick
\(R\) satisfies BHE
\(\partial_g R = Q\)
\(R(t,g) \leq \sqrt{T/2\pi}\)
But we wanted a potential \(R\) satisfying the BHE
\(V^*\) satisfies the "discrete" Backwards Heat Equation!
Discretized player:
Bound regret with a discrete analogue of Itô's Formula
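One way to make the discrete analogue concrete is sketched below. It assumes central differences in \(g\) and a backward difference in \(t\) as the discrete derivatives (these definitions are not visible in the extracted slides), and checks numerically that, for a gap path moving by \(\pm 1\) per round, the change in \(f\) splits into a discrete stochastic integral plus a correction built from \(f_t + \tfrac{1}{2} f_{gg}\). The test function `f` and the path `gaps` are arbitrary.

```python
import math

def d_g(f, t, g):
    """Central difference in the gap variable (assumed definition)."""
    return 0.5 * (f(t, g + 1) - f(t, g - 1))

def d_gg(f, t, g):
    """Second difference in the gap variable (assumed definition)."""
    return f(t, g + 1) + f(t, g - 1) - 2.0 * f(t, g)

def d_t(f, t, g):
    """Backward difference in time (assumed definition)."""
    return f(t, g) - f(t - 1, g)

def discrete_ito_sides(f, gaps):
    """Return (f(T, g_T) - f(0, g_0), discrete integral, correction term)."""
    T = len(gaps) - 1
    lhs = f(T, gaps[T]) - f(0, gaps[0])
    integral = 0.0
    correction = 0.0
    for t in range(1, T + 1):
        prev = gaps[t - 1]
        integral += d_g(f, t, prev) * (gaps[t] - prev)
        correction += d_t(f, t, prev) + 0.5 * d_gg(f, t, prev)
    return lhs, integral, correction

def f(t, g):
    """Arbitrary smooth test function (illustrative only)."""
    return g * g - t + 0.1 * t * g

gaps = [0, 1, 2, 1, 0, 1, 2, 3]   # a reflected-walk path with steps of +1 or -1
lhs, integral, correction = discrete_ito_sides(f, gaps)
```

When the correction term is zero (a discrete backwards heat equation), the integral equals the endpoint difference; when it has a sign, it shifts the bound on the integral in one direction, which is where a discretization error enters.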
Hopefully, \(R\) satisfies the discrete BHE
In the work of Harvey et al., they had
In this fixed-time solution, we are not as lucky
Negative discretization error!
Main idea
\(R\) satisfies the continuous BHE
Approximation error of the derivatives
Lemma