PhD Oral Presentation
Victor Sanches Portella
July 30, 2024
Stochastic Calc.
& Online Learning
Vector Calculus and Probability Theory
Stochastic Calculus
Output 1
Output 2
Not too far apart
What does it mean for an algorithm \(\mathcal{M}\) to be private?
Differential Privacy: \(\mathcal{M}\) does not rely heavily on any individual
\((\varepsilon, \delta)\)-Diff. Privacy
\(\varepsilon \equiv \) "Privacy leakage", small constant
\(\delta \equiv \) "Chance of failure", usually \(O(1/\text{\#samples})\)
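For reference, the standard formal statement behind the two parameters above, written as a sketch in notation the slides do not fix (\(X, X'\) denote datasets differing in a single entry, and \(S\) is any measurable set of outputs):

\[
\Pr[\mathcal{M}(X) \in S] \;\le\; e^{\varepsilon} \, \Pr[\mathcal{M}(X') \in S] + \delta .
\]

Small \(\varepsilon\) forces the two output distributions to be "not too far apart", and \(\delta\) bounds the probability mass on which this guarantee may fail.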
Unknown Covariance Matrix
\((\varepsilon, \delta)\)-differentially private \(\mathcal{M}\) to estimate \(\Sigma\)
on \(\mathbb{R}^d\)
Goal:
Required even without privacy
Required even for \(d = 1\)
Is this tight?
Exists \((\varepsilon, \delta)\)-DP \(\mathcal{M}\) such that
samples
Known algorithmic results
with
Theorem
For any \((\varepsilon, \delta)\)-DP algorithm \(\mathcal{M}\) such that
and
we have
Our results generalize both of them
Nearly highest reasonable value
[Kamath et al. 22]
Previous lower bounds required
[Narayanan 23]
OR
A measure of the correlation between \(z\) and \(\mathcal{M}(X)\)
Correlation statistic
If \(z \sim \mathcal{N}(0, \Sigma)\) indep. of \(X\)
small
If \(\mathcal{M}\) is accurate
large
Fingerprinting Lemma
Approx. equal by privacy
Approx. equal by privacy
Score function
Score Attack Statistic
To get a fingerprinting lemma, we need to randomize \(\Sigma\) so that
is large
Previous work
\(\Sigma\) with "small radius" \(\implies\) Weak FP Lemma
\(\Sigma\) with "large radius" \(\implies\) hard to bound \(\mathbb{E}[|\mathcal{A}(z, \mathcal{M}(X))|]\)
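For orientation, the score of a centered Gaussian with respect to its covariance has a standard closed form. The display below is a sketch that treats the entries of \(\Sigma\) as unconstrained; the statistic used in the talk may be normalized differently.

\[
\nabla_{\Sigma} \log p(z \mid \Sigma) \;=\; \tfrac{1}{2}\left( \Sigma^{-1} z z^{\top} \Sigma^{-1} - \Sigma^{-1} \right),
\qquad
\mathbb{E}_{z \sim \mathcal{N}(0, \Sigma)}\!\left[ \nabla_{\Sigma} \log p(z \mid \Sigma) \right] = 0 .
\]

The zero-mean property follows from \(\mathbb{E}[z z^{\top}] = \Sigma\). It is what makes a score-based statistic small in expectation when \(z\) is independent of \(X\).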
Fingerprinting Lemma
Need to Lower Bound
\(\Sigma \sim\) Wishart leads to elegant analysis
Stein-Haff Identity
"Move the derivative" from \(g\) to \(p\) with integration by parts
Stokes' Theorem
FP Lemma
Upper Bound
Player
Adversary
\(n\) Experts
0.5
0.1
0.3
0.1
Probabilities
1
-1
0.5
-0.3
Gains
Player's gain:
Gain of Best Expert
Player's Gain
Best Expert
Best Experts
\(\varepsilon\)-fraction
Multiplicative Weights Update:
Needs knowledge of \(\varepsilon\)
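A minimal sketch of Multiplicative Weights Update in the gains setting, to make the dependence on \(\varepsilon\) concrete. The function names, the toy gain sequence, and the learning-rate tuning \(\eta = \sqrt{\ln(1/\varepsilon)/T}\) are illustrative assumptions, not the exact tuning from the talk.

```python
import math

def mwu_probabilities(cumulative_gains, eta):
    """Probabilities proportional to exp(eta * cumulative gain)."""
    top = max(cumulative_gains)  # subtract the max for numerical stability
    weights = [math.exp(eta * (g - top)) for g in cumulative_gains]
    total = sum(weights)
    return [w / total for w in weights]

def run_mwu(gain_rounds, eta):
    """Play MWU on a list of gain vectors with entries in [-1, 1].

    Returns (player's total expected gain, cumulative gain of each expert).
    """
    n = len(gain_rounds[0])
    cumulative = [0.0] * n
    player_gain = 0.0
    for gains in gain_rounds:
        p = mwu_probabilities(cumulative, eta)
        player_gain += sum(pi * gi for pi, gi in zip(p, gains))
        cumulative = [c + g for c, g in zip(cumulative, gains)]
    return player_gain, cumulative

# Hypothetical toy instance: expert 0 always gains 1, the others always lose 1.
T, n, eps = 200, 4, 0.25
rounds = [[1.0, -1.0, -1.0, -1.0] for _ in range(T)]
eta = math.sqrt(math.log(1 / eps) / T)  # assumed tuning; it uses both eps and T
player_gain, cumulative = run_mwu(rounds, eta)
regret = max(cumulative) - player_gain
print(f"regret against the best expert: {regret:.2f}")
```

The learning rate is fixed before the game starts, which is why this style of tuning only targets one value of \(\varepsilon\) at a time.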
We design an algorithm with \(\sqrt{T \ln(1/\varepsilon)}\) quantile regret
for all \(\varepsilon\) and best known leading constant
Gain of
top \(\varepsilon n \) expert
\(\varepsilon\)-Quantile Regret
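The \(\varepsilon\)-quantile regret described above can be computed as follows. This is a sketch: the function name is hypothetical, and the rank convention (comparing against the \(\lceil \varepsilon n \rceil\)-th best expert) is an assumption about how ties and rounding are handled.

```python
import math

def quantile_regret(expert_gains, player_gain, eps):
    """Gain of the top (eps * n)-th expert minus the player's gain.

    expert_gains: total gain of each expert at the end of the game.
    eps: quantile in (0, 1]; eps = 1/n recovers regret against the best expert.
    """
    n = len(expert_gains)
    k = max(1, math.ceil(eps * n))  # rank of the comparator (assumed convention)
    ranked = sorted(expert_gains, reverse=True)
    return ranked[k - 1] - player_gain

gains = [10.0, 7.0, 4.0, -2.0]  # hypothetical total gains of 4 experts
print(quantile_regret(gains, 3.0, 0.25))  # comparator is the best expert
print(quantile_regret(gains, 3.0, 0.5))   # comparator is the 2nd best expert
```

A larger \(\varepsilon\) means a weaker comparator, so the quantile regret can only decrease as \(\varepsilon\) grows.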
Analysis often becomes clean
Sandbox for design of optimization algorithms
Key Question: How to model non-smooth (online) optimization in continuous time?
Why go to continuous time?
Discrete Time
Useful perspective: \(G(i)\) is a realization of a random walk
Continuous Time
\(G_t(i)\) is a realization of a Brownian Motion
Worst-case =
Probability 1
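The passage from the random-walk view to Brownian motion can be illustrated numerically. The sketch below (function name, step size, and sample counts are all illustrative choices) rescales a \(\pm 1\) random walk so that its value at time \(T\) has variance \(T\), as a Brownian motion does.

```python
import math
import random

def scaled_random_walk(T, steps_per_unit, rng):
    """Value at time T of a +/-1 random walk with steps of size sqrt(dt)."""
    dt = 1.0 / steps_per_unit
    n_steps = int(T * steps_per_unit)
    position = 0.0
    for _ in range(n_steps):
        position += math.sqrt(dt) * rng.choice((-1.0, 1.0))
    return position

rng = random.Random(0)  # fixed seed so the run is reproducible
T = 4.0
samples = [scaled_random_walk(T, 100, rng) for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"empirical mean {mean:.3f}, empirical variance {var:.3f}, target variance {T}")
```

Refining the time step while keeping the \(\sqrt{dt}\) scaling leaves the variance at time \(T\) unchanged, which is the scaling under which the walk converges to Brownian motion.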
Potential based players
Stochastic Calculus suggests \(\Phi\) that satisfy the Backwards Heat Equation
For all \(\varepsilon\)
Using this potential*, we get
Best leading constant
Discrete time analysis is IDENTICAL to continuous time analysis
*(with a slightly bigger constant in the BHE)
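For reference, one common form of the Backwards Heat Equation in a single space dimension, with a simple exponential example that satisfies it. The parameter \(\eta > 0\) is introduced only for illustration, and the normalization used in the talk may differ.

\[
\partial_t \Phi(t, x) + \tfrac{1}{2}\, \partial_{xx} \Phi(t, x) = 0,
\qquad
\Phi(t, x) = \exp\!\left( \eta x - \tfrac{1}{2} \eta^2 t \right)
\;\Longrightarrow\;
\partial_t \Phi = -\tfrac{1}{2} \eta^2 \Phi,
\quad
\partial_{xx} \Phi = \eta^2 \Phi .
\]

The two derivatives cancel exactly, so this \(\Phi\) solves the equation for every \(\eta\).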
Question:
Are the minimax regrets with and without knowledge of \(T\) different?
fixed-time
anytime
anytime
fixed-time
Theorem: In Continuous Time, both are equal if Brownian Motions are independent.
MinmaxRegret
Can we get better lower bounds?
=
?
Player's Total Gain
Vector of the Experts' Gains
High expected regret \(\implies\) anytime lower bound
Max expected anytime regret without independent experts?
Anytime Regret \(\equiv\) \(\tau\) is a stopping time
How big can
be?
For a martingale \((G_t)_{t \geq 0}\), find upper and lower bounds to
sup
is a stopping time
No assumptions
on the dependency between coordinates
Theorem
If \(G_t(i)\) is a Brownian motion for all \(i = 1, \dotsc, n\), then
Evidence that Anytime Lower Bounds for
continuous experts need new techniques
For a martingale \((G_t)_{t \geq 0}\), find upper and lower bounds to
sup
is a stopping time
Similar upper bounds when \(G_t(i)\) has smooth quadratic variation
If \(G_t(i)\) is a discrete martingale with increments in \([-1,1]\), we have
Beyond Brownian Motion
Discrete Martingales
Discrete Ito's Lemma
Goal:
non-smooth \(\implies\) hard to show it is a supermartingale
Idea:
Design a smooth function \(\Phi\) such that
\((\Phi(t, G_t))_{t \geq 0}\)
is a supermartingale
and
Backwards Heat Eq.
Tune Constants
Tight bounds on expected norm for a large family of martingales. Nearly tight bounds and implications for the experts problem.
Continuous-time model for the experts' problem and new algorithms. Sandbox for online learning algorithms.
New lower-bounds for private covariance estimation. Techniques suggest a proof strategy for lower bounds in DP.
Score function
If \(z \sim \mathcal{N}(0, \Sigma)\) indep. of \(X\)
For \(x_1, \dotsc, x_n\) from \(X\)
\(= \Theta(d^2)\) if \(\mathbb{E}[\mathcal{M}(X)] = \Sigma\)
Total gains of expert \(i\):
Useful perspective: \(G(i)\) is a realization of a random walk
realization of a Brownian Motion
Probability 1 = Worst-case
Discrete Time
Continuous Time
Main Property
Score function
Score Attack Statistic
Previous work:
Want random \(\Sigma\) such that this is large in expectation
Known bounds on sample complexity
There is an \((\varepsilon, \delta)\)-DP mechanism \(\mathcal{M}\) such that
Unknown Covariance Matrix
on \(\mathbb{R}^d\)
for some
Required even without privacy
Required even for \(d = 1\)
Is this tight?
Correlation statistic
If \(z \sim \mathcal{N}(0, \Sigma)\) indep. of \(X\)
small
If \(\mathcal{M}\) is accurate
large
Approx. equal by privacy
Approx. equal by privacy
Fingerprinting Lemma
for covariance estimation
\(\mathcal{A}(z, \mathcal{M}(X))\)
leads to limited lower bounds
Player
Adversary
\(n\) Experts
0.5
0.1
0.3
0.1
Probabilities
1
-1
0.5
-0.3
Costs
Player's loss:
Adversary knows the strategy of the player
Loss of Best Expert
Player's Loss
Optimal!
For random \(\pm 1\) costs
Multiplicative Weights Update:
(Hedge)
Boosting in ML
Understanding sequential prediction
Universal Optimization
Solving SDPs, TCS, Learning theory...
MWU regret
when \(T\) is known
when \(T\) is not known
anytime
fixed-time
Does knowing \(T\) give the player an advantage?
[Harvey, Liaw, Perkins, Randhawa '23]
Random costs are (probably) too easy to show a separation
[VSP, Liaw, Harvey '22]
Anytime > Fixed time for 2 experts + optimal algorithm
+ new algorithms for quantile regret!
With stochastic calculus:
Analysis often becomes clean
Sandbox for design of optimization algorithms
Gradient flow is useful for smooth optimization
How to model non-smooth (adversarial) optimization in continuous time?
Why go to continuous time?
Total loss of expert \(i\):
Useful perspective: \(L(i)\) is a realization of a random walk
realization of a Brownian Motion
Probability 1 = Worst-case
Discrete Time
Continuous Time
Discrete time
Continuous time
Cumulative loss
Player's cumulative loss
Player's loss per round
[Freund '09]
Potential based players
Regret bounds
when \(T\) is known
when \(T\) is not known
anytime
fixed-time
MWU!
Same as discrete time!
Idea: Use stochastic calculus to guide the algorithm design
with prob. 1
Ito's Lemma
(Fundamental Theorem of Stochastic Calculus)
\(B_t\) is very non-smooth \(\implies\) second-order terms matter
\(n = 1\)
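For a single Brownian motion \(B_t\) and a function \(f(t, x)\) that is once continuously differentiable in \(t\) and twice in \(x\) (a standard regularity assumption, not stated on the slide), Ito's Lemma reads:

\[
\mathrm{d} f(t, B_t) \;=\; \partial_t f(t, B_t)\, \mathrm{d}t \;+\; \partial_x f(t, B_t)\, \mathrm{d}B_t \;+\; \tfrac{1}{2}\, \partial_{xx} f(t, B_t)\, \mathrm{d}t .
\]

The last term is the second-order correction caused by the roughness of \(B_t\). When \(\partial_t f + \tfrac{1}{2} \partial_{xx} f = 0\), the \(\mathrm{d}t\) terms cancel and only the \(\mathrm{d}B_t\) term remains.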
Ito's Lemma
Potential does not change too much
Would be great
Potential based players
Matches fixed-time!
Stochastic calculus suggests potentials that satisfy the Backwards Heat Equation
This new anytime algorithm has good regret!
Does not translate easily to discrete time
need to add correlation between experts
Take away: independent experts cannot give better lower-bounds (in continuous-time)
Discrete time analysis is IDENTICAL to continuous time analysis
Improved anytime algorithms with bounds
quantile regret
Discrete Ito's
Lemma
Based on work by Zhang, Yang, Cutkosky, Paschalidis
Player
Adversary
Unconstrained
Linear functions
Player's loss:
Loss of Fixed \(u\)
Player's Loss
Goal:
Parameter-Free = No knowledge of \(\lVert u \rVert\)
Even better:
"Adaptive" = Adapts to gradient norm
Discrete Regret
Continuous Regret
Theorem:
If \(\Phi\) satisfies the BHE and
Going to higher dim:
Continuous time analogue
of
Learn direction and scale separately
Use refined discretization
Discretizing:
Continuous Time Model for Experts and OLO
Thanks!
Main References:
[VSP, Liaw, Harvey '22] Continuous prediction with experts' advice.
[Zhang, Yang, Cutkosky, Paschalidis '24] Improving adaptive online learning using refined discretization.
[Freund '09] A method for hedging in continuous time.
[Harvey, Liaw, Perkins, Randhawa '23] Optimal anytime regret with two experts.
How to discretize the algorithm for indep. experts?
Improve LB for anytime experts?
High-dim continuous time OLO?