Learning, Duality, and Algorithms
Victor Sanches Portella
Supervisor: Marcel K. de Carli Silva
IME - USP
August, 2018
At each round, SIMULTANEOUSLY:
Nature reveals a query
Player makes a prediction
Enemy picks the "true answer"
Player suffers a loss
Player and Enemy see each other's choices
Spam Filtering
Nature reveals emails
Player is the spam filter
Enemy is the user
Loss of 1 if the filter is wrong
Prediction with Expert Advice
Nature reveals the experts' advice
Player chooses an expert
Enemy chooses the costs of the experts
Loss is the cost of the chosen expert
Classical (statistical) learning vs. online learning:
Probability distribution over Enemy and Nature vs. adversarial Enemy and Nature
Training set vs. online rounds
Expected accuracy vs. cumulative loss
An Online Learning Problem: a query set, a label set, a decision set, and a loss function

Spam filtering: query set: emails; label set: Yes/No; decision set: Yes/No; loss: binary (1 if the filter is wrong)
Experts: query set: experts' advice; label set: experts' costs; decision set: experts; loss: the chosen expert's cost
Minimizing the cost is impossible: whatever the Player predicts (X), the Enemy can pick the opposite ("not X")
Idea: minimize the regret against a fixed hypothesis h over the T rounds:
Regret_T(h) = (Player's cumulative loss) - (cumulative loss of always following h)
Attaining sublinear regret is impossible in general (Cover '67)
Idea: allow the player to randomize his choices
Enemy does not know the outcomes of the "coin flips"
Bounds on regret with high probability or on the expectation
[Diagram: the Player draws his choice from a probability distribution, simulating a randomized Player]
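A toy simulation of the randomization idea above. The game instance is an illustrative assumption: a deterministic Player is wrong every round because the Enemy knows his rule, while a coin-flipping Player is wrong only about half the time, since the Enemy must commit before seeing the flip.

```python
import random

T = 10_000
rng = random.Random(0)

# Deterministic Player: the Enemy knows the rule and always answers the
# opposite, so the Player is wrong in every round.
deterministic_loss = T

# Randomized Player: the Enemy fixes any sequence of answers in advance;
# each independent fair coin flip matches it with probability 1/2.
randomized_loss = sum(rng.random() < 0.5 for _ in range(T))

print(deterministic_loss, randomized_loss)
```

The deterministic loss is linear in T, while the randomized loss concentrates around T/2, which is why high-probability and in-expectation regret bounds become attainable.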
At each round, SIMULTANEOUSLY:
Player chooses a point
Enemy chooses a CONVEX function
Player suffers a loss
Player and Enemy see each other's choices
An Online Convex Optimization Problem
convex set
set of convex functions
Regret against a fixed point u: Regret_T(u) = sum_{t=1}^T f_t(x_t) - sum_{t=1}^T f_t(u), the Player's cost minus the cost of always choosing u
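A minimal sketch of online (sub)gradient descent, one of the low-regret algorithms for OCO discussed later. The one-dimensional instance (losses f_t(x) = (x - z_t)^2 on [-1, 1]) and the step size 1/sqrt(t) are illustrative assumptions, not from the talk.

```python
import math

# Enemy's fixed sequence of choices (an assumed instance).
z = [(-1) ** t * 0.5 for t in range(1, 1001)]

def project(x):
    # Euclidean projection onto the convex set [-1, 1].
    return max(-1.0, min(1.0, x))

x, player_loss = 0.0, 0.0
for t, zt in enumerate(z, start=1):
    player_loss += (x - zt) ** 2          # Player suffers f_t(x_t)
    grad = 2 * (x - zt)                   # (sub)gradient of f_t at x_t
    x = project(x - grad / math.sqrt(t))  # step size eta_t = 1/sqrt(t)

# Best fixed point in hindsight for these quadratics is the mean of the z_t.
u = sum(z) / len(z)
best = sum((u - zt) ** 2 for zt in z)
regret = player_loss - best
print(regret)  # grows like O(sqrt(T)), far below T
```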
Special Case of OL
Low-Regret Algorithms
An OL problem: Experts
An OL problem: Randomized Experts
The Player plays a probability distribution over the experts, i.e., a point in the simplex
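A minimal sketch of the Hedge algorithm on the simplex. The number of experts, the rounds, the learning rate, and the random costs are illustrative assumptions.

```python
import math
import random

n, T, eta = 3, 500, 0.1   # experts, rounds, learning rate (assumed values)
rng = random.Random(1)
w = [1.0] * n             # one weight per expert

player_loss, cum = 0.0, [0.0] * n
for _ in range(T):
    total_w = sum(w)
    p = [wi / total_w for wi in w]            # Player's point in the simplex
    costs = [rng.random() for _ in range(n)]  # Enemy's costs in [0, 1]
    player_loss += sum(pi * ci for pi, ci in zip(p, costs))  # expected loss
    for i in range(n):
        cum[i] += costs[i]
        w[i] *= math.exp(-eta * costs[i])     # multiplicative weights update

best = min(cum)                               # best expert in hindsight
regret = player_loss - best
print(regret)  # Hedge guarantees regret at most ln(n)/eta + eta*T/8
```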
Strongly convex and strongly smooth functions
Theorem: a function is strongly convex w.r.t. an arbitrary norm if and only if its conjugate is strongly smooth w.r.t. the dual norm
Bregman divergence and Bregman projection
1st-order Taylor approximation
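A small numerical check of the Bregman divergence, D_R(p, q) = R(p) - R(q) - <grad R(q), p - q>. The choice of regularizer (negative entropy on the simplex, whose Bregman divergence is the KL divergence) is a standard example used here for illustration.

```python
import math

def neg_entropy(p):
    # R(p) = sum_i p_i log p_i (negative entropy).
    return sum(pi * math.log(pi) for pi in p)

def bregman_neg_entropy(p, q):
    # D_R(p, q) = R(p) - R(q) - <grad R(q), p - q>, with grad R(q)_i = 1 + log q_i.
    return neg_entropy(p) - neg_entropy(q) - sum(
        (1 + math.log(qi)) * (pi - qi) for pi, qi in zip(p, q)
    )

def kl(p, q):
    # Kullback-Leibler divergence.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p, q = [0.2, 0.3, 0.5], [0.4, 0.4, 0.2]
print(bregman_neg_entropy(p, q), kl(p, q))  # the two values coincide
```

On the simplex the linear terms in p - q cancel, which is why this Bregman divergence reduces exactly to KL.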
FTRL: play the minimizer of the already-seen functions plus a strongly convex regularizer:
x_{t+1} = argmin over x in X of [ R(x) + sum_{s=1}^{t} f_s(x) ]
Lemma (Kalai and Vempala, '05):
Corollary:
Stability between rounds
Lemma:
Stability between rounds
Lipschitz
(McMahan, '17)
Theorem (Abernethy, Bartlett, Rakhlin, Tewari, '08):
Diameter
Randomized Experts
Best expert in hindsight
Each FTRL step requires solving an optimization problem
We would like algorithms whose steps are efficiently computable
Already-seen functions
Strongly Convex Regularizer
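For linear losses with a quadratic regularizer, the FTRL step has a closed form, so no optimization solver is needed. A minimal sketch; the interval, the gradients, and eta are illustrative assumptions, not from the talk.

```python
# FTRL with regularizer R(x) = x^2 / (2 * eta) on linear losses
# f_t(x) = g_t * x over the interval [-1, 1].
eta = 0.05
x, sum_g, player_loss = 0.0, 0.0, 0.0  # x_1 minimizes the regularizer alone
gs = [1.0 if t % 3 == 0 else -0.5 for t in range(1, 201)]  # Enemy's losses

for g in gs:
    player_loss += g * x                 # Player suffers f_t(x_t)
    sum_g += g
    # x_{t+1} = argmin_x [ x^2/(2*eta) + sum_g * x ], clipped to [-1, 1].
    x = max(-1.0, min(1.0, -eta * sum_g))

# For linear losses the best fixed point sits at an endpoint of the interval.
best = min(u * sum_g for u in (-1.0, 0.0, 1.0))
regret = player_loss - best
print(regret)
```

This special case is exactly lazy projected gradient descent, which is one way the algorithms on the next slide connect to each other.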
[Diagram: a map of low-regret algorithms and their relationships: Online (Sub)Gradient Descent, Hedge, FTRL, LOMD, EOMD, Online Newton, Online GD, AdaGrad, FTPL, Linear Coupling, Adaptive FTRL, Adaptive Prox-FTRL, Quasi-Newton, Adaptive OMD, Accelerated GD; edges labeled by gradient, mirror, Newton, and proximal connections]
"Slot machine" icons by Freepik at www.flaticon.com
At each round
Player chooses a machine
Enemy chooses the costs
Player suffers the loss of the machine he has chosen
As in Experts, but the Player only observes the cost of the machine he chose
EXPLORATION VS EXPLOITATION
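A minimal EXP3-style sketch for the bandit setting: Hedge run on loss estimates built only from the observed machine. The instance (3 machines with fixed mean costs) and the learning rate are illustrative assumptions.

```python
import math
import random

n, T, eta = 3, 2000, 0.05
rng = random.Random(2)
w = [1.0] * n
pulls = [0] * n

mean_cost = [0.9, 0.5, 0.2]  # machine 2 is the cheapest (assumed instance)
for _ in range(T):
    total_w = sum(w)
    p = [wi / total_w for wi in w]
    i = rng.choices(range(n), weights=p)[0]  # sampling = exploration
    cost = mean_cost[i]                      # only this machine's loss is seen
    est = cost / p[i]                        # importance-weighted estimate
    w[i] *= math.exp(-eta * est)             # multiplicative update
    pulls[i] += 1

print(pulls)  # most pulls should go to the cheapest machine
```

The importance weighting keeps the loss estimates unbiased, which is what lets the full-information analysis of Hedge carry over to the bandit feedback model.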
(Dani, H, K, '12 )
(Bubeck, C-B, K, '12 )
(Bubeck, E, L, '17 )
A set of low-accuracy learners: weak learners
Their combination generates a good model: a strong learner
Usually performed in an incremental fashion
Idea: Use boosting outside of learning
Electrical flows can be computed quickly
Nearly-linear time Laplacian solver (Spielman and Teng, '04)
Electrical flows may not respect capacities
Idea: Compute many electrical flows, penalizing violated edges
Multiplicative Weights Update Method
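The same multiplicative update can be seen in a toy duality setting: approximating the value of a zero-sum game, mirroring the penalize-and-recompute loop above. The payoff matrix, learning rate, and horizon are illustrative assumptions.

```python
import math

# Row player pays A[i][j]; weights over rows are updated multiplicatively
# while the column player best-responds each round.
A = [[0.0, 1.0, 0.5],
     [1.0, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
T, eta = 3000, 0.05
w = [1.0] * 3
avg_cost = 0.0

for _ in range(T):
    s = sum(w)
    p = [wi / s for wi in w]
    # Column player picks the column maximizing the row player's expected cost.
    col = max(range(3), key=lambda j: sum(p[i] * A[i][j] for i in range(3)))
    avg_cost += sum(p[i] * A[i][col] for i in range(3)) / T
    for i in range(3):
        w[i] *= math.exp(-eta * A[i][col])  # penalize costly rows

print(avg_cost)  # approaches the value of the game (0.5 for this matrix)
```

By the regret bound of the multiplicative update, the time-averaged cost converges to the game value, which is one concrete face of the learning-duality connection in the title.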
An OL problem: Linear regression
set of linear functions
Theorem: there exists
Stability between rounds
Lipschitz
Lemma:
Lemma
There are OCO algorithms with guaranteed sublinear regret
Take inspiration from classic optimization
Use concepts of conjugate functions and subgradients
Intuition depends on concepts from convex analysis
Theorem. The following are equivalent:
attains
attains
(a)
(b)
(c)
(d)
Oracles of Online Convex Optimization