Online Convex Optimization
Learning, Duality, and Algorithms
Victor Sanches Portella
Supervisor: Marcel K. de Carli Silva
IME - USP
August, 2018
Learning
Motivation
Spam? Yes / No
Adaptive
Online Learning (OL)
At each round
Nature reveals a query
Player makes a prediction and, SIMULTANEOUSLY, Enemy picks the "true answer"
Player suffers a loss
Player and Enemy see each other's choices
[Diagram: Nature poses the query (?); Player and Enemy each reveal a choice (!)]
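The round structure above can be sketched as a short loop. This is a minimal illustration, not the author's code: the function `play`, the majority-vote player, and the keyword-based enemy are all hypothetical names introduced here.

```python
def play(queries, player, enemy, loss):
    """Run the Online Learning protocol; return the Player's cumulative loss."""
    total = 0.0
    history = []  # (query, prediction, answer) triples, visible to both sides
    for query in queries:                    # Nature reveals a query
        prediction = player(query, history)  # Player predicts without seeing the answer
        answer = enemy(query, history)       # Enemy answers without seeing the prediction
        total += loss(prediction, answer)    # Player suffers a loss
        history.append((query, prediction, answer))  # both now see each other's choices
    return total

def zero_one_loss(prediction, answer):
    return 0.0 if prediction == answer else 1.0

def majority_player(query, history):
    # Hypothetical strategy: predict "spam" if most past answers were "spam".
    yes = sum(1 for _, _, answer in history if answer)
    return 2 * yes > len(history)

def keyword_enemy(query, history):
    # Hypothetical user: labels an email as spam when it mentions "prize".
    return "prize" in query

emails = ["win a prize", "meeting at noon", "prize inside", "lunch?"]
cumulative_loss = play(emails, majority_player, keyword_enemy, zero_one_loss)
```

On this particular sequence the majority-vote player is wrong in every round, which hints at why the cumulative loss alone is a poor yardstick against an adversarial Enemy.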
Examples of OL
Spam Filtering
Nature reveals emails
Player is the spam filter
Enemy is the user
[Diagram: Nature poses the query (?); Player and Enemy each reveal a choice (!)]
Loss of 1 if the filter is wrong
Examples of OL
Prediction with Expert Advice
Nature reveals the experts' advice
Player chooses an expert
Enemy chooses the costs of the experts
[Diagram: Nature poses the query (?) with the Experts; Player and Enemy each reveal a choice (!)]
Loss is the cost of the chosen expert
Statistical Learning and OL Comparison
Statistical Learning: probability distribution over enemy and nature; training set; expected accuracy
Online Learning: adversarial enemy and nature; online; cumulative loss
Formalizing Online Learning
An Online Learning Problem: query set, label set, decision set, loss function
Spam: query set Emails, label set Yes/No, decision set Yes/No, loss function Binary
Experts: loss function Expert's cost
Minimizing the Loss
Minimizing the cost is impossible: whenever Player answers X, Enemy can answer Not X
Idea: minimize the regret against a fixed hypothesis over the number of rounds
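The regret can be written out explicitly. The following is a sketch using the standard definition; the symbols $x_t$ (query), $d_t$ (Player's prediction), $y_t$ (Enemy's answer), $L$ (loss function), $h$ (hypothesis), and $T$ (number of rounds) are notation assumed here, since the slide's formula did not survive.

```latex
\mathrm{Regret}_T(h) \;=\; \sum_{t=1}^{T} L(d_t, y_t) \;-\; \sum_{t=1}^{T} L\bigl(h(x_t), y_t\bigr)
```

The goal is regret that grows sublinearly in $T$ against every hypothesis $h$ in a fixed comparison class, so that the Player's average loss per round approaches that of the best hypothesis.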
Minimizing the Loss
Attaining sublinear regret is impossible in general (Cover '67)
Idea: allow the player to randomize his choices
Enemy does not know the outcomes of the "coin flips"
Bounds on regret with high probability or on the expectation
[Diagram labels: Player, Enemy, Simulated Player, Probability distribution]
Online Convex Optimization (OCO)
At each round
Player chooses a point and, SIMULTANEOUSLY, Enemy chooses a CONVEX function
Player suffers a loss
Player and Enemy see each other's choices
[Diagram: Player and Enemy each reveal a choice (!)]
Formalizing Online Convex Optimization
An Online Convex Optimization Problem
convex set
set of convex functions
Cost of always choosing
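In symbols, the comparison against "always choosing" a fixed point gives the usual OCO regret. This is a sketch in assumed notation: $X$ is the convex set, $x_t \in X$ the Player's point, $f_t$ the Enemy's convex function at round $t$, and $u \in X$ the fixed comparison point.

```latex
\mathrm{Regret}_T(u) \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \sum_{t=1}^{T} f_t(u), \qquad u \in X
```

The second sum is the cost of always choosing $u$; the Player wants the difference to be sublinear in $T$ for every $u \in X$.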
OCO and OL Relations
Online Convex Optimization
Online Learning
Special Case of OL
Low-regret Algorithms
From OL to OCO
An OL problem: Experts
[Diagram: the OL game (Nature, Player, Enemy, with the experts' advice) becomes, once the Player is randomized, an OCO game (Player, Enemy) over the simplex]
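The reason randomization turns the experts problem into an OCO problem is that the expected cost of sampling an expert from a distribution $p$ on the simplex is the linear (hence convex) function $\langle p, c \rangle$ of $p$. The sketch below checks this numerically; the distribution, the cost vector, and the function names are made up for illustration.

```python
import random

def expected_cost(p, costs):
    """Linear loss <p, c> seen by the OCO player on the simplex."""
    return sum(pi * ci for pi, ci in zip(p, costs))

def sample_expert(p, rng):
    """Randomized OL player: draw one expert index according to p."""
    return rng.choices(range(len(p)), weights=p, k=1)[0]

rng = random.Random(0)
p = [0.5, 0.3, 0.2]      # a point of the simplex (assumed example)
costs = [1.0, 0.0, 0.5]  # Enemy's costs for the three experts (assumed example)

n_samples = 20_000
empirical = sum(costs[sample_expert(p, rng)] for _ in range(n_samples)) / n_samples
exact = expected_cost(p, costs)
```

The empirical average of the sampled costs concentrates around `exact`, so bounding the regret of the OCO player bounds the expected regret of the randomized OL player.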
Duality
Function and Epigraph
Conjugate function
Subgradients
Conjugate Functions and Subgradients
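For reference, the standard definitions behind these two notions are sketched below. The notation ($f$ a convex function, $\langle \cdot, \cdot \rangle$ the inner product) is assumed here, since the slide formulas were lost.

```latex
\begin{align*}
f^*(y) &= \sup_{x} \bigl( \langle y, x \rangle - f(x) \bigr) && \text{(conjugate function)} \\
\partial f(x) &= \{\, g : f(z) \ge f(x) + \langle g, z - x \rangle \ \text{for all } z \,\} && \text{(subgradients at } x\text{)} \\
f(x) + f^*(y) &\ge \langle y, x \rangle && \text{(Fenchel-Young inequality)}
\end{align*}
```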
Strongly convex and Strongly smooth Functions
Strongly convex
Strongly smooth
ARBITRARY
Strongly convex/Strongly smooth Duality
Theorem
Dual norm
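The definitions and the duality statement can be sketched as follows. The constants $\sigma$ and $\beta$ are assumed notation, $\|\cdot\|$ is an arbitrary norm, and the equivalence is stated for closed convex $f$ as it usually appears in the literature; the slide's exact wording is not recoverable.

```latex
\begin{align*}
\text{$\sigma$-strongly convex:}\quad & f(z) \ge f(x) + \langle g, z - x\rangle + \tfrac{\sigma}{2}\|z - x\|^2 \quad \text{for all } g \in \partial f(x) \\
\text{$\beta$-strongly smooth:}\quad & f(z) \le f(x) + \langle \nabla f(x), z - x\rangle + \tfrac{\beta}{2}\|z - x\|^2 \\
\text{dual norm:}\quad & \|y\|_* = \sup\{\langle y, x\rangle : \|x\| \le 1\} \\
\text{duality:}\quad & f \text{ is $\sigma$-strongly convex w.r.t. } \|\cdot\| \iff f^* \text{ is $\tfrac{1}{\sigma}$-strongly smooth w.r.t. } \|\cdot\|_*
\end{align*}
```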
Bregman Divergence and Projection
Bregman Divergence
Bregman Projection
1st-order Taylor
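The Bregman divergence of a differentiable convex function $R$ is the gap between $R(x)$ and the first-order Taylor approximation of $R$ around $y$: $B_R(x, y) = R(x) - R(y) - \langle \nabla R(y), x - y \rangle$. The Bregman projection of $y$ onto a set is then the point of the set minimizing $B_R(\cdot, y)$. As a small sketch (the two regularizers and the test points are chosen here for illustration), the code below checks two well-known special cases: half the squared Euclidean distance and, on the simplex, the KL divergence.

```python
import math

def bregman(R, grad_R, x, y):
    """B_R(x, y) = R(x) - R(y) - <grad R(y), x - y>."""
    g = grad_R(y)
    return R(x) - R(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))

def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)

def half_sq_norm_grad(x):
    return list(x)

def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

# Two points of the simplex with strictly positive entries.
x = [0.2, 0.3, 0.5]
y = [0.4, 0.4, 0.2]

half_sq_dist = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))
kl = sum(a * math.log(a / b) for a, b in zip(x, y))

euclidean_case = bregman(half_sq_norm, half_sq_norm_grad, x, y)  # equals half_sq_dist
entropic_case = bregman(neg_entropy, neg_entropy_grad, x, y)     # equals kl on the simplex
```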
Algorithms
Follow the Regularized Leader
Already-seen functions
Strongly convex regularizer
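Written out, the FTRL rule minimizes the sum of the already-seen functions plus the regularizer. The notation below ($X$ for the convex set, $f_s$ for the function of round $s$, $R$ for the strongly convex regularizer) is assumed, since the slide formula was lost.

```latex
x_{t+1} \;\in\; \operatorname*{arg\,min}_{x \in X} \Bigl( \sum_{s=1}^{t} f_s(x) \;+\; R(x) \Bigr)
```

In particular, the first point $x_1$ minimizes $R$ alone.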
Bounding the Regret of FTRL
Lemma (Kalai and Vempala, '05):
Corollary:
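A common way to state this lemma and its corollary is sketched below. It assumes that $x_{t+1}$ minimizes $\sum_{s \le t} f_s + R$ over $X$ (so $x_1$ minimizes $R$); the slide's own formulas did not survive, so the exact form shown there may differ.

```latex
\begin{align*}
\text{Lemma:}\quad & R(x_1) + \sum_{t=1}^{T} f_t(x_{t+1}) \;\le\; R(u) + \sum_{t=1}^{T} f_t(u) \quad \text{for every } u \in X \\
\text{Corollary:}\quad & \mathrm{Regret}_T(u) \;\le\; R(u) - R(x_1) + \sum_{t=1}^{T} \bigl( f_t(x_t) - f_t(x_{t+1}) \bigr)
\end{align*}
```

The corollary follows by adding $\sum_t f_t(x_t)$ to both sides of the lemma and rearranging, which reduces the regret analysis to bounding how much consecutive iterates differ in cost.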
Bounding the Regret - Lipschitz Continuity
Stability between rounds
?
Lemma:
Bounding the Regret
Stability between rounds
Lipschitz
?
Bounding the Regret - Using Duality
(McMahan, '17)
Bounding the Regret
Theorem (Abernethy, B, R, T, '08):
Diameter
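One commonly stated form of such a bound is sketched below; constants vary between presentations and the slide's exact statement was lost. The assumptions, all introduced here, are that $R$ is $\sigma$-strongly convex with respect to a norm $\|\cdot\|$ on $X$ and that each $f_t$ is $L$-Lipschitz with respect to the same norm. The second line assumes the regularizer has the form $R = R_0 / \eta$ with $R_0$ 1-strongly convex and $D \ge R_0(u) - \min_{x \in X} R_0(x)$.

```latex
\begin{align*}
\mathrm{Regret}_T(u) &\le R(u) - \min_{x \in X} R(x) + \frac{T L^2}{\sigma} \\
\mathrm{Regret}_T(u) &\le \frac{D}{\eta} + \eta\, T L^2 \;=\; 2L\sqrt{D\,T} \quad \text{for } \eta = \sqrt{D / (T L^2)}
\end{align*}
```

The first line combines the stability of consecutive iterates, $f_t(x_t) - f_t(x_{t+1}) \le L^2/\sigma$, with the Kalai-Vempala style corollary; the second shows the resulting $O(\sqrt{T})$ rate.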
Bounding the Regret - Experts Example
Randomized Experts
Best expert in hindsight
Bounding the Regret - Experts Example
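For the randomized experts problem, FTRL with the negative-entropy regularizer on the simplex yields exponential weights over the cumulative costs, usually called Hedge. The sketch below runs it on random costs and compares the regret against the best expert in hindsight with the classical bound $\sqrt{T \ln(n) / 2}$ for costs in $[0, 1]$ and step size $\eta = \sqrt{8 \ln(n) / T}$. The step size, the bound, and the random cost sequence are standard choices assumed here, not values taken from the slides.

```python
import math
import random

def hedge_weights(cum_costs, eta):
    """Distribution proportional to exp(-eta * cumulative cost)."""
    shift = min(cum_costs)  # subtracting the minimum avoids underflow
    w = [math.exp(-eta * (c - shift)) for c in cum_costs]
    z = sum(w)
    return [wi / z for wi in w]

def run_hedge(cost_rows, eta):
    """Return (expected cost of the player, cost of the best expert in hindsight)."""
    n = len(cost_rows[0])
    cum = [0.0] * n
    total = 0.0
    for costs in cost_rows:
        p = hedge_weights(cum, eta)          # chosen before the costs are revealed
        total += sum(pi * ci for pi, ci in zip(p, costs))
        cum = [a + c for a, c in zip(cum, costs)]
    return total, min(cum)

rng = random.Random(0)
T, n = 500, 5
cost_rows = [[rng.random() for _ in range(n)] for _ in range(T)]

eta = math.sqrt(8 * math.log(n) / T)
player_cost, best_expert_cost = run_hedge(cost_rows, eta)
regret = player_cost - best_expert_cost
bound = math.sqrt(T * math.log(n) / 2)
```

Because the bound holds for every cost sequence in $[0, 1]$, the comparison `regret <= bound` does not depend on the random seed.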
FTRL is not Perfect
Each FTRL step needs to solve an optimization problem
It would be useful to have an algorithm whose every step is clearly efficient to compute
Online Mirror Descent - Intuition
Online Mirror Descent
Already-seen functions
Strongly Convex Regularizer
Eager
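One standard way to write the eager variant is sketched below. The notation is assumed: $g_t$ is a subgradient of $f_t$ at $x_t$, $\eta > 0$ a step size, $R$ the regularizer (the mirror map), and $B_R$ its Bregman divergence.

```latex
\begin{align*}
g_t &\in \partial f_t(x_t) \\
\nabla R(y_{t+1}) &= \nabla R(x_t) - \eta\, g_t \\
x_{t+1} &= \operatorname*{arg\,min}_{x \in X} B_R(x, y_{t+1})
\end{align*}
```

The gradient step is taken in the dual space, starting from the image of the current feasible point $x_t$, and the result is mapped back and Bregman-projected onto $X$.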
OMD Examples
Online
(Sub)Gradient Descent
Hedge
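With the regularizer $R(x) = \tfrac{1}{2}\|x\|_2^2$, the mirror step becomes a plain gradient step and the Bregman projection becomes the Euclidean projection, which gives online (sub)gradient descent. The sketch below runs it on linear losses over a Euclidean ball and compares the regret with the classical bound $r G \sqrt{T}$ for step size $\eta = r / (G\sqrt{T})$. The radius $r$, the gradient bound $G = 1$, and the random loss sequence are assumptions made for this illustration.

```python
import math
import random

def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm <= radius:
        return list(x)
    return [radius * xi / norm for xi in x]

def online_gradient_descent(grads, radius, eta):
    """Play linear losses f_t(x) = <g_t, x>; return the cumulative loss."""
    x = [0.0] * len(grads[0])
    total = 0.0
    for g in grads:
        total += sum(gi * xi for gi, xi in zip(g, x))  # loss at the current point
        step = [xi - eta * gi for xi, gi in zip(x, g)]  # g is the gradient of f_t
        x = project_ball(step, radius)
    return total

rng = random.Random(1)
T, d, radius = 400, 3, 1.0
grads = []
for _ in range(T):
    g = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(gi * gi for gi in g)) or 1.0
    grads.append([gi / norm for gi in g])  # unit-norm gradients, so G = 1

eta = radius / math.sqrt(T)
player_cost = online_gradient_descent(grads, radius, eta)

# Best fixed point in hindsight for linear losses on a ball: -radius * S / ||S||.
s = [sum(g[i] for g in grads) for i in range(d)]
best_cost = -radius * math.sqrt(sum(si * si for si in s))
regret = player_cost - best_cost
bound = radius * math.sqrt(T)
```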
Connections Among Algorithms
FTRL
LOMD
EOMD
Online Newton
Hedge
Newton
Online GD
Gradient
Mirror
AdaGrad
FTPL
Linear Coupling
Adaptive FTRL
Proximal
Adaptive Prox-FTRL
Quasi Newton
?
Adaptive OMD
Accelerated GD
Future Directions
[Diagram: Player and Enemy each reveal a choice (!)]
Suggestion - Bandit Convex Optimization (BCO)
At each round
Player chooses a point
Enemy chooses a function
Player suffers a loss
SIMULTANEOUSLY
CONVEX
Player and Enemy see
Multi-armed Bandit Problem
"Slot machine" icons by Freepik at www.flaticon.com
At each round
Player chooses a machine
Enemy chooses the costs
Player suffers the loss of the machine he has chosen
[Figure: four slot machines with costs $$, $, $$$, $]
EXPERTS
EXPLORATION VS EXPLOITATION
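The contrast with the experts setting is that the Player observes only the cost of the machine actually played. A standard way to reuse exponential weights under this feedback, sketched below, is to replace the unseen cost vector by an importance-weighted estimate; this is the idea behind the EXP3 algorithm, which the slides do not name. All function names and the step size are assumptions made for illustration.

```python
import math
import random

def importance_weighted_estimate(observed_cost, arm, p):
    """Unbiased estimate of the full cost vector from one observed entry."""
    estimate = [0.0] * len(p)
    estimate[arm] = observed_cost / p[arm]
    return estimate

def exp3(cost_rows, eta, rng):
    """Exponential weights on estimated costs; returns the Player's total cost."""
    n = len(cost_rows[0])
    est_cum = [0.0] * n
    total = 0.0
    for costs in cost_rows:
        shift = min(est_cum)
        w = [math.exp(-eta * (c - shift)) for c in est_cum]
        z = sum(w)
        p = [wi / z for wi in w]
        arm = rng.choices(range(n), weights=p, k=1)[0]  # sampling provides exploration
        observed = costs[arm]                            # only this entry is revealed
        total += observed
        estimate = importance_weighted_estimate(observed, arm, p)
        est_cum = [a + e for a, e in zip(est_cum, estimate)]
    return total

rng = random.Random(0)
T, n = 200, 4
cost_rows = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(2 * math.log(n) / (n * T))  # a common choice, assumed here
total_cost = exp3(cost_rows, eta, rng)
```

Averaging the estimate over the random choice of machine recovers the true cost vector, which is what lets the analysis of the experts setting carry over in expectation.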
History of BCO - Regret Bounds
Linear functions
Dimension
rounds
(Dani, H, K, '12)
(Bubeck, C-B, K, '12)
General case
(Dani, H, K, '12)
(Bubeck, E, L, '17)
(Bubeck, E, L, '17)
Suggestion - Boosting
Set of low accuracy learners - Weak Learners
Combination generates a good model - Strong Learner
Usually performed in an incremental fashion
Idea: Use boosting outside of learning
Example - Approximately maximum flows
Electrical flows can be computed quickly
Nearly-linear time Laplacian solver (Spielman and Teng, '04)
Electrical flows may not respect capacities
Idea: Compute many electrical flows, penalizing violated edges
Multiplicative Weights Update Method
Online Convex Optimization
Algorithms, Learning, and Duality
Victor Sanches Portella
Supervisor: Marcel K. de Carli Silva
IME - USP
August, 2018
Formalizing Online Learning
Oracles
Formalizing Online Learning
Algorithm
From OL to OCO
An OL problem: Linear regression
set of linear functions
Player
Enemy
Nature
OL
OCO
Player
Enemy
Separation theorem
Theorem: there exists
Bounding the Regret - Using Duality
Stability between rounds
Lipschitz
?
Lemma:
Examples of LOMD
Examples of LOMD
Lemma
Minimizing the Loss
Attaining sublinear regret is impossible in general (Cover '67)
[Diagram: whenever Player answers X, Enemy answers Not X]
Idea: allow the player to randomize her choices
No
Yes
Enemy does not know the outcomes of the "coin flips"
We want bounds with high probability or on the expectation
Algorithms for OCO
There are OCO algorithms with guaranteed sublinear regret
Take inspiration from classic optimization
Use concepts of conjugate functions and subgradients
Intuition depends on concepts from convex analysis
Conjugate Functions and Subgradients
Theorem. The following are equivalent:
attains
attains
(a)
(b)
(c)
(d)
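The individual items (a) to (d) were lost, but one standard equivalence of this kind is that $y \in \partial f(x)$ holds exactly when the Fenchel-Young inequality $f(x) + f^*(y) \ge \langle x, y \rangle$ is tight, that is, when $x$ attains the supremum defining $f^*(y)$. The sketch below checks this for the one-dimensional example $f(x) = x^2/2$, whose conjugate is $f^*(y) = y^2/2$; the example and the function names are chosen here for illustration.

```python
def f(x):
    return 0.5 * x * x

def f_conjugate(y):
    """Closed form of the conjugate of f(x) = x^2 / 2."""
    return 0.5 * y * y

def numeric_conjugate(y, lo=-5.0, hi=5.0, steps=10_000):
    """Approximate sup_x (y * x - f(x)) on a grid, as a sanity check."""
    best = float("-inf")
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        best = max(best, y * x - f(x))
    return best

def fenchel_young_gap(x, y):
    """f(x) + f*(y) - x * y, nonnegative and zero exactly when y = f'(x) = x."""
    return f(x) + f_conjugate(y) - x * y

tight_gap = fenchel_young_gap(1.5, 1.5)   # y is the gradient of f at x
slack_gap = fenchel_young_gap(1.5, 0.5)   # y is not a subgradient of f at x
```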
Formalizing Online Convex Optimization
Algorithm
Cost of always choosing
Formalizing Online Convex Optimization
Oracles of Online Convex Optimization
An Online Convex Optimization Problem
convex set
set of convex functions
Online Mirror Descent - Intuition
Online Mirror Descent
Already-seen functions
Regularizer
Lazy
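In the lazy variant, gradient steps accumulate in the dual space without being affected by the projection, and only the played point is projected; the eager variant instead steps from the projected point. The sketch below contrasts the two in one dimension with $R(x) = x^2/2$ and $X = [-1, 1]$, where the Bregman projection is plain clipping. The gradient sequence and step size are made up so that the projection becomes active.

```python
def clip(v, lo=-1.0, hi=1.0):
    """Euclidean (here also Bregman) projection onto the interval [lo, hi]."""
    return max(lo, min(hi, v))

def eager_omd(grads, eta):
    """Step from the projected point, then project again."""
    x = 0.0
    iterates = [x]
    for g in grads:
        x = clip(x - eta * g)
        iterates.append(x)
    return iterates

def lazy_omd(grads, eta):
    """Accumulate unprojected steps in theta; project only to choose the point."""
    theta = 0.0
    iterates = [clip(theta)]
    for g in grads:
        theta -= eta * g
        iterates.append(clip(theta))
    return iterates

grads = [-2.0, -2.0, 1.0]
eager_iterates = eager_omd(grads, eta=1.0)  # [0.0, 1.0, 1.0, 0.0]
lazy_iterates = lazy_omd(grads, eta=1.0)    # [0.0, 1.0, 1.0, 1.0]
```

The two variants agree until the projection clips an iterate and a later gradient points back into the set. In this Euclidean case the lazy iterate is also the FTRL point for the linearized losses, since minimizing $\eta \langle \sum_s g_s, x \rangle + x^2/2$ over $X$ gives the projection of $-\eta \sum_s g_s$.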