## Online Convex Optimization

Learning, Duality, and Algorithms

Victor Sanches Portella

Advisor: Marcel K. de Carli Silva

IME - USP

May, 2019

# Online Convex Optimization

### Online Convex Optimization (OCO)

At each round:

1. The Player chooses a point $x \in X$.
2. Simultaneously, the Enemy chooses a convex function $f$.
3. The Player suffers the loss $f(x)$.
4. The Player and the Enemy both see $f$ and $x$.

### Formalizing Online Convex Optimization

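An online convex optimization problem is a pair $\mathcal{C} = (X, \mathcal{F})$, where:

- $X$ is a convex set (the Player's choices);
- $\mathcal{F}$ is a set of convex functions (the Enemy's choices).

### Rounds

In each round $t = 1, \dotsc, T$, the Player picks $x_t \in X$ and the Enemy picks $f_t \in \mathcal{F}$.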
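The protocol can be sketched as a loop. This is a minimal illustration, assuming points are lists of floats and that the Player and the Enemy are given as hypothetical callbacks `player_choose` and `enemy_choose`. Each callback only sees the previous rounds, which is what makes the two choices of a round simultaneous. Convexity of $X$ and $\mathcal{F}$ is not checked here.

```python
from typing import Callable, List, Sequence

Point = List[float]
Loss = Callable[[Sequence[float]], float]

def play_oco(player_choose: Callable[[List[Loss]], Point],
             enemy_choose: Callable[[List[Point]], Loss],
             T: int) -> float:
    """Run rounds t = 1, ..., T of the OCO protocol; return the Player's total loss."""
    seen_losses: List[Loss] = []    # f_1, ..., f_{t-1}: all the Player knows in round t
    seen_points: List[Point] = []   # x_1, ..., x_{t-1}: all the Enemy knows in round t
    total = 0.0
    for _ in range(T):
        x_t = player_choose(seen_losses)  # Player picks x_t in X
        f_t = enemy_choose(seen_points)   # Enemy picks f_t in F without seeing x_t
        total += f_t(x_t)                 # Player suffers f_t(x_t)
        seen_losses.append(f_t)           # both now see f_t and x_t
        seen_points.append(x_t)
    return total

# Usage: a Player that always plays 0 against the loss f(x) = |x - 1|.
print(play_oco(lambda past: [0.0], lambda past: (lambda x: abs(x[0] - 1.0)), T=3))  # 3.0
```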

### Expert's Problem

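The Player picks a probability distribution $p \in \Delta_E$ over a set $E$ of experts, and the Enemy picks a cost for each expert, $y \in [-1,1]^E$. The Player's loss is the expected cost:

$$f(p) = y^{\intercal}p = \mathbb{E}_{e \sim p}[y_e]$$

| | Expert 1 | Expert 2 | Expert 3 | Expert 4 |
|---|---|---|---|---|
| Probabilities $p$ (Player) | 0.5 | 0.1 | 0.3 | 0.1 |
| Costs $y$ (Enemy) | 1 | 0 | -1 | 1 |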
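The expected-cost loss is a dot product, so it is linear (hence convex) in $p$. A minimal sketch, assuming the probabilities and costs in the table are listed in the same expert order:

```python
def expert_loss(p, y):
    """f(p) = y^T p = E_{e ~ p}[y_e]: expected cost of following an expert drawn from p."""
    if any(pi < 0 for pi in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability vector (a point of the simplex)")
    if any(abs(ye) > 1 for ye in y):
        raise ValueError("costs must lie in [-1, 1]")
    return sum(pi * ye for pi, ye in zip(p, y))

p = [0.5, 0.1, 0.3, 0.1]   # Player's probabilities
y = [1, 0, -1, 1]          # Enemy's costs
print(round(expert_loss(p, y), 10))  # 0.3
```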

### Online Regression

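Online linear regression:

- The Player chooses a regression function $r_t(x) = \langle w_t, x \rangle$, that is, a weight vector $w_t$.
- The Enemy chooses a data point $(x_t, y_t)$.
- The Player suffers the loss $|r_t(x_t) - y_t|$.

As an OCO problem, the Player plays $w_t$ and the loss function is $f_t(w) = |\langle w, x_t \rangle - y_t|$.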
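The loss $f_t$ is the absolute value of an affine function of $w$, so it is convex but not differentiable where the residual is zero. A minimal sketch of the loss and one valid subgradient (the names are illustrative, not from the slides):

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def regression_loss(w, x_t, y_t):
    """f_t(w) = |<w, x_t> - y_t|: absolute error of the linear predictor w on (x_t, y_t)."""
    return abs(dot(w, x_t) - y_t)

def regression_subgradient(w, x_t, y_t):
    """A subgradient of f_t at w: sign(<w, x_t> - y_t) * x_t, using 0 at the kink."""
    residual = dot(w, x_t) - y_t
    sign = (residual > 0) - (residual < 0)
    return [sign * xi for xi in x_t]
```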

### Regret

$$\mathrm{Regret}_T(u) = \underbrace{\sum_{t=1}^T f_t(x_t)}_{\text{Player's loss}} - \underbrace{\sum_{t=1}^T f_t(u)}_{\text{cost of always choosing } u}$$

$$\mathrm{Regret}_T(U) = \sup_{u \in U} \mathrm{Regret}_T(u)$$

## Goal: Sublinear Regret

$$\lim_{T \to \infty} \frac{1}{T}\mathrm{Regret}_T(U) = 0$$
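For the experts' problem with $U = \Delta_E$, each $f_t$ is linear, so the supremum in $\mathrm{Regret}_T(U)$ is attained at a vertex of the simplex, that is, at the single best expert in hindsight. A small sketch under that assumption:

```python
def expert_regret(plays, costs):
    """Regret_T(U) for the experts' problem with U = the simplex.

    plays[t] is the Player's distribution p_t and costs[t] the Enemy's vector y_t.
    """
    player_loss = sum(sum(pi * yi for pi, yi in zip(p, y)) for p, y in zip(plays, costs))
    n_experts = len(costs[0])
    # Cost of always following a single expert; the best one gives the sup over U.
    best_fixed = min(sum(y[e] for y in costs) for e in range(n_experts))
    return player_loss - best_fixed
```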

### Player Strategies

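Focus of this talk: algorithms for the Player that attain sublinear regret under mild conditions and are, hopefully, efficiently implementable:

- Follow the Regularized Leader ($\text{FTRL}$)
- Eager Online Mirror Descent ($\text{EOMD}$)
- Lazy Online Mirror Descent ($\text{LOMD}$)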

# FTRL

### Experts

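Cumulative loss of four experts as the Enemy reveals $f_1, \dotsc, f_4$ and the Player responds with $x_1, \dotsc, x_4$:

| | Expert 1 | Expert 2 | Expert 3 | Expert 4 |
|---|---|---|---|---|
| $t = 1$ | 0 | 1 | 0.5 | 1 |
| $t = 2$ | 1 | 1.5 | 0.5 | 1 |
| $t = 3$ | 1.5 | 2 | 1 | 1.5 |
| $t = 4$ | 2.5 | 3 | 2 | 1.5 |

Follow the Leader: play whatever minimizes the total loss so far,

$$x_{t+1} = \mathop{\mathrm{arg\,min}}_{x \in X} \sum_{i=1}^t f_i(x)$$

This is unstable! In the table the leader switches from expert 1 to expert 3 to expert 4, and an adversarial Enemy can force a switch in every round.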
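The instability can be reproduced on a standard one-dimensional example (not the numbers in the table): take $X = [-1, 1]$ and linear losses $f_t(x) = z_t x$ with $z = 0.5, -1, 1, -1, \dotsc$. Follow the Leader jumps to the endpoint that is wrong for the next round, so its regret grows linearly in $T$. A minimal sketch, assuming ties are broken by playing $0$:

```python
def ftl_1d(z):
    """Regret of Follow the Leader on X = [-1, 1] with losses f_t(x) = z_t * x.

    x_{t+1} = argmin_{x in X} (z_1 + ... + z_t) * x: the endpoint opposite to the
    sign of the cumulative slope (0 when the cumulative slope is 0).
    """
    cum, total, x = 0.0, 0.0, 0.0      # x_1 = 0
    for zt in z:
        total += zt * x                 # suffer f_t(x_t)
        cum += zt
        x = -1.0 if cum > 0 else (1.0 if cum < 0 else 0.0)
    best_fixed = -abs(cum)              # min over u in [-1, 1] of cum * u
    return total - best_fixed

T = 10
z = [0.5] + [(-1.0) ** t for t in range(1, T)]   # 0.5, -1, 1, -1, ...
print(ftl_1d(z))  # 9.5, i.e. T - 0.5: linear regret
```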

Adding a regularizer $R$ stabilizes the Player's choices:

$$x_{t+1} = \mathop{\mathrm{arg\,min}}_{x \in X} \sum_{i=1}^t f_i(x) + R(x)$$
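For the experts' problem, one common choice of $R$ (an assumption here; the slides keep $R$ generic) is the scaled negative entropy $R(p) = \frac{1}{\eta}\sum_e p_e \ln p_e$ on the simplex, with a hypothetical step parameter $\eta > 0$. For linear losses with cumulative cost vector $L_t$, the FTRL iterate then has the closed form $p_e \propto e^{-\eta L_t(e)}$, which moves smoothly instead of jumping between experts:

```python
import math

def ftrl_entropic(cum_costs, eta):
    """FTRL for experts with R(p) = (1/eta) * sum_e p_e ln p_e over the simplex.

    Minimizing <L_t, p> + R(p) gives p_e proportional to exp(-eta * L_t[e]).
    """
    m = min(cum_costs)                                  # shift for numerical stability
    w = [math.exp(-eta * (L - m)) for L in cum_costs]   # the shift cancels after normalizing
    s = sum(w)
    return [wi / s for wi in w]

# Cumulative losses at t = 4 from the table: most mass on expert 4,
# but every expert keeps positive probability.
print(ftrl_entropic([2.5, 3, 2, 1.5], eta=1.0))
```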

## FTRL

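Instead of a fixed regularizer $R$, at round $t$ use a regularizer $R_t$:

$$x_{t+1} = \mathop{\mathrm{arg\,min}}_{x \in \mathbb{E}} \sum_{i=1}^t f_i(x) + R_{t+1}(x)$$

The minimum is now taken over the whole space $\mathbb{E}$; the constraint $x \in X$ can be encoded by setting $R_{t+1} = +\infty$ outside $X$.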
### From $R_t$ to $R_{t+1}$?

Add a regularizer increment $r_{t+1}$, which is a convex function:

$$R_{t+1} = R_t + r_{t+1}, \qquad \text{so} \qquad R_{t+1} = r_1 + r_2 + \dotsb + r_{t+1},$$

and

$$x_{t+1} = \mathop{\mathrm{arg\,min}}_{x \in \mathbb{E}} \sum_{i=1}^t f_i(x) + \sum_{i=1}^{t+1} r_i(x)$$
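A sketch of FTRL with regularizer increments, under several assumptions not in the slides: linear losses $f_i(x) = \langle g_i, x \rangle$, $X$ an $\ell_2$-ball, and $R_t(x) = \frac{\sqrt{t}}{2}\lVert x \rVert_2^2$ inside $X$ ($+\infty$ outside). The increments $r_{t+1} = \frac{\sqrt{t+1} - \sqrt{t}}{2}\lVert x \rVert_2^2$ are convex, and completing the square shows that the minimizer is the Euclidean projection of $-G_t/\sqrt{t+1}$, where $G_t = g_1 + \dotsb + g_t$:

```python
import math

def project_l2_ball(v, radius=1.0):
    """Euclidean projection onto {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    return list(v) if norm <= radius else [radius * vi / norm for vi in v]

def ftrl_quadratic_increments(grads, radius=1.0):
    """FTRL iterates x_1, ..., x_{T+1} for linear losses <g_t, x> over an L2 ball.

    argmin_x <G_t, x> + (sqrt(t+1)/2) ||x||^2 over the ball = Proj(-G_t / sqrt(t+1)).
    """
    d = len(grads[0])
    G = [0.0] * d
    iterates = [project_l2_ball([0.0] * d, radius)]   # x_1 = argmin R_1 = 0
    for t, g in enumerate(grads, start=1):
        G = [Gi + gi for Gi, gi in zip(G, g)]          # G_t = g_1 + ... + g_t
        sigma = math.sqrt(t + 1)                       # R_{t+1} = (sigma / 2) ||x||^2
        iterates.append(project_l2_ball([-Gi / sigma for Gi in G], radius))
    return iterates
```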

### Efficiently computable?

Not clear in general: every round requires minimizing a new convex function over $\mathbb{E}$.

# Online Mirror Descent

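Start from online (projected) gradient descent: from $x_t$, move in the direction $-\nabla f_t(x_t)$ and project back onto $X$.

### Round $t$

$$x_{t+1} = \mathrm{Proj}_X\big(x_t - \nabla f_t(x_t)\big)$$

where $\mathrm{Proj}_X$ is the projection onto $X$.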
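A sketch of this update for the online regression example, assuming $X$ is an $\ell_2$-ball and adding a step size `eta` (the slide uses a unit step, i.e. `eta = 1`). The subgradient of the absolute loss plays the role of $\nabla f_t(x_t)$:

```python
import math

def project_l2_ball(v, radius):
    """Euclidean projection onto X = {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(vi * vi for vi in v))
    return list(v) if norm <= radius else [radius * vi / norm for vi in v]

def ogd_step(x_t, grad, eta, radius):
    """x_{t+1} = Proj_X(x_t - eta * grad)."""
    return project_l2_ball([xi - eta * gi for xi, gi in zip(x_t, grad)], radius)

def ogd_online_regression(stream, eta=0.1, radius=1.0):
    """Online gradient descent for f_t(w) = |<w, x_t> - y_t|; returns (final w, total loss)."""
    w = [0.0] * len(stream[0][0])
    total_loss = 0.0
    for x_t, y_t in stream:
        residual = sum(wi * xi for wi, xi in zip(w, x_t)) - y_t
        total_loss += abs(residual)                  # suffer f_t(w_t)
        sign = (residual > 0) - (residual < 0)
        grad = [sign * xi for xi in x_t]             # a subgradient of f_t at w_t
        w = ogd_step(w, grad, eta, radius)
    return w, total_loss
```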

### Another Perspective

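$\nabla f_t(x_t)$ is a representation of the derivative $Df_t(x_t)$, the linear functional giving the directional derivative of $f_t$ at $x_t$ in a direction $u$:

$$[Df_t(x_t)](u) = \langle \nabla f_t(x_t), u \rangle \qquad \text{(Riesz Representation Theorem)}$$

What is $x_t - \nabla f_t(x_t)$? Written as $x_t - Df_t(x_t)$, it subtracts a functional from a point. Representing the point $x_t$ by the functional $\langle x_t, \cdot \rangle$ yields a difference of functionals:

$$\langle x_t, \cdot \rangle - Df_t(x_t)$$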

### Avoiding the Inner Product

Let $R(x) = \frac{1}{2}\langle x, x \rangle$. Then $\nabla R(x) = x$, hence

$$\langle x_t, \cdot \rangle - Df_t(x_t) = DR(x_t) - Df_t(x_t) \qquad \text{and} \qquad x_t - \nabla f_t(x_t) = \nabla R(x_t) - \nabla f_t(x_t)$$

### What if we make other choices for $R(x)$?

So far $R(x) = \frac{1}{2}\lVert x\rVert_2^2$. Other choices of $R$ are allowed if:

(i) $R$ is strictly convex and differentiable on $\mathrm{int}(\mathrm{dom}\,R)$;

(ii) for every $y$ there is $\bar{y} \in \mathrm{int}(\mathrm{dom}\,R)$ such that $y = \nabla R(\bar{y})$, which implies $(\nabla R)^{-1}(y) = \nabla R^*(y) = \bar{y}$;

(iii) Bregman projections onto $X$ are attained in $\mathrm{int}(\mathrm{dom}\,R)$: $\Pi_X^R(y) \in \mathrm{int}(\mathrm{dom}\,R)$ for all $y \in \mathrm{int}(\mathrm{dom}\,R)$.

Here $\Pi_X^R$ is the Bregman projector, $\Pi_X^R(y) = \mathop{\mathrm{arg\,min}}_{x \in X} B_R(x, y)$, where $B_R(x, y) = R(x) - R(y) - \langle \nabla R(y), x - y \rangle$ is the Bregman divergence of $R$.
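A concrete non-Euclidean choice, used here only as an illustration, is the unnormalized negative entropy $R(x) = \sum_i (x_i \ln x_i - x_i)$ on the nonnegative orthant, with $X$ the simplex. It satisfies (i) to (iii): $\nabla R(x) = \ln x$, $\nabla R^*(y) = e^{y}$ (so every $y$ has a positive preimage), and the Bregman projection of a positive vector onto the simplex is plain normalization, which stays positive:

```python
import math

def grad_R(x):
    """∇R(x) = ln x componentwise, defined on int(dom R) = positive orthant."""
    return [math.log(xi) for xi in x]

def grad_R_star(y):
    """∇R* = (∇R)^{-1}: the preimage exp(y) lies in int(dom R) for every y."""
    return [math.exp(yi) for yi in y]

def bregman(x, y):
    """B_R(x, y) = R(x) - R(y) - <∇R(y), x - y> = sum_i x_i ln(x_i / y_i) - x_i + y_i."""
    return sum((xi * math.log(xi / yi) if xi > 0 else 0.0) - xi + yi for xi, yi in zip(x, y))

def project_simplex(y):
    """Bregman projection of a positive vector onto the simplex: normalization."""
    s = sum(y)
    return [yi / s for yi in y]
```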

### Online Mirror Descent

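Each round maps the primal point to the dual space, takes the gradient step there, maps back, and applies the Bregman projection:

$$x_t \xrightarrow{\ \nabla R\ } \nabla R(x_t) \xrightarrow{\ -\nabla f_t(x_t)\ } y_{t+1} \xrightarrow{\ \nabla R^*\ } \nabla R^*(y_{t+1}) \xrightarrow{\ \Pi_X^R\ } x_{t+1}$$

The points $x_t, x_{t+1}$ lie in $X$ (primal), and $\nabla R^*(y_{t+1})$ lies in $\mathrm{int}(\mathrm{dom}\,R)$.

With mirror map increments $R_{t+1} = R_t + r_{t+1} = r_1 + \dotsb + r_t + r_{t+1}$:

First round: $x_1 \in \mathop{\mathrm{arg\,min}}_{x \in X} R_1(x)$

Round $t+1$, for $t = 1, \dotsc, T$:

$$y_{t+1} = \nabla R_{t+1}(x_t) - \nabla f_t(x_t)$$

$$x_{t+1} = \Pi_X^{R_{t+1}}\big(\nabla R^*_{t+1}(y_{t+1})\big)$$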
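A sketch of this update on the simplex, assuming a fixed mirror map ($r_1 = R$ and all later increments zero) given by the scaled entropy $R(x) = \frac{1}{\eta}\sum_i (x_i \ln x_i - x_i)$ with a hypothetical parameter $\eta > 0$. Then $\nabla R(x) = \frac{1}{\eta}\ln x$, $\nabla R^*(y) = e^{\eta y}$, and the Bregman projection is normalization, so the update reduces to multiplicative weights, $x_{t+1} \propto x_t \, e^{-\eta \nabla f_t(x_t)}$:

```python
import math

def eomd_entropic(grads, eta):
    """Online mirror descent on the simplex with R(x) = (1/eta) * sum_i (x_i ln x_i - x_i).

    grads[t] plays the role of ∇f_t(x_t); returns the iterates x_1, ..., x_{T+1}.
    """
    n = len(grads[0])
    x = [1.0 / n] * n                                         # x_1 = argmin of R over the simplex
    iterates = [x]
    for g in grads:
        y = [math.log(xi) / eta - gi for xi, gi in zip(x, g)]  # y_{t+1} = ∇R(x_t) - g_t
        z = [math.exp(eta * yi) for yi in y]                    # ∇R*(y_{t+1})
        s = sum(z)
        x = [zi / s for zi in z]                                # Bregman projection
        iterates.append(x)
    return iterates
```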

### Lazy Online Mirror Descent

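The lazy variant keeps the dual iterate: the gradient step starts from $y_t$ instead of $\nabla R(x_t)$, followed by the same map back and Bregman projection onto $X$:

$$y_t \xrightarrow{\ -\nabla f_t(x_t)\ } y_{t+1} \xrightarrow{\ \nabla R^*\ } \nabla R^*(y_{t+1}) \xrightarrow{\ \Pi_X^R\ } x_{t+1}$$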

### Classic vs. Lazy Online Mirror Descent

Classic (eager) OMD:

- First round: $x_1 \in \mathop{\mathrm{arg\,min}}_{x \in X} R(x)$
- For $t = 1, \dotsc, T$: $y_{t+1} = \nabla R(x_t) - \nabla f_t(x_t)$ and $x_{t+1} = \Pi_X^R(\nabla R^*(y_{t+1}))$

Lazy OMD:

- First round: $x_1 \in \mathop{\mathrm{arg\,min}}_{x \in X} R(x)$, with $y_1 = 0$
- For $t = 1, \dotsc, T$: $y_{t+1} = y_t - \nabla f_t(x_t)$ and $x_{t+1} = \Pi_X^R(\nabla R^*(y_{t+1}))$
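The lazy update can be sketched with the same illustrative mirror map, $R(x) = \frac{1}{\eta}\sum_i (x_i \ln x_i - x_i)$ on the simplex ($\eta$ is an assumed parameter). The dual iterate simply accumulates negative gradients and is never reset to $\nabla R(x_t)$:

```python
import math

def lomd_entropic(grads, eta):
    """Lazy OMD on the simplex with R(x) = (1/eta) * sum_i (x_i ln x_i - x_i).

    y_1 = 0 and y_{t+1} = y_t - g_t; x_{t+1} = Π(∇R*(y_{t+1})) = normalized exp(eta * y_{t+1}).
    """
    n = len(grads[0])
    y = [0.0] * n
    iterates = [[1.0 / n] * n]                     # x_1 = Π(∇R*(0)) = uniform distribution
    for g in grads:
        y = [yi - gi for yi, gi in zip(y, g)]       # dual step from y_t, not from ∇R(x_t)
        m = max(y)                                  # shift for numerical stability;
        z = [math.exp(eta * (yi - m)) for yi in y]  # normalization cancels it
        s = sum(z)
        iterates.append([zi / s for zi in z])       # Bregman projection
    return iterates
```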

### LOMD as FTRL

Unrolling the lazy update (with $y_1 = 0$):

$$y_{t+1} = y_t - \nabla f_t(x_t) = y_{t-1} - \nabla f_{t-1}(x_{t-1}) - \nabla f_t(x_t) = \dotsb = -\sum_{i=1}^{t} \nabla f_i(x_i)$$

Let $R_X$ be equal to $R$ inside $X$ and $+\infty$ outside. Then $\nabla R_X^*(y_{t+1}) = \Pi_X^R(\nabla R^*(y_{t+1}))$, so LOMD is FTRL on the linearized losses:

$$x_{t+1} = \mathop{\mathrm{arg\,min}}_{x \in \mathbb{E}} \sum_{i=1}^t \langle \nabla f_i(x_i), x \rangle + R_X(x)$$
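A numerical sanity check of this identity, not a proof, using the illustrative entropy $R(x) = \sum_i (x_i \ln x_i - x_i)$ restricted to the simplex: the LOMD point $\Pi_X^R(\nabla R^*(y))$ with $y = -G$ (exponentiate, then normalize) should beat every sampled point of the simplex on the FTRL objective $\langle G, x \rangle + R_X(x)$. The sampling scheme and tolerance are arbitrary choices:

```python
import math
import random

def ftrl_objective(x, G):
    """<G, x> + R(x) for x in the simplex, with R(x) = sum_i (x_i ln x_i - x_i)."""
    return sum(gi * xi + (xi * math.log(xi) if xi > 0 else 0.0) - xi for gi, xi in zip(G, x))

def lomd_point(G):
    """Π_X^R(∇R*(y)) with y = -G: exponentiate, then normalize (the Bregman projection)."""
    m = min(G)                                   # shift for stability; normalization cancels it
    z = [math.exp(-(gi - m)) for gi in G]
    s = sum(z)
    return [zi / s for zi in z]

def lomd_matches_ftrl(G, trials=1000, seed=0):
    """True if no random simplex point has a smaller FTRL objective than the LOMD point."""
    rng = random.Random(seed)
    best = ftrl_objective(lomd_point(G), G)
    for _ in range(trials):
        w = [rng.expovariate(1.0) for _ in G]    # normalized exponentials: uniform on the simplex
        s = sum(w)
        if ftrl_objective([wi / s for wi in w], G) < best - 1e-12:
            return False
    return True
```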

### EOMD as FTRL

In EOMD the dual step starts from $\nabla R(x_t)$ rather than from $y_t$:

$$y_{t+1} = \nabla R(x_t) - \nabla f_t(x_t)$$

With $R_X$ equal to $R$ inside $X$ and $+\infty$ outside, we still have $\nabla R_X^*(y_{t+1}) = \Pi_X^R(\nabla R^*(y_{t+1}))$, and

$$\partial R_X(x_t) = \nabla R(x_t) + N_X(x_t),$$

where $N_X(x_t)$ is the normal cone of $X$ at $x_t$. For example, if $X$ is an interval and $x_t$ is its right endpoint, then $N_X(x_t) = [0, +\infty)$ and $\partial R_X(x_t)$ is the half-line starting at $\nabla R(x_t)$.

Since $x_t = \nabla R_X^*(y_t)$, we have $y_t \in \partial R_X(x_t)$, that is, $y_t = \nabla R(x_t) + p_t$ for some $p_t \in N_X(x_t)$ (with $y_1 = 0$, as $x_1$ minimizes $R_X$). Hence $y_{t+1} = y_t - (\nabla f_t(x_t) + p_t)$, and unrolling shows that EOMD is FTRL on the linearized losses shifted by normal vectors $p_1 \in N_X(x_1), p_2 \in N_X(x_2), \dotsc, p_t \in N_X(x_t)$:

$$x_{t+1} = \mathop{\mathrm{arg\,min}}_{x \in \mathbb{E}} \sum_{i=1}^t \langle \nabla f_i(x_i) + p_i, x \rangle + R_X(x)$$

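### EOMD vs LOMD

LOMD computes

$$x_{t+1} = \mathop{\mathrm{arg\,min}}_{x \in \mathbb{E}} \sum_{i=1}^t \langle \nabla f_i(x_i), x \rangle + R_X(x),$$

while EOMD adds a normal vector $p_i \in N_X(x_i)$ to each gradient.

### Eager = Lazy

$$x_{t+1} = \mathop{\mathrm{arg\,min}}_{x \in \mathbb{E}} \sum_{i=1}^t \langle \nabla f_i(x_i) + p_i, x \rangle + R_X(x)$$

Normal vectors only matter at the boundary: at a boundary point $z_1$ of $X$ the cone $N_X(z_1)$ can contain nonzero vectors, while at an interior point $z_2$ we have $N_X(z_2) = \{0\}$. If $x_i \in \mathrm{int}(\mathrm{dom}\,R)$ and $\mathrm{int}(\mathrm{dom}\,R) \subseteq \mathrm{ri}\,X$, then each $x_i$ lies in the relative interior of $X$, so $p_i$ is orthogonal to every direction of the affine hull of $X$ ($p_i = 0$ when $X$ is full-dimensional). Then $\langle p_i, x \rangle$ is constant over $X$, the $p_i$ do not change the minimizer, and EOMD and LOMD produce the same iterates.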
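A numerical illustration with the entropy mirror map on the simplex (an assumed example): every iterate is strictly positive, hence in the relative interior of the simplex, and the normal vectors there are multiples of the all-ones vector, which are constant on $X$. The eager and lazy iterates should therefore coincide up to rounding. The gradients in the usage are arbitrary deterministic values:

```python
import math

def normalize(z):
    """Bregman projection onto the simplex for the entropy mirror map."""
    s = sum(z)
    return [zi / s for zi in z]

def eager_omd(grads, eta):
    """EOMD with R(x) = (1/eta) * sum_i (x_i ln x_i - x_i): restart the dual step from ∇R(x_t)."""
    n = len(grads[0])
    x = [1.0 / n] * n
    out = [x]
    for g in grads:
        # ∇R*(∇R(x_t) - g_t) = exp(eta * (ln(x_t)/eta - g_t)), then project.
        x = normalize([math.exp(eta * (math.log(xi) / eta - gi)) for xi, gi in zip(x, g)])
        out.append(x)
    return out

def lazy_omd(grads, eta):
    """LOMD with the same R: the dual iterate accumulates y_{t+1} = y_t - g_t from y_1 = 0."""
    n = len(grads[0])
    y = [0.0] * n
    out = [[1.0 / n] * n]
    for g in grads:
        y = [yi - gi for yi, gi in zip(y, g)]
        out.append(normalize([math.exp(eta * yi) for yi in y]))
    return out

grads = [[math.sin(3 * t + k) for k in range(3)] for t in range(20)]
gap = max(abs(a - b) for xe, xl in zip(eager_omd(grads, 0.5), lazy_omd(grads, 0.5))
          for a, b in zip(xe, xl))
print(gap < 1e-9)  # True
```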

# Algorithms

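### Connection Among the Main Algorithms

Several algorithms fit the template

$$y_{t+1} = x_t - H_{t+1} \nabla f_t(x_t), \qquad x_{t+1} = \mathrm{Proj}_{H_{t+1}^{-1}}(y_{t+1}),$$

where $H_{t+1}$ is positive definite and $\mathrm{Proj}_{H_{t+1}^{-1}}$ projects onto $X$ in the norm $\lVert x \rVert_{H_{t+1}^{-1}} = \sqrt{x^{\intercal} H_{t+1}^{-1} x}$. Choosing $H_{t+1} \approx G_t^{-1}$ gives a Newton-like method (as in Online Newton Step), and $H_{t+1} \approx G_t^{-\frac{1}{2}}$ gives full-matrix AdaGrad.

### Second Order Algorithms?

$$G_{t} = \sum_{i = 1}^t \nabla f_i(x_i) \nabla f_i(x_i)^{\intercal}$$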
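A sketch of the template with $H_{t+1} \approx G_t^{-1/2}$, simplified to the diagonal of $G_t$ (diagonal AdaGrad) and assuming a box $X = [\mathrm{lo}, \mathrm{hi}]^d$, a step size `eta`, and a small `eps` against division by zero. These are assumptions beyond the slides. With a diagonal metric and a box, the projection in $\lVert \cdot \rVert_{H^{-1}}$ separates by coordinate and reduces to clipping:

```python
import math

def adagrad_diag_box(grads, eta, lo, hi, eps=1e-12):
    """Diagonal AdaGrad: H_{t+1} = diag(G_t)^{-1/2}, then projection onto the box [lo, hi]^d.

    Returns the iterates x_1, ..., x_{T+1}.
    """
    d = len(grads[0])
    sq = [0.0] * d                              # diagonal of G_t = sum_i g_i g_i^T
    x = [min(max(0.0, lo), hi)] * d             # x_1: the point of the box closest to 0
    iterates = [x]
    for g in grads:
        sq = [s + gi * gi for s, gi in zip(sq, g)]
        # y_{t+1} = x_t - eta * H_{t+1} g_t with H_{t+1} = diag(1 / (sqrt(G_t[j, j]) + eps))
        y = [xi - eta * gi / (math.sqrt(s) + eps) for xi, gi, s in zip(x, g, sq)]
        x = [min(max(yi, lo), hi) for yi in y]  # weighted projection onto a box = clipping
        iterates.append(x)
    return iterates
```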

# Future Directions

### Generalizations and Special Cases

- Limited feedback: bandit and two-point bandit feedback
- Special cases: combinatorial and other specific settings, e.g., when the Player's set is the hypercube or the $\ell_2$-ball
- Changing the metric: policy regret, raw loss
- Side information

### OCO in Other Areas

Quantum Computing

Approximate Maximum Flow

Robust Optimization

Competitive Analysis

Spectral Sparsification

SDP Solver

Oracle Boosting

Ideas

New Setting

Variational Perspective
