Approximate Linear Programming for Markov Decision Processes

Report by Pavel Temirchev

Deep RL reading group

Motivation

  • We want to model interaction with users
  • The user's state is the environment state
  • Our actions are ads, recommendations, etc.
  • The current approach makes myopic (single-step) predictions
  • We want to maximize return over long-term interactions
  • And we want to reuse pretrained myopic models (such as logistic regression)
  • The state-action space is very large, typically discrete and sparse!

Contents

  • Background
    • MDP, Factored MDP
    • Approximate Linear Programming
    • Logistic Regression
  • Logistic MDP
    • Factored Logistic MDP
  • ALP for Logistic MDP
    • Exact Sequential Approach
    • Piece-Wise Constant Approximation
    • Error Analysis
  • Experiments
  • Extensions

Some remarks

  • A model-based method
  • We do not learn the transition dynamics; they are assumed to be given
  • Not really RL
  • Not really Deep
  • Work in progress

Background: MDP


$$ V^{\pi}(x) = r_x^a + \gamma \sum_{x'} p(x'|x, a) V^{\pi}(x') $$
$$ Q^{\pi}(x,a) = r_x^a + \gamma \sum_{x'} p(x'|x, a) V^{\pi}(x') $$

where \( a = \pi(x) \)

$$ \pi^*(x) = \arg\max_a Q^*(x, a) $$

where \( p(x'|x, a) \) are the transition probabilities.

Background: MDP

\( x \in X \) is a finite, discrete state space.

\( a \in A \) is a finite, discrete action space.

\( x_i \in Dom(X_i) \): each feature of \( x \) takes values in a finite discrete domain.

\( a_i \in Dom(A_i) \): each feature of \( a \) takes values in a finite discrete domain.

\( x \) and \( a \) are then one-hot encoded.

Background:

Linear Programming task for MDP

$$ \min_{v} \sum_x \alpha(x)v(x) $$

s.t.

$$ v(x) \geq Q^v(x, a) = r_x^a + \gamma \sum_{x'} p(x'|x,a)v(x') \quad \forall x \in X, \forall a \in A $$

We have LP solvers. But is this task tractable?

1) Too many variables to minimize

2) Too many summation terms

3) Too many terms in the expectations

4) We cannot even store the transition probabilities

5) Exponential number of constraints
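
To make the LP concrete, here is a minimal sketch that builds and solves the exact LP for a tiny random MDP with scipy.optimize.linprog; the sizes, names, and random model are illustrative, not from the paper.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[x, a, x'] = p(x'|x, a)
R = rng.random((n_states, n_actions))                             # R[x, a] = r_x^a
alpha = np.full(n_states, 1.0 / n_states)                         # state-relevance weights

# One constraint per (x, a):  r_x^a + gamma * sum_x' p(x'|x,a) v(x') - v(x) <= 0
A_ub = (gamma * P - np.eye(n_states)[:, None, :]).reshape(-1, n_states)
b_ub = -R.reshape(-1)

res = linprog(c=alpha, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * n_states)
v_star = res.x  # the optimal value function V*(x)

Already at this toy scale, the LP has one variable per state and one constraint per state-action pair, which is exactly what becomes intractable for large factored spaces.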

Background: Factored MDP

4) We cannot even store the transition probabilities

We need a concise representation of the transition probabilities, so let's use a Dynamic Bayesian Network representation. Let:

$$ p(x' | x, a) = \prod_i p(x_i'|x,a) $$

And further:

$$ p(x' | x, a) = \prod_i p(x_i'|par_i) $$

where

$$ par_i = (x[Par_i], a[Par_i]), \quad Par_i \subseteq X \cup A $$
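
As a toy illustration of the factored form (the CPT interface below is hypothetical, not the paper's code), the transition probability becomes a product of small per-feature conditional tables:

def factored_transition_prob(x_next, x, a, cpts):
    # cpts[i](x_next_i, x, a) = p(x_i' = x_next_i | par_i); each CPT looks
    # only at its parent features x[Par_i], a[Par_i] internally
    prob = 1.0
    for i, cpt in enumerate(cpts):
        prob *= cpt(x_next[i], x, a)
    return prob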

Background:

Approximate Linear Programming

$$ \min_{v} \sum_x \alpha(x)v(x) $$

1) Too many variables to minimize

2) Too many summation terms

Let

$$ v(x) := \sum_{i=0}^k w_i\beta_i(x) $$

where the \( \beta_i \) are some basis functions. Let's denote

$$ x[B_i] = b_i, \quad B_i \subseteq X $$

So

$$ v(x) := \sum_{i=0}^k w_i\beta_i(b_i) $$
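
A toy illustration of this linear value decomposition, where each basis function looks only at its own small subset of features (all names are illustrative):

import numpy as np

def v(x, w, bases):
    # bases: list of (B_i, beta_i), where B_i is a tuple of feature indices
    # and beta_i maps the sub-vector x[B_i] to a scalar
    return sum(w_i * beta_i(tuple(x[j] for j in B_i))
               for w_i, (B_i, beta_i) in zip(w, bases))

# Example: two bases on a 4-bit state
bases = [((0, 1), lambda b: float(b[0] and b[1])),
         ((2,),   lambda b: float(b[0]))]
w = np.array([0.5, -1.0])
print(v([1, 1, 0, 1], w, bases))   # 0.5 * 1 + (-1.0) * 0 = 0.5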

Background:

Approximate Linear Programming

If we assume that the initial state distribution factorizes similarly,

$$ \alpha(x) := \prod_{i=0}^k \alpha(b_i) $$

we get a new LP task:

$$ \min_w \sum_{i=0}^k \sum_{b_i} \alpha(b_i)w_i \beta_i(b_i) $$

Background:

Approximate Linear Programming

PROOF:

$$ \sum_x \alpha(x)v(x) = \sum_x \alpha(x)\sum_i w_i\beta_i(x[B_i]) = $$
$$ = \sum_{b_0,...,b_k} \big[\prod_{j=0}^k \alpha(b_j) \big]\sum_i w_i\beta_i(b_i) = $$
$$ = \sum_i \sum_{b_0,...,b_k} \big[\prod_{j=0}^k \alpha(b_j) \big]w_i\beta_i(b_i) = $$
$$ = \sum_i \sum_{b_i} \alpha(b_i)\big[\sum_{b_j:j\neq i} \prod_{j\neq i} \alpha(b_j) \big]w_i\beta_i(b_i) = $$
$$ = \sum_i \sum_{b_i} \alpha(b_i)w_i\beta_i(b_i) $$

The last step uses the fact that the bracketed term sums a product of normalized distributions over all their values, so it equals one.
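
The identity is easy to check numerically; here is a brute-force sanity check on a small random instance (each block is a single bit, purely for illustration):

import itertools
import numpy as np

rng = np.random.default_rng(1)
k = 4                                          # number of blocks, one bit each
alpha_b = rng.dirichlet(np.ones(2), size=k)    # alpha(b_i), normalized per block
w = rng.normal(size=k)
beta = rng.normal(size=(k, 2))                 # beta_i(b_i)

# Left-hand side: sum over all joint states x = (b_0, ..., b_{k-1})
lhs = 0.0
for x in itertools.product([0, 1], repeat=k):
    alpha_x = np.prod([alpha_b[i, x[i]] for i in range(k)])
    v_x = sum(w[i] * beta[i, x[i]] for i in range(k))
    lhs += alpha_x * v_x

# Right-hand side: the decomposed objective
rhs = sum(alpha_b[i, b] * w[i] * beta[i, b] for i in range(k) for b in (0, 1))
assert np.isclose(lhs, rhs)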

Background: ALP + Factored MDP

$$ v(x) \geq Q^v(x, a) = r_x^a +\gamma \sum_{x'} p(x'|x,a)v(x') $$

3) Too many terms in the expectations

The constraints of the LP problem may be rewritten as:

$$ \sum_{i=0}^k w_i \big[\gamma g_i - \beta_i(b_i)\big] + r_x^a \leq 0 $$

where

$$ g_i = \sum_{b_i'}\beta_i(b_i')p(b_i'|par_{B_i}), \quad par_{B_i} = \cup_{j:X_j\in B_i} par_j $$

And we can decompose the rewards even further as:

$$ r_x^a = \sum_{j=0}^r \rho_j(x[R_j], a[R_j]) $$

Background: ALP + Factored MDP

$$ v(x) \geq r_x^a +\gamma \sum_{x'} p(x'|x,a)v(x') $$

becomes

$$ \sum_{i=0}^k w_i \big[\gamma g_i(par_{B_i}) - \beta_i(b_i)\big] + r_x^a \leq 0, \qquad g_i = \sum_{b_i'}\beta_i(b_i')p(b_i'|par_{B_i}) $$

PROOF:

$$ \sum_i w_i\beta_i(b_i) \geq r_x^a + \gamma \sum_{b_0',...,b_k'} \prod_{j=0}^k p(b_j'|par_{B_j})\sum_i w_i \beta_i(b_i') $$
$$ \sum_i w_i\beta_i(b_i) \geq r_x^a + \gamma \sum_i \big[\sum_{b_j':j\neq i} \prod_{j\neq i} p(b_j'|par_{B_j})\big] \sum_{b_i'} p(b_i'|par_{B_i}) w_i \beta_i(b_i') $$

As before, the bracketed term sums to one, leaving only the backprojections \( g_i \).
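
A backprojection is simply the expectation of a basis function under the factored next-state distribution; a one-function sketch with a hypothetical CPT interface:

def backprojection(beta_i, domain_b_i, cpt_b_i, parents):
    # g_i(par_{B_i}) = sum over b_i' of beta_i(b_i') * p(b_i' | par_{B_i});
    # cpt_b_i(b_next, parents) returns p(b_i' = b_next | par_{B_i})
    return sum(beta_i(b_next) * cpt_b_i(b_next, parents) for b_next in domain_b_i)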

Background: Constraint Generation

5) Exponential number of constraints 

  • Solve the master LP for a subset of constraints using GLOP; get the optimal \( w \) values
  • Find the maximally violated constraint (MVC) among those not yet added to the master LP
  • Add it to the master LP if the violation is positive, else break
  • Repeat (a sketch of this loop follows below)

MVC search:

$$ \max_{x, a} \sum_{i=0}^k w_i \big[\gamma g_i(par_{B_i}) - \beta_i(b_i)\big] + r_x^a $$

In our case \( x, a \) are Boolean vectors, so we can use a black-box Mixed Integer Programming solver, SCIP.
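
A minimal sketch of the loop, assuming two hypothetical helpers: solve_master_lp (e.g. backed by GLOP) returns the weights for the current constraint set, and find_mvc (e.g. backed by SCIP) returns the maximally violated constraint and its violation for the current \( w \):

def constraint_generation(initial_constraints, solve_master_lp, find_mvc,
                          tol=1e-6, max_iters=1000):
    constraints = list(initial_constraints)
    for _ in range(max_iters):
        w = solve_master_lp(constraints)      # master problem
        constraint, violation = find_mvc(w)   # MVC search (a MIP)
        if violation <= tol:                  # no violated constraint remains
            return w
        constraints.append(constraint)
    raise RuntimeError("constraint generation did not converge")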

Background: Logistic Regression


$$ p(\phi = 1 | x, a) = \sigma (x^T u_x + a^T u_a) $$
$$ \sigma(x) = \frac{1}{1 + \exp(-x)} $$

Need more? Try Google, it's free

Logistic Regression

[Diagram: state features \( X_1,...,X_n \) and action features \( A_1,...,A_m \) feed into the response variable \( \phi \).]

MDP

[Diagram: a two-slice DBN over timesteps \( t \) and \( t+1 \) with state features \( X_1,...,X_n \) and action features \( A_1,...,A_m \).]

Logistic Markov Decision Processes

[Diagram: a two-slice DBN over timesteps \( t \) and \( t+1 \), where the response \( \phi^t \) is an additional parent of the state features at \( t+1 \).]

$$ p(x^{t+1}| x^{t}, a^t) = \mathbb{E}_{p(\phi^t|x^t,a^t)}\, p(x^{t+1}| x^{t}, a^t,\phi^t) $$

We allow the response \( \phi^t \) to influence the user's state at timestep \( t+1 \).

Factored Logistic MDP

Transition dynamics (a toy sketch follows after this slide):

$$ p(x^{t+1}|x^t, a^t) = \sum_{\phi\in\{0,1\}}\prod_i p(x^{t+1}_i|par_i,\phi)\, p(\phi|x^t,a^t) $$

Reward function:

$$ r^x_a(\phi) = \sum_{j=0}^r \rho_j(x[R_j],a[R_j],\phi) $$

Hence, our backprojections \( g_i \) now depend on the complete \( x \) and \( a \) vectors.

Let's rewrite \( Q(x, a) \) as:

$$ Q(x, a) = \mathbb{E}_{\phi} \big[ r^x_a(\phi) + \gamma\sum_i w_i g_i(par_{B_i}, \phi) \big] $$
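
A toy sketch of the mixture transition above, extending the earlier factored product with the sum over the binary response \( \phi \) (the CPT interface is hypothetical):

def logistic_transition_prob(x_next, x, a, cpts, p_response):
    # p_response = p(phi = 1 | x, a), e.g. from the logistic model;
    # cpts[i](x_next_i, x, a, phi) = p(x_i' | par_i, phi)
    total = 0.0
    for phi, p_phi in ((1, p_response), (0, 1.0 - p_response)):
        prod = 1.0
        for i, cpt in enumerate(cpts):
            prod *= cpt(x_next[i], x, a, phi)
        total += p_phi * prod
    return total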

ALP for Logistic MDP

Let's denote:

$$ h(x,a,\phi,w) = \sum_{j=0}^r \rho_j(x[R_j],a[R_j],\phi) + \sum_i w_i\big(\gamma g_i (par_{B_i},\phi)-\beta_i(b_i) \big) $$

Then the ALP task may be reformulated as:

$$ \min_w \sum_i \sum_{b_i} w_i \alpha(b_i)\beta_i(b_i) $$

s.t.

$$ 0 \geq C(x,a,w) = \sum_{\phi \in \{0,1\}}p(\phi|x, a)\, h(x,a,\phi,w) \quad \forall x \in X, \forall a \in A $$

The constraints are now nonlinear, since \( p(\phi|x,a) \) is nonlinear, so the MVC search is no longer a MIP problem.

ALP for Logistic MDP

The MVC search becomes

$$ \max_{x,a} \sum_{\phi \in \{0,1\}}p(\phi|x, a)\, h(x,a,\phi,w) $$

We will denote

$$ p(\phi=1|x,a) = \sigma(f(x,a)) $$

and

$$ [f_l, f_u] = \{ (x,a) : f_l \leq f(x,a) \leq f_u \}, \quad \sigma_l = \sigma(f_l), \quad \sigma_u = \sigma(f_u) $$

Constant Approximation

$$ \max_{x,a} \sigma^* h(x,a,\phi=1) +(1 - \sigma^*) h(x,a,\phi=0) $$

where \( \sigma^* \) is some constant.

We consider two subsets of possible \( (x, a) \) pairs:

$$ H^+ = \{ (x,a): h(x,a,\phi=1) - h(x,a,\phi=0) \geq 0\} $$

where the constraint is non-decreasing as \( \sigma^* \) grows, and

$$ H^- = \{ (x,a): h(x,a,\phi=1) - h(x,a,\phi=0) < 0\} $$

where the constraint is non-increasing as \( \sigma^* \) grows.

We denote by \( U^u \) the solution of

$$ \max_{x,a} \sigma_u h(x,a,\phi=1) +(1 - \sigma_u) h(x,a,\phi=0) \quad \text{s.t.} \quad (x,a)\in [f_l, f_u] \cap H^+ $$

and by \( U^l \) the solution with \( \sigma_l \) and \( H^- \) in place of \( \sigma_u \) and \( H^+ \).

Constant Approximation

  • \( U^u \) is an upper bound on the maximal constraint violation (CV) over the subset \( (x,a) \in [f_l, f_u] \cap H^+ \)
  • The true CV at the maximizing point, \( C^u = C(x^u, a^u, w) \), is a lower bound on the maximal CV over this subset

The same holds for \( U^l \) and \( C^l \) over the subset \( (x,a) \in [f_l,f_u] \cap H^- \).

Hence,

$$ U^* = \max(U^l, U^u) $$

is an upper bound on the MCV in \( [f_l, f_u] \), and

$$ C^* = \max(C^l, C^u) $$

is a lower bound on the MCV in \( [f_l, f_u] \).

Constant Approximation

[Figure: the degree of constraint violation (CV) for two state-action pairs, \( C(x^{(1)}, a^{(1)}, \sigma) \) and \( C(x^{(2)}, a^{(2)}, \sigma) \), as a function of \( \sigma \), with \( \sigma_l \), \( \sigma_u \), and the upper bound \( U^u \) marked.]

MVC search in ALP-SEARCH

1) Solve two MIP tasks for some interval \( [f_l, f_u] \):

$$ \max_{x,a} \sigma_u h(x,a,\phi=1) +(1 - \sigma_u) h(x,a,\phi=0) \quad \text{s.t.} \quad (x,a)\in [f_l, f_u] \cap H^+ $$

$$ \max_{x,a} \sigma_l h(x,a,\phi=1) +(1 - \sigma_l) h(x,a,\phi=0) \quad \text{s.t.} \quad (x,a)\in [f_l, f_u] \cap H^- $$

2) If \( U^* < \epsilon \), then there is no constraint violation in \( [f_l, f_u] \) and we terminate.

3) If \( U^* - C^* < \epsilon \), then we report \( C^* \) as the MCV in \( [f_l, f_u] \) and terminate.

4) If \( C' \) in another interval is larger than \( C^* \), then we terminate.

If none of the above holds, we split the interval in two and recurse (see the sketch below).
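
A minimal sketch of this recursion, assuming a hypothetical helper solve_mip(sigma, lo, hi, positive) that solves the constant-\( \sigma \) MIP over the interval intersected with \( H^+ \) (positive=True) or \( H^- \), returning the bound together with the true violation at the maximizer:

import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def search(lo, hi, solve_mip, eps, best_c=-math.inf):
    u_up, c_up = solve_mip(sigmoid(hi), lo, hi, positive=True)
    u_lo, c_lo = solve_mip(sigmoid(lo), lo, hi, positive=False)
    u_star = max(u_up, u_lo)
    c_star = max(c_up, c_lo, best_c)   # best_c carries C' from other intervals
    if u_star < eps:                   # no violation anywhere in [lo, hi]
        return c_star
    if u_star - c_star < eps:          # bounds are tight enough: report c_star
        return c_star
    mid = 0.5 * (lo + hi)              # otherwise split the interval and recurse
    c_star = search(lo, mid, solve_mip, eps, c_star)
    return search(mid, hi, solve_mip, eps, c_star)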

Piece-Wise Constant approximation

[Figure: a piece-wise constant approximation of the sigmoid.]

MVC search in ALP-APPROX

For each interval \( i \) of a fixed partition, solve

$$ \max_{x,a} \sigma_i h(x,a,\phi=1) +(1 - \sigma_i) h(x,a,\phi=0) \quad \text{s.t.} \quad (x,a) \in [\delta_{i-1}, \delta_i] $$

where

$$ \sigma_i = \sigma(f_i), \quad \delta_{i-1} \leq f_i \leq \delta_i $$

Then we calculate

$$ C^i = C(x^i, a^i , \sigma(f(x^i,a^i))) $$

the true CV at the maximizer (probably not the maximal one in \([\delta_{i-1}, \delta_i]\)), and

$$ C^* = \max_i C^i $$

an estimate of the maximal CV in \([f_l, f_u]\).

Approximation error in ALP-APPROX

THEOREM

A bounded log-relative error for the logistic regression (assuming features with finite domains) can be achieved with \( O(\frac{1}{\epsilon} ||u||_1) \) intervals in logit space, where \(u\) is the logistic regression weight vector

THEOREM

Given an interval \( [a, b] \) in logit space, the value \(\sigma(x)\) with $$ x = \ln \frac{e^{a+b} + e^b}{1 + e^b} $$ minimizes the log-relative error over the interval.
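
A small sketch of the second theorem's formula, rewritten in a numerically stable form using \( \ln\frac{e^{a+b} + e^b}{1 + e^b} = b + \ln(1+e^a) - \ln(1+e^b) \); the partition of logit space below is purely illustrative:

import numpy as np

def representative_logit(a, b):
    # the theorem's minimizer of log-relative error on [a, b], stable form
    return b + np.logaddexp(0.0, a) - np.logaddexp(0.0, b)

# Example: partition [-6, 6] into 12 unit intervals, one sigma per piece
edges = np.linspace(-6.0, 6.0, 13)
reps = [representative_logit(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])]
sigmas = 1.0 / (1.0 + np.exp(-np.array(reps)))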

Experiments


  • Advertising task
  • Reward: 1 for click, 0 otherwise
  • The aim is to maximize Cumulative Click-Through Rate (CCTR)
  • Features are one-hot encoded
  • Features are divided into three categories:
    • User state (static or dynamic) - state variable
    • Ad description - action variable
    • User-Ad interaction - action variable
  • The transition dynamics are simple: either the identity function or a Bernoulli distribution over moving to the next bucket in a feature's domain
  • Logistic regression pretrained on 300M examples

Experiments


Model sizes:

  • Tiny:
    • 2 state features (48 binarized)
    • 1 action feature (7 binarized)
  • Small:
    • 6 state features (71 binarized)
    • 4 action features (15 binarized)
  • Medium:
    • 11 state features (251 binarized)
    • 8 action features (170 binarized)
  • Large:
    • 12 state features (2630 binarized)
    • 11 action features (224 binarized)

Experiments

[Figures: experimental results.]

Extensions


  • Relaxation of the CG optimization
  • Cross-product features
  • Multiple response variables
  • Partition-free CG
  • Non-linear response model

Partition-free CG

$$ \max_{x,a} \sigma(f(x,a))\,h(x,a,\phi=1) +\sigma(-f(x,a))\, h(x,a,\phi=0) $$

which is equivalent to

$$ \max_{x,a,y} \sigma(y)h(x,a,\phi=1) +\sigma(-y) h(x,a,\phi=0) \quad \text{s.t.} \quad y=f(x,a) $$

The simple idea is to iteratively alternate between two steps:

  • maximize over \( x, a \) using a MIP solver
  • set \( y = f(x,a) \)

But this will almost surely get stuck in a local optimum.

Partition-free CG

Another approach is to consider a Lagrangian relaxation:

$$ \min_{\lambda}\max_{x,a,y}\; \sigma(y)h(x,a,\phi=1) +\sigma(-y) h(x,a,\phi=0) - \lambda f(x,a) + \lambda y $$

Primal-dual alternating optimization (a code sketch follows below):

Initialize \( \lambda, x, a, y \)

for \( t := 1,...,T \) do:

1) \( y^{(t+1)} = y^{(t)} + \eta_t \nabla^{(t)}_y \)

2) \( (x,a)^{(t+1)}=\arg\max_{x,a}\; \sigma(y^{(t+1)})h(x,a,\phi=1)+\sigma(-y^{(t+1)})h(x,a,\phi=0) - \lambda^{(t)} f(x,a) \)

3) \( \lambda^{(t+1)} = \lambda^{(t)} - \hat{\eta}_t \big[ y^{(t+1)} - f((x,a)^{(t+1)}) \big] \)

end for
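
A minimal sketch of this primal-dual loop. For illustration, the inner argmax over \( (x, a) \) is brute-force enumeration of a tiny Boolean space instead of a MIP solver, and h1, h0, f are hypothetical stand-ins for \( h(x,a,\phi=1) \), \( h(x,a,\phi=0) \), and the logit \( f(x,a) \):

import itertools
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def primal_dual_cg(h1, h0, f, n_bits, T=200, eta=0.1, eta_hat=0.05):
    lam, y = 0.0, 0.0
    candidates = [np.array(bits) for bits in itertools.product([0, 1], repeat=n_bits)]
    xa = candidates[0]
    for t in range(T):
        # 1) gradient ascent on y:
        #    d/dy [sigmoid(y) h1 + sigmoid(-y) h0 + lam * y]
        s = sigmoid(y)
        y += eta * (s * (1 - s) * (h1(xa) - h0(xa)) + lam)
        # 2) maximize over (x, a), including the -lam * f(x, a) term
        xa = max(candidates,
                 key=lambda z: sigmoid(y) * h1(z) + sigmoid(-y) * h0(z) - lam * f(z))
        # 3) dual descent on lambda toward the constraint y = f(x, a)
        lam -= eta_hat * (y - f(xa))
    return xa, y, lam

In the paper's setting, step 2 would of course be a MIP over the factored representation rather than explicit enumeration.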

Non-linear Response Model


We consider a wide-and-deep response model:

  • some features from \(x\) and \(a\) are used directly as inputs to the final logistic output unit
  • some features are passed through a DNN with several layers of non-linear units

If we can express the DNN's non-linearities in such a way that the input to the final logistic output unit becomes a linear-like function of \( (x, a) \), then the CG optimization is the same as for the logistic regression response model.

ReLU non-linearity may be expressed this way using just one or two indicator functions per unit, as sketched below.
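
For instance, one standard MIP encoding of a ReLU unit \( z = \max(0, y) \) uses a single binary indicator \( \delta \) and a constant \( M \geq |y| \) (a common textbook construction, not necessarily the paper's exact encoding):

$$ z \geq y, \qquad z \geq 0, \qquad z \leq y + M(1-\delta), \qquad z \leq M\delta, \qquad \delta \in \{0,1\} $$

With \( \delta = 1 \) the constraints force \( z = y \) (so \( y \geq 0 \)), and with \( \delta = 0 \) they force \( z = 0 \) (so \( y \leq 0 \)).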

Thanks for your attention!
