ADMM and GCS

Meeting with Russ and Pablo

October 23rd 2023

Bernhard Paus Graesdal

Motivation

  • Solve larger GCS instances
  • Leverage parallel computation
  • Custom GCS solver
    • (not depending on Mosek)
    • Amazon told me today that not having Mosek has been a bottleneck for them in using GCS

ADMM Background

  • The general ADMM problem looks like this:
\begin{aligned} \min \quad& f(x) + g(z) \\ \text{s.t.} \quad & Ax + Bz = c \end{aligned}
  • \(f: \R^n \rightarrow \R\) and \(g: \R^m \rightarrow \R\) are convex functions
  • \(A \in \R^{l \times n}\) and \(B \in \R^{l \times m}\) are coefficient matrices, and \(c \in \R^l\)
  • \(x \in \R^n\) and \(z \in \R^m\) are optimization variables

ADMM Background

\begin{aligned} \min \quad& f(x) + g(z) \\ \text{s.t.} \quad & Ax + Bz = c \end{aligned}
  • Form the Augmented Lagrangian
\begin{aligned} L_\rho(x, z, y) = f(x) + g(z) &+ y^\intercal(Ax + Bz - c) + \frac{1}{2}\rho \| Ax + Bz - c\|_2^2 \end{aligned}
  • \(\rho > 0, \ \rho \in \R\) can be seen as a penalty parameter
  • \(\|\cdot\|_2\) denotes the Euclidean norm
  • \(y \in \R^l\) is a vector of dual variables

ADMM Background

\begin{aligned} \min \quad& f(x) + g(z) \\ \text{s.t.} \quad & Ax + Bz = c \end{aligned}
  1. Initialization: Choose initial guesses \(x^{(0)}\), \(z^{(0)}\), and \(y^{(0)}\).
  2. Iteration: For \(k = 0, 1, 2, \ldots\) until convergence, update:
\begin{aligned} L_\rho(x, z, y) = f(x) + g(z) &+ y^\intercal(Ax + Bz - c) + \frac{1}{2}\rho \| Ax + Bz - c\|_2^2 \end{aligned}
\begin{aligned} x^{(k+1)} & := \argmin_x L_\rho(x, z^{(k)}, y^{(k)}) \\ z^{(k+1)} & := \argmin_z L_\rho(x^{(k+1)}, z, y^{(k)}) \\ y^{(k+1)} & := y^{(k)} + \rho \left( Ax^{(k+1)} + Bz^{(k+1)} - c \right) \end{aligned}

  3. Convergence check: Stop the iterations once the stopping criteria are satisfied.

Scaled price variables

  • Introduce a scaled dual variable \( u := (1/\rho) y \) and reformulate the augmented Lagrangian (by completing the square) as:
\begin{aligned} L_\rho(x, z, u) =& f(x) + g(z) - (\rho/2) \|u\|_2^2 + (\rho/2) \| Ax + Bz - c + u\|_2^2 \end{aligned}
  • This gives simpler update steps. For \(k = 0, 1, 2, \ldots\) until convergence, update:
\begin{aligned} x^{(k+1)} & := \argmin_x f(x) + (\rho/2) \| Ax + Bz^{(k)} - c + u^{(k)} \|_2^2 \\ z^{(k+1)} & := \argmin_z g(z) + (\rho/2) \| Ax^{(k+1)} + Bz - c + u^{(k)} \|_2^2 \\ u^{(k+1)} & := u^{(k)} + Ax^{(k+1)} + Bz^{(k+1)} - c \end{aligned}
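  • A minimal numerical sketch of these scaled-form updates (not from the slides): a toy problem with \( f(x) = \tfrac{1}{2}\|x - a\|_2^2 \), \( g(z) = \gamma \|z\|_1 \), \( A = I \), \( B = -I \), \( c = 0 \), so both minimizations have closed forms.

import numpy as np

def admm_lasso_toy(a, gamma=0.1, rho=1.0, num_iters=100):
    """Scaled-form ADMM for min_x 1/2||x - a||^2 + gamma*||z||_1  s.t.  x - z = 0."""
    x = np.zeros_like(a)
    z = np.zeros_like(a)
    u = np.zeros_like(a)  # scaled dual variable u = y / rho
    for _ in range(num_iters):
        # x-update: argmin_x 1/2||x - a||^2 + (rho/2)||x - z + u||^2  (closed form)
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: argmin_z gamma*||z||_1 + (rho/2)||x - z + u||^2  (soft-thresholding)
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - gamma / rho, 0.0)
        # scaled dual update: running sum of the residual x - z
        u = u + x - z
    return x, z

x_opt, z_opt = admm_lasso_toy(np.array([1.5, -0.05, 0.3]))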

ADMM & Constrained Convex Optimization

  • The general ADMM form:
\begin{aligned} \min \quad& f(x) + g(z) \\ \text{s.t.} \quad & Ax + Bz = c \end{aligned}
  • Put the problem into ADMM form, with \( g = \tilde{I}_\mathcal{C} \) the indicator function of \( \mathcal{C} \), \( A = I \), \( B = -I \), and \( c = 0 \):
\begin{aligned} \min \quad& f(x) \\ \text{s.t.} \quad & x \in \mathcal{C} \end{aligned}
\iff
\begin{aligned} \min \quad& f(x) + \tilde{I}_\mathcal{C}(z) \\ \text{s.t.} \quad & x - z = 0 \end{aligned}
  • Update steps now take the form, where \( \Pi_\mathcal{C} \) denotes the Euclidean projection onto \( \mathcal{C} \):
\begin{aligned} x^{(k+1)} & := \argmin_x \left( f(x) + (\rho/2) \| x - z^{(k)} + u^{(k)} \|_2^2 \right) \\ z^{(k+1)} & := \Pi_\mathcal{C}( x^{(k+1)} + u^{(k)}) \\ u^{(k+1)} & := u^{(k)} + x^{(k+1)} - z^{(k+1)} \end{aligned}
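  • A minimal sketch of this projection-based scheme (not from the slides), for the toy problem \( \min \tfrac{1}{2}\|x - a\|_2^2 \) s.t. \( x \in \mathcal{C} = [\text{lo}, \text{hi}]^n \), where the projection is a simple clip:

import numpy as np

def admm_box_constrained(a, lo=0.0, hi=1.0, rho=1.0, num_iters=100):
    """ADMM for min 1/2||x - a||^2 s.t. x in C = [lo, hi]^n, via the x/z/u splitting above."""
    x = np.zeros_like(a)
    z = np.zeros_like(a)
    u = np.zeros_like(a)
    for _ in range(num_iters):
        # x-update: argmin_x 1/2||x - a||^2 + (rho/2)||x - z + u||^2  (closed form)
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: Euclidean projection of (x + u) onto the box C
        z = np.clip(x + u, lo, hi)
        # scaled dual update
        u = u + x - z
    return z  # z is the feasible iterate

print(admm_box_constrained(np.array([1.7, -0.4, 0.6])))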

ADMM & Global Variable Consensus

  • The general ADMM form:
\begin{aligned} \min \quad& f(x) + g(z) \\ \text{s.t.} \quad & Ax + Bz = c \end{aligned}
  • Put the problem into (consensus) ADMM form by giving each term its own local copy \( x_i \) of the global variable \( z \):
\begin{aligned} \min \quad& \sum_{i=1}^N f_i(x) \end{aligned}
\iff
\begin{aligned} \min \quad& \sum_{i=1}^N f_i(x_i) \\ \text{s.t.} \quad & x_i - z = 0, \quad i = 1, \ldots, N \end{aligned}
  • Update steps now take the form:
\begin{aligned} x^{(k+1)}_i & := \argmin_{x_i} \left( f_i(x_i) + (\rho/2) \| x_i - z^{(k)} + u^{(k)}_i \|_2^2 \right) \\ z^{(k+1)} & := \frac{1}{N} \sum^N_{i=1} (x_i^{(k+1)} + u^{(k)}_i ) \\ u^{(k+1)}_i & := u^{(k)}_i + x_i^{(k+1)} - z^{(k+1)} \end{aligned}
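  • A minimal sketch of consensus ADMM (not from the slides): splitting \( \sum_i \tfrac{1}{2}\|A_i x - b_i\|_2^2 \) across blocks, with independent (parallelizable) \( x_i \)-updates and an averaging \( z \)-update.

import numpy as np

def consensus_admm(A_blocks, b_blocks, rho=1.0, num_iters=200):
    """Global variable consensus ADMM for min_x sum_i 1/2||A_i x - b_i||^2."""
    N = len(A_blocks)
    n = A_blocks[0].shape[1]
    xs = [np.zeros(n) for _ in range(N)]
    us = [np.zeros(n) for _ in range(N)]
    z = np.zeros(n)
    for _ in range(num_iters):
        # x_i-updates (independent, could run in parallel):
        #   argmin_{x_i} 1/2||A_i x_i - b_i||^2 + (rho/2)||x_i - z + u_i||^2
        for i in range(N):
            Ai, bi = A_blocks[i], b_blocks[i]
            xs[i] = np.linalg.solve(Ai.T @ Ai + rho * np.eye(n),
                                    Ai.T @ bi + rho * (z - us[i]))
        # z-update: average of (x_i + u_i)
        z = np.mean([xs[i] + us[i] for i in range(N)], axis=0)
        # scaled dual updates
        for i in range(N):
            us[i] = us[i] + xs[i] - z
    return z

rng = np.random.default_rng(0)
A_blocks = [rng.standard_normal((5, 3)) for _ in range(4)]
b_blocks = [rng.standard_normal(5) for _ in range(4)]
print(consensus_admm(A_blocks, b_blocks))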

ADMM & Conic Programming

  • Put the problem into ADMM form:
\begin{aligned} \min \quad& f(x) + g(z) \\ \text{s.t.} \quad & Ax + Bz = c \end{aligned}

  • Split the conic constraint: \( Ax \in K \iff Ax = s_1, \, s_2 \in K, \, s_1 = s_2 \)

  • Move the coupling constraint \( s_1 = s_2 \) into the cost (via the augmented Lagrangian penalty)
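  • With this split, one of the ADMM blocks reduces to a Euclidean projection onto \( K \). A sketch for the second-order cone (just one example of \( K \); not from the slides):

import numpy as np

def project_soc(v):
    """Euclidean projection of v = (t, x) onto the second-order cone {(t, x): ||x||_2 <= t}."""
    t, x = v[0], v[1:]
    norm_x = np.linalg.norm(x)
    if norm_x <= t:
        return v.copy()              # already inside the cone
    if norm_x <= -t:
        return np.zeros_like(v)      # projects onto the origin
    alpha = (t + norm_x) / 2.0
    return np.concatenate(([alpha], alpha * x / norm_x))

# In a conic ADMM, the s_2-update is this projection applied to (s_1 + scaled dual).
print(project_soc(np.array([0.5, 3.0, 4.0])))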

ADMM and GCS

Two paths:

  1. Solve the MICP directly
    1. Easier to start with
    2. Things decompose more nicely
    3. Few (or no) convergence guarantees?
  2. Solve the convex relaxation

1. Solve the MICP directly

\newcommand{\set}[1]{\left\{#1\right\}} \begin{aligned} \min \quad& \sum_{e := (u,v) \in \mathcal{E}} y_e l_e(x_u, x_v) \\ \text{s.t.} \quad & y_v := (y_e)_{e \in \mathcal{E}_v} \\ \quad & y_v \in \Phi_v, \quad \forall v \in \mathcal{V} \\ \quad & x_v \in \mathcal{X}_v, \quad \forall v \in \mathcal{V} \\ \quad & y_e \in \set{0,1} \quad \forall e \in \mathcal{E} \end{aligned}
  • Bi-convex in \( y_e, \, e \in \mathcal{E} \) and \( x_v, \, v \in \mathcal{V} \)
  • Flow polytope:
\newcommand{\set}[1]{\left\{#1\right\}} \begin{aligned} \Phi_v := \left\{ y_v \geq 0 \, \middle| \, \sum_{e \in \mathcal{E}_v^\text{in}} y_e + \delta_{sv} = \sum_{e \in \mathcal{E}_v^\text{out}} y_e + \delta_{tv} \leq 1 \right\} \end{aligned}
  • For each edge \(e = (u, v)\), introduce the local variables \(x_{eu}\) and \(x_{ev}\). We can now rewrite the SPP GCS problem as:
\newcommand{\set}[1]{\left\{#1\right\}} \begin{aligned} \min \quad& \sum_{e := (u,v) \in \mathcal{E}} y_e l_e(x_{eu}, x_{ev}) \\ \text{s.t.} \quad & x_{eu} \in \mathcal{X}_u, \; x_{ev} \in \mathcal{X}_v, \\ \quad & x_{eu} - x_u = 0, \\ \quad & x_{ev} - x_v = 0, \quad \forall e = (u,v) \in \mathcal{E} \\ \quad & (y_e)_{e \in \mathcal{E}_v} \in \Phi_v, \quad \forall v \in \mathcal{V} \end{aligned}
  • The constraints \( x_{eu} - x_u = 0 \) and \( x_{ev} - x_v = 0 \) are "consensus constraints"
  • The cost now decomposes independently over the edges

1. Solve the MICP directly

Three blocks of variables:

  1. Consensus variables: \( x_v, \, v \in \mathcal{V} \)
  2. Local variables: \((x_{eu}, x_{ev}), \, e = (u,v) \in \mathcal{E} \)
  3. Flow variables: \( y_e, \, e \in \mathcal{E} \)

The problem is bi-convex in these blocks of variables.

1. Solve the MICP directly

  • Form the augmented Lagrangian by relaxing only the consensus constraints:
\newcommand{\set}[1]{\left\{#1\right\}} \begin{aligned} L_\rho(\cdot) = \sum_{e = (u,v) \in \mathcal{E}} \Bigl( y_e l_e(x_{eu}, x_{ev}) + (\rho/2) \| G(x_u, x_v, x_{eu}, x_{ev}) + \lambda_e\|_2^2 - (\rho/2) \| \lambda_e \|_2^2 \Bigr) \end{aligned}
  • \( \lambda_e = (\lambda_{eu}, \lambda_{ev}) \in \R^{|\mathcal{X}_u| + |\mathcal{X}_v|} \) are the scaled dual variables
  • \(G(x_u, x_v, x_{eu}, x_{ev}) = (x_{eu} - x_u, \ x_{ev} - x_v) \in \R^{ | \mathcal{X}_u | + | \mathcal{X}_v | }\)
  • (To be precise, we are really doing the whole machinery with indicator functions over the sets)

1. Solve the MICP directly

Update steps:

1) Local Update

  • For each edge \( e = (u,v) \in \mathcal{E} \) we update the local variables while keeping the other variables fixed.
  • The update step for \( (x_{eu}^{(k+1)}, x_{ev}^{(k+1)}) \) is the solution to the following constrained convex optimization problem:
\begin{aligned} \min \quad& y_e^{(k)} l_e(x_{eu}, x_{ev}) + (\rho / 2) \| G(x_u^{(k)}, x_v^{(k)}, x_{eu}, x_{ev}) + \lambda_e^{(k)} \|_2^2 \\ \text{s.t.} \quad & x_{eu} \in \mathcal{X}_u \\ \quad & x_{ev} \in \mathcal{X}_v \end{aligned}
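  • A cvxpy sketch of one edge's local update (not from the slides): it assumes the sets are polytopes \( \mathcal{X} = \{\, x \mid A x \leq b \,\} \) and the edge length is the Euclidean distance \( l_e(x_{eu}, x_{ev}) = \|x_{eu} - x_{ev}\|_2 \); all argument names are illustrative.

import cvxpy as cp

def local_update(y_e, x_u_k, x_v_k, lam_eu, lam_ev, A_u, b_u, A_v, b_v, rho=1.0):
    """Per-edge local update: min y_e * l_e + (rho/2)||G(...) + lambda_e||^2 over X_u x X_v."""
    n = x_u_k.shape[0]
    x_eu = cp.Variable(n)
    x_ev = cp.Variable(n)
    cost = (y_e * cp.norm(x_eu - x_ev, 2)
            + (rho / 2) * cp.sum_squares(x_eu - x_u_k + lam_eu)
            + (rho / 2) * cp.sum_squares(x_ev - x_v_k + lam_ev))
    constraints = [A_u @ x_eu <= b_u, A_v @ x_ev <= b_v]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return x_eu.value, x_ev.value

  • Each edge's subproblem is independent of the others, so these solves can run in parallel.
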
1. Solve the MICP directly

Update steps:

2) Consensus Update

  • For each vertex \( v \in \mathcal{V} \) we update the consensus variable \( x_v \) as:
\begin{aligned} x_{v}^{(k+1)} := \frac{1}{ | \mathcal{E}_v | } \sum_{e \in \mathcal{E}_v} \left( x_{ev}^{(k+1)} + \lambda_{ev}^{(k)} \right) \end{aligned}
  • (The minimization over \( x_v \) is just an unconstrained quadratic program, so this falls right out of it)
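  • Written out in numpy (a sketch; the \( (e, v) \)-keyed dictionaries are an illustrative data layout, not from the slides):

import numpy as np

def consensus_update(v, incident_edges, x_local, lam):
    """x_v update: average of (x_ev^{k+1} + lambda_ev^{k}) over the edges incident to v.
    x_local[(e, v)] and lam[(e, v)] hold the local copy and scaled dual of vertex v on edge e."""
    return np.mean([x_local[(e, v)] + lam[(e, v)] for e in incident_edges], axis=0)
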
1. Solve the MICP directly

Update steps:

3) Discrete SPP update

  • Minimizing only with respect to the flows gives us a discrete SPP:
\begin{aligned} \min \quad& \sum_{e := (u,v) \in \mathcal{E}} y_e c_e \\ \text{s.t.} \quad & (y_e)_{e \in \mathcal{E}_v} \in \Phi_v, \quad \forall v \in \mathcal{V} \end{aligned}
  • where \( c_e := l_e(x_{eu}^{(k+1)}, x_{ev}^{(k+1)}) \) is a constant
  • NOTE: We are not solving the SPP over the consensus variables
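  • Since the \( c_e \) are fixed nonnegative constants at this point, this update amounts to an ordinary shortest s-t path problem. A sketch using networkx (an assumed dependency; edge_costs is an illustrative dictionary mapping \( (u, v) \) to \( c_e \)):

import networkx as nx

def spp_update(edge_costs, source, target):
    """Flow update: min sum_e y_e c_e over the flow polytope with binary y_e,
    solved as a shortest s-t path; y_e = 1 on the path edges, 0 elsewhere."""
    G = nx.DiGraph()
    for (u, v), c in edge_costs.items():
        G.add_edge(u, v, weight=c)
    # Dijkstra, valid since the costs c_e are nonnegative
    path = nx.shortest_path(G, source, target, weight="weight")
    path_edges = set(zip(path[:-1], path[1:]))
    return {e: (1.0 if e in path_edges else 0.0) for e in edge_costs}
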
1. Solve the MICP directly

Update steps:

4) Price update

  • For each edge \( e \in \mathcal{E} \), the update step for the dual variables is:
\begin{aligned} \lambda^{(k+1)}_e & := \lambda^{(k)}_e + G(x_u^{(k+1)}, x_v^{(k+1)}, x_{eu}^{(k+1)}, x_{ev}^{(k+1)}) \end{aligned}
  • where \(G(x_u, x_v, x_{eu}, x_{ev}) = (x_{eu} - x_u, \ x_{ev} - x_v) \in \R^{ | \mathcal{X}_u | + | \mathcal{X}_v | }\) as before
  • Can be interpreted as the integral of the error/constraint violation
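  • A sketch of this step together with the primal residual it accumulates (same illustrative dictionary layout as above, not from the slides):

import numpy as np

def price_update(edges, x_consensus, x_local, lam):
    """Scaled dual update lam += G(...); also returns the primal residual norm for a stopping check."""
    sq_residual = 0.0
    for (u, v) in edges:
        for w in (u, v):
            g = x_local[((u, v), w)] - x_consensus[w]  # one block of G(x_u, x_v, x_eu, x_ev)
            lam[((u, v), w)] = lam[((u, v), w)] + g
            sq_residual += float(np.dot(g, g))
    return lam, np.sqrt(sq_residual)
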
2. Solve the convex relaxation

  • Convex relaxation of the GCS problem:
\newcommand{\set}[1]{\left\{#1\right\}} \begin{aligned} \min &\quad \sum_{e \in \mathcal{E}} \tilde{l}_e(z_e, z_e', y_e) \\ \text{s.t.} \quad & \sum_{e\in \mathcal{E}^\text{out}_s} y_e = 1, \sum_{e\in \mathcal{E}^\text{in}_t} y_e = 1 \\ \quad & \sum_{e\in \mathcal{E}^\text{in}_v} y_e \leq 1, \quad \forall v \in \mathcal{V} - \set{s,t} \\ \quad & \sum_{e\in \mathcal{E}^\text{in}_v} (z_e', y_e) = \sum_{e\in \mathcal{E}^\text{out}_v} (z_e, y_e) \quad \forall v \in \mathcal{V} - \set{s,t} \\ \quad & (z_e,y_e) \in \tilde{\mathcal{X}}_u \\ \quad & (z_e',y_e) \in \tilde{\mathcal{X}}_v, \quad \forall e = (u,v) \in \mathcal{E} \end{aligned}

2. Solve the convex relaxation

  • Decompose the problem over the vertices.
  • For each vertex \( v \in \mathcal{V} \) we define auxiliary (local) variables \( z_{ve}, z_{ve}', y_{ve}, \, \forall e \in \mathcal{E}_v\).
  • The constraints are now enforced only on the local variables.
  • The coupling constraints (last line) ensure that the two programs are equivalent.
  • Reformulate as:
\newcommand{\set}[1]{\left\{#1\right\}} \begin{aligned} \min &\quad \sum_{v \in \mathcal{V}} \left( \sum_{e \in \mathcal{E}_v} \tilde{l}_e(z_{ve}, z_{ve}', y_{ve}) \right) \\ \text{s.t.} \quad & \sum_{e\in \mathcal{E}^\text{out}_s} y_{se} = 1, \sum_{e\in \mathcal{E}^\text{in}_t} y_{te} = 1 \\ \quad & \sum_{e\in \mathcal{E}^\text{in}_v} y_{ve} \leq 1, \quad \forall v \in \mathcal{V} - \set{s,t} \\ \quad & \sum_{e\in \mathcal{E}^\text{in}_v} (z_{ve}', y_{ve}) = \sum_{e\in \mathcal{E}^\text{out}_v} (z_{ve}, y_{ve}) \quad \forall v \in \mathcal{V} - \set{s,t} \\ \quad & (z_{ve},y_{ve}) \in \tilde{\mathcal{X}}_u \\ \quad & (z_{ve}',y_{ve}) \in \tilde{\mathcal{X}}_w, \quad \forall v \in \mathcal{V}, \, \forall e = (u,w) \in \mathcal{E}_v \\ \quad & (z_{ve}, z_{ve}', y_{ve}) = (z_{e}, z_{e}', y_{e}) \quad \forall v \in \mathcal{V}, \, \forall e \in \mathcal{E}_v \end{aligned}

2. Solve the convex relaxation

  • Form the augmented Lagrangian:
\newcommand{\set}[1]{\left\{#1\right\}} \begin{aligned} L_\rho(\cdot) = \sum_{v \in \mathcal{V}} \Bigl( \sum_{e \in \mathcal{E}_v} & \tilde{l}_e(z_{ve}, z_{ve}', y_{ve}) \\ &+ (\rho/2) \| z_{ve} - z_e + \lambda_{ve} \|_2^2 \\ &+ (\rho/2) \| z_{ve}' - z_e' + \lambda_{ve}' \|_2^2 \\ &+ (\sigma/2) \| y_{ve} - y_e + \mu_{ve} \|_2^2 \\ &- (\rho/2) \| \lambda_{ve} \|_2^2 - (\rho/2) \| \lambda_{ve}' \|_2^2 - (\sigma/2) \| \mu_{ve} \|_2^2 \Bigr) \end{aligned}

2. Solve the convex relaxation

Update steps:

1) Local Update

  • For each vertex \( v \in \mathcal{V} \) we update the local variables while keeping the other variables fixed:
\begin{aligned} \min \quad& \sum_{e \in \mathcal{E}_v} \Bigl( \tilde{l}_e(z_{ve}, z_{ve}', y_{ve}) \\ \quad \quad & + (\rho/2) \| z_{ve} - z_e + \lambda_{ve} \|_2^2 \\ \quad \quad & + (\rho/2) \| z_{ve}' - z_e' + \lambda_{ve}' \|_2^2 \\ \quad \quad & + (\sigma/2) \| y_{ve} - y_e + \mu_{ve} \|_2^2 \Bigr) \\ \text{s.t.} \quad & \sum_{e\in \mathcal{E}^\text{in}_v} y_{ve} \leq 1 \\ \quad & \sum_{e\in \mathcal{E}^\text{in}_v} (z_{ve}', y_{ve}) = \sum_{e\in \mathcal{E}^\text{out}_v} (z_{ve}, y_{ve}) \\ \quad & (z_{ve},y_{ve}) \in \tilde{\mathcal{X}}_u \\ \quad & (z_{ve}',y_{ve}) \in \tilde{\mathcal{X}}_w, \quad \forall e = (u,w) \in \mathcal{E}_v \end{aligned}
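  • A cvxpy sketch of one vertex's subproblem (not from the slides). It assumes every set is a polytope \( \mathcal{X} = \{\, x \mid A x \leq b \,\} \), so the perspective constraint \( (z, y) \in \tilde{\mathcal{X}} \) becomes \( A z \leq b\, y \), and takes \( \tilde{l}_e(z, z', y) = \| z - z' \|_2 \) (the perspective of a Euclidean edge length). All names are illustrative, and \( v \) is assumed to be an intermediate vertex (so the \( s \)/\( t \) special cases are omitted).

import cvxpy as cp

def vertex_local_update(v, in_edges, out_edges, sets, z_glob, zp_glob, y_glob,
                        lam, lamp, mu, rho=1.0, sigma=1.0):
    """One vertex's local update for the convex relaxation (sketch).
    in_edges/out_edges: edge keys e = (u, w) incident to v; sets[u] = (A_u, b_u);
    z_glob/zp_glob/y_glob and lam/lamp/mu: current global copies and scaled duals, keyed by e."""
    edges = list(in_edges) + list(out_edges)
    n = sets[v][0].shape[1]
    z = {e: cp.Variable(n) for e in edges}
    zp = {e: cp.Variable(n) for e in edges}
    y = {e: cp.Variable(nonneg=True) for e in edges}

    cost = 0
    constraints = []
    for e in edges:
        u, w = e
        cost += (cp.norm(z[e] - zp[e], 2)
                 + (rho / 2) * cp.sum_squares(z[e] - z_glob[e] + lam[e])
                 + (rho / 2) * cp.sum_squares(zp[e] - zp_glob[e] + lamp[e])
                 + (sigma / 2) * cp.sum_squares(y[e] - y_glob[e] + mu[e]))
        A_u, b_u = sets[u]
        A_w, b_w = sets[w]
        constraints += [A_u @ z[e] <= y[e] * b_u,   # (z_ve, y_ve) in X~_u
                        A_w @ zp[e] <= y[e] * b_w,  # (z_ve', y_ve) in X~_w
                        y[e] <= 1]
    # degree and flow-conservation constraints at v
    constraints += [sum(y[e] for e in in_edges) <= 1,
                    sum(zp[e] for e in in_edges) == sum(z[e] for e in out_edges),
                    sum(y[e] for e in in_edges) == sum(y[e] for e in out_edges)]

    cp.Problem(cp.Minimize(cost), constraints).solve()
    return ({e: z[e].value for e in edges}, {e: zp[e].value for e in edges},
            {e: y[e].value for e in edges})

  • As with the per-edge updates in the MICP scheme, these per-vertex subproblems are independent and can be solved in parallel.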

2. Solve the convex relaxation

Update steps:

2) Consensus Update

  • For each edge \( e = (u,w) \in \mathcal{E} \) we solve an unconstrained QP (closed-form solution):
\begin{aligned} z_e^{(k+1)} & := \argmin_{z_e} \| z_{ue}^{(k+1)} - z_e + \lambda_{ue}^{(k)} \|^2_2 + \| z_{we}^{(k+1)} - z_e + \lambda_{we}^{(k)} \|^2_2 \\ z_e^{'(k+1)} & := \argmin_{z_e'} \| z_{ue}^{'(k+1)} - z_e' + \lambda_{ue}^{'(k)} \|^2_2 + \| z_{we}^{'(k+1)} - z_e' + \lambda_{we}^{'(k)} \|^2_2 \\ y_e^{(k+1)} & := \argmin_{y_e} \| y_{ue}^{(k+1)} - y_e + \mu_{ue}^{(k)} \|^2_2 + \| y_{we}^{(k+1)} - y_e + \mu_{we}^{(k)} \|^2_2 \end{aligned}

2. Solve the convex relaxation

Update steps:

3) Price update

  • For each vertex \( v \in \mathcal{V}\) and edge \(e \in \mathcal{E}_v\), the update steps for the dual variables are:
\begin{aligned} \lambda^{(k+1)}_{ve} & := \lambda^{(k)}_{ve} + z_{ve}^{(k+1)} - z_e^{(k+1)} \\ \lambda^{'(k+1)}_{ve} & := \lambda^{'(k)}_{ve} + z_{ve}^{'(k+1)} - z_e^{'(k+1)} \\ \mu^{(k+1)}_{ve} & := \mu^{(k)}_{ve} + y_{ve}^{(k+1)} - y_e^{(k+1)} \end{aligned}

2. Solve the convex relaxation

Have not implemented this yet...