Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence

 

Ben Dai 1, Yixuan Qiu 2

Equal Contribution

1 Chinese University of Hong Kong (CUHK), 2 Shanghai University of Finance and Economics

Motivation

  • Empirical risk minimization (ERM) is a fundamental framework in machine learning
  • Many different loss functions
  • Efficient solvers exist for specific problems
  • E.g., Liblinear for hinge loss SVM

Motivation

  • Can we develop optimization algorithms for general ERM loss functions?
  • Can we achieve provable fast convergence rates?
  • Can we transfer the empirical success of Liblinear to general ERM problems?

Model

In this paper, we consider a general regularized ERM based on a convex piecewise linear-quadratic (PLQ) loss with linear constraints:

\( \min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \quad \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0}, \)

  • \( L_i(\cdot) \geq 0\) is the proposed composite ReLU-ReHU loss.

  • \( \mathbf{x}_i \in \mathbb{R}^d\) is the feature vector for the \(i\)-th observation.

  • \(\mathbf{A} \in \mathbb{R}^{K \times d}\) and \(\mathbf{b} \in \mathbb{R}^K\) define the linear inequality constraints on \(\mathbf{\beta}\).

  • We focus on large-scale datasets, where the dimension of the coefficient vector and the number of constraints are much smaller than the sample size, that is, \(d \ll n\) and \(K \ll n\).
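To make the model concrete, the objective and the constraint check can be sketched in a few lines of NumPy. This is only an illustration: the helper names `erm_objective` and `is_feasible`, the toy data, and the choice of the hinge loss as \(L_i\) are assumptions made for the example, not part of the paper's software.

```python
import numpy as np

def erm_objective(beta, X, losses):
    """Sum of per-sample losses L_i(x_i^T beta) plus 0.5 * ||beta||_2^2."""
    z = X @ beta  # linear scores x_i^T beta, one per observation
    data_term = sum(L(z_i) for L, z_i in zip(losses, z))
    return float(data_term + 0.5 * beta @ beta)

def is_feasible(beta, A, b, tol=1e-10):
    """Check the linear constraints A beta + b >= 0 (elementwise)."""
    return bool(np.all(A @ beta + b >= -tol))

# Toy data (hypothetical): two observations, hinge loss as L_i.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
losses = [lambda z, yi=yi: max(1.0 - yi * z, 0.0) for yi in y]

beta = np.array([1.0, -1.0])
value = erm_objective(beta, X, losses)

# One constraint (K = 1): the first coefficient must be non-negative.
A = np.array([[1.0, 0.0]])
b = np.array([0.0])
feasible = is_feasible(beta, A, b)
```

Evaluating the objective this way costs one pass over the \(n\) observations, which is the scale that matters when \(d \ll n\) and \(K \ll n\).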

Composite ReLU-ReHU Loss

Definition 1 (Dai and Qiu, 2023). A function \(L(z)\) is composite ReLU-ReHU if there exist \( \mathbf{u}, \mathbf{v} \in \mathbb{R}^{L}\) and \(\mathbf{\tau}, \mathbf{s}, \mathbf{t} \in \mathbb{R}^{H}\) such that

 

\( L(z) = \sum_{l=1}^L \text{ReLU}( u_l z + v_l) + \sum_{h=1}^H \text{ReHU}_{\tau_h}( s_h z + t_h)\)

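where \( \text{ReLU}(z) = \max\{z,0\}\), and \( \text{ReHU}_{\tau_h}(z)\) is the rectified Huber unit with threshold \(\tau_h\):

\( \text{ReHU}_{\tau}(z) = \begin{cases} 0, & z \leq 0, \\ z^2/2, & 0 < z \leq \tau, \\ \tau (z - \tau/2), & z > \tau. \end{cases} \)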
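A minimal NumPy sketch of Definition 1 may help fix the notation. It assumes the usual rectified Huber unit, \(\text{ReHU}_\tau(z) = 0\) for \(z \leq 0\), \(z^2/2\) for \(0 < z \leq \tau\), and \(\tau(z - \tau/2)\) for \(z > \tau\). The function names are chosen for illustration and are not the API of the rehline package.

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(z, 0), elementwise."""
    return np.maximum(z, 0.0)

def rehu(z, tau):
    """Rectified Huber unit, elementwise.

    With c = clip(z, 0, tau), the value c * (z - c / 2) equals
    0 for z <= 0, z^2 / 2 for 0 < z <= tau, and tau * (z - tau / 2) for z > tau.
    """
    z = np.asarray(z, dtype=float)
    c = np.clip(z, 0.0, tau)
    return c * (z - 0.5 * c)

def composite_loss(z, u, v, s, t, tau):
    """L(z) = sum_l ReLU(u_l z + v_l) + sum_h ReHU_{tau_h}(s_h z + t_h) for scalar z."""
    u, v, s, t, tau = (np.asarray(a, dtype=float) for a in (u, v, s, t, tau))
    return float(relu(u * z + v).sum() + rehu(s * z + t, tau).sum())

# Example: one ReLU term (L = 1) and one ReHU term (H = 1).
loss_at_2 = composite_loss(2.0, u=[1.0], v=[-1.0], s=[1.0], t=[0.0], tau=[1.0])
```

Here the ReLU term contributes \(\max\{2 - 1, 0\}\) and the ReHU term is evaluated in its linear regime because \(2 > \tau = 1\).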

Composite ReLU-ReHU Loss

Theorem 1 (Dai and Qiu, 2023). A loss function \(L:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}\) is convex PLQ if and only if it is composite ReLU-ReHU.

Composite ReLU-ReHU Loss

ReHLine applies to any convex piecewise linear-quadratic loss function, smooth or non-smooth, including the hinge loss, the check loss, and the Huber loss.
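As a sanity check on the three losses named above, the sketch below writes each one in composite ReLU-ReHU form and compares it with its textbook definition. The parameterizations are derived here for illustration, assuming the standard ReHU definition, a label or response \(y\), a cost \(C > 0\), and a quantile level \(\kappa \in (0, 1)\). All helper names are hypothetical.

```python
import numpy as np

def composite_loss(z, u=(), v=(), s=(), t=(), tau=()):
    """sum_l ReLU(u_l z + v_l) + sum_h ReHU_{tau_h}(s_h z + t_h) for scalar z."""
    u, v, s, t, tau = (np.asarray(a, dtype=float) for a in (u, v, s, t, tau))
    relu_part = np.maximum(u * z + v, 0.0).sum()
    w = s * z + t
    c = np.clip(w, 0.0, tau)  # c * (w - c / 2) is ReHU_tau(w)
    return float(relu_part + (c * (w - 0.5 * c)).sum())

def hinge_params(y, C=1.0):
    # C * max(1 - y z, 0) = ReLU(-C y z + C)
    return dict(u=[-C * y], v=[C])

def check_params(y, kappa):
    # kappa * max(y - z, 0) + (1 - kappa) * max(z - y, 0)
    return dict(u=[-kappa, 1.0 - kappa], v=[kappa * y, -(1.0 - kappa) * y])

def huber_params(y, tau):
    # Huber_tau(y - z) = ReHU_tau(y - z) + ReHU_tau(z - y)
    return dict(s=[-1.0, 1.0], t=[y, -y], tau=[tau, tau])

def huber(r, tau):
    """Textbook Huber loss, used only as a reference."""
    return 0.5 * r**2 if abs(r) <= tau else tau * (abs(r) - 0.5 * tau)

# Largest gap between composite form and textbook form on a grid of scores.
grid = np.linspace(-3.0, 3.0, 61)
max_gap = max(
    max(
        abs(composite_loss(z, **hinge_params(1.0, C=2.0)) - 2.0 * max(1.0 - z, 0.0)),
        abs(composite_loss(z, **check_params(0.5, 0.25))
            - (0.25 * max(0.5 - z, 0.0) + 0.75 * max(z - 0.5, 0.0))),
        abs(composite_loss(z, **huber_params(0.5, 1.0)) - huber(0.5 - z, 1.0)),
    )
    for z in grid
)
```

The hinge and check losses use only ReLU terms, while the Huber loss uses two ReHU terms, one for each sign of the residual.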

Main Results

ReHLine has a provable linear convergence rate. The per-iteration computational complexity is linear in the sample size.

ReHLine

  • Inspired by Coordinate Descent (CD) and Liblinear

The linear relationship between primal and dual variables greatly simplifies the computation of CD.
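To illustrate why a linear primal-dual relationship makes CD cheap, here is a sketch of the Liblinear-style dual coordinate descent for the special case of the hinge loss without constraints, \(\min_{\beta} C \sum_i \max\{1 - y_i \mathbf{x}_i^\intercal \beta, 0\} + \frac{1}{2}\|\beta\|_2^2\). It is not the ReHLine algorithm itself. The function name, the toy data, and the epoch count are assumptions for the example. The point is that \(\beta = \sum_i \alpha_i y_i \mathbf{x}_i\) is maintained incrementally, so each coordinate update costs \(O(d)\).

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, n_epochs=100, seed=0):
    """Dual coordinate descent for the hinge-loss SVM (no bias, no constraints)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)  # dual variables, kept inside the box [0, C]
    beta = np.zeros(d)   # primal variable, always equal to sum_i alpha_i y_i x_i
    sq_norms = np.einsum("ij,ij->i", X, X)  # x_i^T x_i, diagonal of the dual Hessian
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            if sq_norms[i] == 0.0:
                continue  # a zero feature vector never changes beta
            grad = y[i] * (X[i] @ beta) - 1.0  # partial derivative of the dual, O(d)
            new_alpha = min(max(alpha[i] - grad / sq_norms[i], 0.0), C)
            beta += (new_alpha - alpha[i]) * y[i] * X[i]  # O(d) primal update
            alpha[i] = new_alpha
    return beta, alpha

# Toy separable data (hypothetical).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
beta, alpha = dual_cd_svm(X, y, C=1.0)
```

One full sweep touches each observation once, so the cost per epoch is \(O(nd)\), linear in the sample size.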

Experiments

Software: generic and specialized solvers

  • cvx/cvxpy
  • mosek (IPM)
  • ecos (IPM)
  • scs (ADMM)
  • dccp (DCP)
  • liblinear -> SVM
  • hqreg -> Huber
  • lightning -> sSVM

Thank you!

rehline

By statmlben


[NeurIPS2023] Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence
