Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence

 

Ben Dai 1, Yixuan Qiu 2

Equal Contribution

1 Chinese University of Hong Kong (CUHK),

2 Shanghai University of Finance and Economics (SUFE)

LIBLINEAR

As noted on the official LIBLINEAR website, thanks to contributions from researchers and developers worldwide, LIBLINEAR provides interfaces for many languages:

R, Python, MATLAB, Java, Perl, Ruby, and even PHP

 

The popularity of LIBLINEAR is thus evident.

LIBLINEAR

  • SVM formulation

The primal is QP with 2n linear constraints
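For reference, this is the standard soft-margin linear SVM primal (reconstructed here; the slide showed it as a figure):

$$ \min_{\mathbf{\beta}, \mathbf{\xi}} \; \frac{1}{2} \|\mathbf{\beta}\|_2^2 + C \sum_{i=1}^n \xi_i, \quad \text{s.t. } y_i \mathbf{x}_i^T \mathbf{\beta} \geq 1 - \xi_i, \; \xi_i \geq 0, \; i = 1, \dots, n. $$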

LIBLINEAR

  • Dual form

The dual is box-QP with n box constraints
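The corresponding dual (again reconstructed; the slide showed it as a figure):

$$ \min_{\mathbf{\alpha}} \; \frac{1}{2} \mathbf{\alpha}^T \mathbf{Q} \mathbf{\alpha} - \mathbf{1}^T \mathbf{\alpha}, \quad \text{s.t. } 0 \leq \alpha_i \leq C, \; i = 1, \dots, n, \quad \text{with } \mathbf{Q}_{ij} = y_i y_j \mathbf{x}_i^T \mathbf{x}_j. $$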

Coordinate Descent

KKT Condition

LIBLINEAR

  • CD update

$$ \alpha_i \leftarrow \min\left\{ \max\left\{ \alpha_i - \frac{(\mathbf{Q}\mathbf{\alpha})_i - 1}{\mathbf{Q}_{ii}},\, 0 \right\},\, C \right\}, $$

where \( \mathbf{Q}_{ij} = y_i y_j \mathbf{x}^T_i \mathbf{x}_j \).

Note that \( (\mathbf{Q} \mathbf{\alpha})_{i} = y_i \mathbf{x}_i^T \sum_{j=1}^n y_j \mathbf{x}_j \alpha_j \)

Cost per update: \( O(nd) \) (still at least \( O(n) \) per update if \( \mathbf{Q} \) is pre-computed).

LIBLINEAR

  • CD update + KKT Condition

By the KKT conditions, the primal and dual variables are linked linearly via \( \mathbf{\beta} = \sum_{j=1}^n y_j \alpha_j \mathbf{x}_j \), hence

$$ (\mathbf{Q} \mathbf{\alpha})_{i} = y_i \mathbf{x}_i^T \sum_{j=1}^n y_j \mathbf{x}_j \alpha_j = y_i \mathbf{x}^T_i \mathbf{\beta}. $$

Cost per update: \( O(d) \).
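Putting the update and the KKT trick together, below is a minimal NumPy sketch of a LIBLINEAR-style dual CD loop for the linear SVM; it maintains \( \mathbf{\beta} = \sum_j y_j \alpha_j \mathbf{x}_j \) so each coordinate update costs \( O(d) \). The function name and defaults are illustrative, not LIBLINEAR's API.

```python
import numpy as np

def svm_dual_cd(X, y, C=1.0, n_epochs=20, seed=0):
    """LIBLINEAR-style dual coordinate descent for the linear SVM (sketch).

    Dual: min_alpha 0.5 * a^T Q a - 1^T a, s.t. 0 <= a_i <= C,
    with Q_ij = y_i y_j x_i^T x_j and beta = sum_j y_j a_j x_j (KKT).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    beta = np.zeros(d)                  # maintained: beta = sum_j y_j alpha_j x_j
    Qii = np.einsum("ij,ij->i", X, X)   # diag(Q) = ||x_i||^2
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            # (Q alpha)_i - 1 = y_i x_i^T beta - 1: O(d) instead of O(nd)
            g = y[i] * (X[i] @ beta) - 1.0
            # analytic coordinate minimizer, projected onto the box [0, C]
            alpha_new = min(max(alpha[i] - g / Qii[i], 0.0), C)
            delta = alpha_new - alpha[i]
            if delta != 0.0:
                beta += delta * y[i] * X[i]   # keep beta in sync, O(d)
                alpha[i] = alpha_new
    return beta
```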

Conclusion from LIBLINEAR

  • What makes LIBLINEAR so efficient?

  • Analytic solution for each CD update
  • Per-pass cost of the CD updates reduced from \( O(n^2) \) to \( O(nd) \)
  • Linear convergence \( O(\log(\epsilon^{-1})) \)
    • CD usually attains only sublinear convergence

Key idea: combine the linear KKT conditions with the CD updates.


Conclusion from LIBLINEAR

Extension. When can the idea of LIBLINEAR be applied?

  • Loss
    • hinge loss in SVMs (Yes)
    • check loss in quantile regression (Yes)
    • pieces of order > 2 (No)
    • A class of losses? PLQ (convex piecewise linear-quadratic)

Linear KKT Conditions


  • Constraints
    • Box constraints (Yes)
    • Linear constraints (Yes)

Model

In this paper, we consider a general regularized empirical risk minimization (ERM) problem based on a convex PLQ loss with linear constraints:

\( \min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \quad \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0}, \)

  • \( L_i(\cdot) \geq 0\) is the proposed composite ReLU-ReHU loss.

  • \( \mathbf{x}_i \in \mathbb{R}^d\) is the feature vector for the \(i\)-th observation.

  • \(\mathbf{A} \in \mathbb{R}^{K \times d}\) and \(\mathbf{b} \in \mathbb{R}^K\) are linear inequality constraints for \(\mathbf{\beta}\).

  • We focus on large-scale datasets, where the dimension of the coefficient vector and the total number of constraints are much smaller than the sample size, that is, \(d \ll n\) and \(K \ll n\).
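To make the notation concrete, the small sketch below evaluates this objective for the SVM instance \( L_i(z) = C\,\text{ReLU}(1 - y_i z) \) and checks feasibility; the helper name and signature are hypothetical, purely for illustration.

```python
import numpy as np

def constrained_erm_objective(beta, X, y, C=1.0, A=None, b=None):
    """Evaluate sum_i L_i(x_i^T beta) + 0.5 * ||beta||^2 with the SVM loss
    L_i(z) = C * max(0, 1 - y_i z), and check A @ beta + b >= 0."""
    z = X @ beta
    obj = C * np.maximum(0.0, 1.0 - y * z).sum() + 0.5 * beta @ beta
    feasible = A is None or bool(np.all(A @ beta + b >= 0.0))
    return obj, feasible
```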

Composite ReLU-ReHU Loss

Definition 1 (Dai and Qiu, 2023). A function \(L(z)\) is composite ReLU-ReHU, if there exist \( \mathbf{u}, \mathbf{v} \in \mathbb{R}^{L}\) and \(\mathbf{\tau}, \mathbf{s}, \mathbf{t} \in \mathbb{R}^{H}\) such that

 

\( L(z) = \sum_{l=1}^L \text{ReLU}( u_l z + v_l) + \sum_{h=1}^H \text{ReHU}_{\tau_h}( s_h z + t_h)\)

where \( \text{ReLU}(z) = \max\{z,0\}\) and

$$ \text{ReHU}_{\tau}(z) = \begin{cases} 0, & z \leq 0, \\ z^2/2, & 0 < z \leq \tau, \\ \tau(z - \tau/2), & z > \tau. \end{cases} $$
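A minimal NumPy rendering of the two building blocks and the composite loss (the helper names are ours, for illustration):

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(z, 0)."""
    return np.maximum(z, 0.0)

def rehu(z, tau):
    """ReHU_tau(z): 0 for z <= 0, z^2/2 for 0 < z <= tau,
    and tau * (z - tau/2) for z > tau."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0.0, 0.0,
                    np.where(z <= tau, 0.5 * z ** 2,
                             tau * (z - 0.5 * tau)))

def composite_relu_rehu(z, u, v, s, t, tau):
    """L(z) = sum_l ReLU(u_l z + v_l) + sum_h ReHU_{tau_h}(s_h z + t_h)."""
    relu_part = sum(relu(u_l * z + v_l) for u_l, v_l in zip(u, v))
    rehu_part = sum(rehu(s_h * z + t_h, tau_h)
                    for s_h, t_h, tau_h in zip(s, t, tau))
    return relu_part + rehu_part
```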

Composite ReLU-ReHU Loss

Theorem 1 (Dai and Qiu, 2023). A loss function \(L:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}\) is convex PLQ if and only if it is composite ReLU-ReHU.

Composite ReLU-ReHU Loss

ReHLine applies to any convex piecewise linear-quadratic (PLQ) loss function, possibly non-smooth, including the hinge loss, the check loss, the Huber loss, etc.
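As a quick sanity check on Theorem 1, the standard examples decompose as follows (parameterizations as in Dai and Qiu, 2023): the hinge loss \( \max(0, 1-z) = \text{ReLU}(-z+1) \); the check loss \( \rho_\kappa(z) = \kappa\,\text{ReLU}(z) + (1-\kappa)\,\text{ReLU}(-z) \); and the Huber loss \( \text{ReHU}_\tau(z) + \text{ReHU}_\tau(-z) \). Using the sketch above:

```python
z = np.linspace(-2.0, 2.0, 9)

# Hinge loss: ReLU(-z + 1), i.e., u = [-1], v = [1]
hinge = composite_relu_rehu(z, u=[-1.0], v=[1.0], s=[], t=[], tau=[])
assert np.allclose(hinge, np.maximum(0.0, 1.0 - z))

# Huber loss: ReHU_tau(z) + ReHU_tau(-z), i.e., s = [1, -1], t = [0, 0]
tau = 1.35
huber = composite_relu_rehu(z, u=[], v=[], s=[1.0, -1.0], t=[0.0, 0.0],
                            tau=[tau, tau])
assert np.allclose(huber, np.where(np.abs(z) <= tau,
                                   0.5 * z ** 2,
                                   tau * (np.abs(z) - 0.5 * tau)))
```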

Main Results

ReHLine has a provable linear convergence rate. The per-iteration computational complexity is linear in the sample size.

ReHLine

  • Inspired by Coordinate Descent (CD) and LIBLINEAR

The linear relationship between primal and dual variables greatly simplifies the computation of CD.


Experiments

Software. We compare against both generic and specialized software:

  • cvx/cvxpy (modeling frameworks)
  • mosek (IPM)
  • ecos (IPM)
  • scs (ADMM)
  • dccp (DCP)
  • liblinear -> SVM
  • hqreg -> Huber regression
  • lightning -> smoothed SVM (sSVM)

Summary

  • Powerful algorithm
    • We bring a large class of regularized empirical risk minimization problems up to the efficiency of LIBLINEAR (linear convergence + linear per-iteration computation).
  • Powerful software
    • Efficient C++ implementation. For SVMs, ReHLine solves the same problem as LIBLINEAR, and our implementation can be even faster.
    • Flexible losses and constraints through the Python/R APIs, intended to tackle a vast array of ML and statistics problems (e.g., FairSVM).

Thank you!

If you like ReHLine, please star 🌟 our GitHub repository. Thank you!
