Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence
Ben Dai 1, Yixuan Qiu 2
Equal Contribution
1 Chinese University of Hong Kong (CUHK), 2 Shanghai University of Finance and Economics
Motivation
 Empirical risk minimization (ERM) is a fundamental framework in machine learning
 Many different loss functions
 Efficient solvers exist for specific problems
 E.g., Liblinear for hinge loss SVM
Motivation
 Can we develop optimization algorithms for general ERM loss functions?
 Can we achieve provable fast convergence rates?
 Can we transfer the empirical success of Liblinear to general ERM problems?
Model
In this paper, we consider a general regularized ERM based on a convex PLQ loss with linear constraints:
\( \min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2} \|\mathbf{\beta}\|_2^2, \quad \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0}, \)

where \( L_i(\cdot) \geq 0\) is the proposed composite ReLU-ReHU loss,

\( \mathbf{x}_i \in \mathbb{R}^d\) is the feature vector for the \(i\)th observation.

\(\mathbf{A} \in \mathbb{R}^{K \times d}\) and \(\mathbf{b} \in \mathbb{R}^K\) are linear inequality constraints for \(\mathbf{\beta}\).

We focus on large-scale datasets, where the dimension of the coefficient vector and the total number of constraints are much smaller than the sample size, that is, \(d \ll n\) and \(K \ll n\).
Composite ReLU-ReHU Loss
Definition 1 (Dai and Qiu, 2023). A function \(L(z)\) is composite ReLU-ReHU if there exist \( \mathbf{u}, \mathbf{v} \in \mathbb{R}^{L}\) and \(\mathbf{\tau}, \mathbf{s}, \mathbf{t} \in \mathbb{R}^{H}\) such that
\( L(z) = \sum_{l=1}^L \text{ReLU}( u_l z + v_l) + \sum_{h=1}^H \text{ReHU}_{\tau_h}( s_h z + t_h), \)
where \( \text{ReLU}(z) = \max\{z,0\}\), and the rectified Huber unit is
\( \text{ReHU}_{\tau}(z) = \begin{cases} 0, & z \leq 0, \\ z^2/2, & 0 < z \leq \tau, \\ \tau(z - \tau/2), & z > \tau. \end{cases} \)
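The two building blocks and the composite loss of Definition 1 can be sketched directly in NumPy; the function names here are illustrative, and `rehu` follows the standard piecewise definition of the rectified Huber unit (zero below 0, quadratic up to \(\tau\), linear beyond).

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(z, 0)."""
    return np.maximum(z, 0.0)

def rehu(z, tau):
    """Rectified Huber unit: 0 for z<=0, z^2/2 for 0<z<=tau, tau*(z - tau/2) beyond."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.0,
                    np.where(z <= tau, 0.5 * z**2, tau * (z - 0.5 * tau)))

def composite_loss(z, u, v, s, t, tau):
    """Composite ReLU-ReHU loss L(z) from Definition 1."""
    lin = sum(relu(ul * z + vl) for ul, vl in zip(u, v))
    quad = sum(rehu(sh * z + th, tauh) for sh, th, tauh in zip(s, t, tau))
    return lin + quad
```

For example, the hinge loss \(\max(0, 1-z)\) corresponds to a single ReLU term with \(u_1 = -1\), \(v_1 = 1\) and no ReHU terms.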
Composite ReLU-ReHU Loss
Theorem 1 (Dai and Qiu, 2023). A loss function \(L:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}\) is convex PLQ if and only if it is composite ReLU-ReHU.
Composite ReLU-ReHU Loss
ReHLine applies to any convex piecewise linear-quadratic (PLQ) loss function, including nonsmooth ones, such as the hinge loss, the check loss, the Huber loss, etc.
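The decompositions of these classic losses into ReLU/ReHU terms can be checked in a few lines. This sketch uses the standard parameterizations (hinge as one ReLU; check loss as two ReLUs; Huber with threshold \(\kappa\) as two ReHU terms); the helper names are illustrative.

```python
def relu(z):
    """ReLU(z) = max(z, 0)."""
    return max(z, 0.0)

def rehu(z, tau):
    """Rectified Huber unit: 0, quadratic, then linear."""
    if z <= 0:
        return 0.0
    return 0.5 * z * z if z <= tau else tau * (z - 0.5 * tau)

# Hinge loss max(0, 1 - z): a single ReLU term with u = -1, v = 1
def hinge_via_relu(z):
    return relu(-z + 1.0)

# Check (quantile) loss rho_q(z): two ReLU terms with slopes q and q - 1
def check_via_relu(z, q):
    return relu(q * z) + relu((q - 1.0) * z)

# Huber loss with threshold kappa: two ReHU terms with s = +1/-1, t = 0, tau = kappa
def huber_via_rehu(z, kappa):
    return rehu(z, kappa) + rehu(-z, kappa)
```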
Main Results
ReHLine has a provable linear convergence rate, and its per-iteration computational complexity is linear in the sample size \(n\).
ReHLine
 Inspired by Coordinate Descent (CD) and Liblinear
 The linear relationship between the primal and dual variables greatly simplifies the CD updates.
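To illustrate this primal-dual trick (not the full ReHLine algorithm), here is a Liblinear-style dual coordinate descent sketch for the plain hinge-loss SVM. Because the primal vector \(\mathbf{\beta} = \sum_i \alpha_i y_i \mathbf{x}_i\) is maintained explicitly, each coordinate update costs only O(d); the function name and toy data are illustrative.

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, n_epochs=50):
    """Dual coordinate descent for min 0.5*||beta||^2 + C * sum_i hinge(y_i x_i' beta).

    Maintains beta = sum_i alpha_i y_i x_i so that each coordinate update
    is O(d) rather than O(n d).
    """
    n, d = X.shape
    alpha = np.zeros(n)
    beta = np.zeros(d)
    sq_norms = np.einsum("ij,ij->i", X, X)  # diagonal of the dual Hessian, Q_ii = ||x_i||^2
    for _ in range(n_epochs):
        for i in range(n):
            grad = y[i] * (X[i] @ beta) - 1.0             # partial gradient of the dual
            new_alpha = np.clip(alpha[i] - grad / sq_norms[i], 0.0, C)
            beta += (new_alpha - alpha[i]) * y[i] * X[i]  # O(d) primal update
            alpha[i] = new_alpha
    return beta
```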
Experiments
Software: generic and specialized solvers
 cvx/cvxpy
 mosek (interior-point method)
 ecos (interior-point method)
 scs (ADMM)
 dccp (disciplined convex-concave programming)
 liblinear → SVM
 hqreg → Huber regression
 lightning → smoothed SVM (sSVM)
Thank you!
[NeurIPS 2023] Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence