## Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence

### Ben Dai (CUHK)

(Joint work with Yixuan Qiu)

### LIBLINEAR

• In scikit-learn, liblinear is the solver behind `LinearSVC`, making it the standard way to fit linear SVMs in Python.

### LIBLINEAR

As noted on the official LIBLINEAR website, thanks to contributions from researchers and developers worldwide, LIBLINEAR provides interfaces for many languages:

{ R, Python, Matlab, Java, Perl, Ruby, and even PHP }

LIBLINEAR's popularity is thus evident.

### LIBLINEAR

• From 2008 to 2024: a 16-year period of continuous contributions.

• Countless hours have been devoted to it.

• Since its release in 2008, it has consistently remained the No. 1 solver for linear SVMs.

### Dual Coordinate Descent

Given a training set of $$n$$ points $$(\mathbf{x}_i, y_i)_{i=1}^n$$, where $$y_i \in \{-1, +1\}$$ is the binary label of the $$i$$-th instance $$\mathbf{x}_i \in \mathbb{R}^d$$.

Primal form.

$$\min_{\pmb{\beta}} \sum_{i=1}^{n} C_i ( 1 - y_i \pmb{\beta}^T \mathbf{x}_i )_+ + \frac{1}{2} \| \pmb{\beta} \|^2$$

After introducing slack variables $$\xi_i$$, the primal becomes a QP with $$2n$$ linear constraints:

$$\min_{\pmb{\beta}, \pmb{\xi}} \sum_{i=1}^{n} C_i \xi_i + \frac{1}{2} \| \pmb{\beta} \|^2$$

$$\text{s.t.} \quad y_i \pmb{\beta}^T \mathbf{x}_i \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n$$

(For fixed $$\pmb{\beta}$$, the optimal slack is $$\xi_i = (1 - y_i \pmb{\beta}^T \mathbf{x}_i)_+$$, so the two forms coincide.)

### Dual Coordinate Descent

The dual is a box-constrained QP:

• simpler form than the primal problem
• naturally leads to coordinate descent (CD)

Lagrangian (with multipliers $$\alpha_i \geq 0$$, $$\mu_i \geq 0$$):

$$L_P = \sum_{i=1}^{n} C_i \xi_i + \frac{1}{2} \| \pmb{\beta} \|^2 - \sum_{i=1}^n \alpha_i \big( y_i \mathbf{x}_i^T \pmb{\beta} - (1 - \xi_i) \big) - \sum_{i=1}^n \mu_i \xi_i$$

• Setting the derivatives w.r.t. $$\pmb{\beta}$$ and $$\xi_i$$ to zero yields the (linear) KKT conditions:

$$\pmb{\beta} = \sum_{i=1}^n \alpha_i y_i \mathbf{x}_i, \quad \alpha_i = C_i - \mu_i.$$

Dual form:

$$\min_{\pmb{\alpha}} \frac{1}{2} \pmb{\alpha}^T \mathbf{Q} \pmb{\alpha} - \mathbf{1}^T \pmb{\alpha}, \quad \text{s.t.} \quad 0 \leq \alpha_i \leq C_i,$$

where $$\mathbf{Q}_{ij} = y_i y_j \mathbf{x}_i^T \mathbf{x}_j$$.
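Filling in the step the slide skips: substituting the stationarity conditions back into $$L_P$$ makes the $$\xi_i$$ terms vanish (their coefficient is $$C_i - \alpha_i - \mu_i = 0$$), leaving

$$L_D = \frac{1}{2} \| \pmb{\beta} \|^2 - \sum_{i=1}^n \alpha_i y_i \mathbf{x}_i^T \pmb{\beta} + \sum_{i=1}^n \alpha_i = -\frac{1}{2} \pmb{\alpha}^T \mathbf{Q} \pmb{\alpha} + \mathbf{1}^T \pmb{\alpha},$$

and maximizing $$L_D$$ over $$0 \leq \alpha_i \leq C_i$$ (the box comes from $$\alpha_i, \mu_i \geq 0$$ and $$\alpha_i = C_i - \mu_i$$) is exactly the stated minimization.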


### Dual Coordinate Descent

CD sub-problem. Given the current ("old") value of $$\pmb{\alpha}$$, we solve

$$\min_{\delta_i} \frac{1}{2} Q_{ii} \delta_i^2 + \big((\mathbf{Q} \pmb{\alpha})_i - 1\big) \delta_i, \quad \text{s.t.} \quad -\alpha_i \leq \delta_i \leq C_i - \alpha_i,$$

where $$\mathbf{Q}_{ij} = y_i y_j \mathbf{x}^T_i \mathbf{x}_j$$. Setting the derivative $$Q_{ii}\delta_i + (\mathbf{Q}\pmb{\alpha})_i - 1$$ to zero and clipping to the box gives the closed-form solution:

$$\delta^*_i = \max \Big( -\alpha_i, \min\big( C_i - \alpha_i, \frac{1 - (\mathbf{Q}\pmb{\alpha})_i }{Q_{ii}} \big) \Big)$$

$$\alpha_i \leftarrow \alpha_i + \delta^*_i$$

### Dual Coordinate Descent

Note that

$$(\mathbf{Q} \pmb{\alpha})_{i} = y_i \mathbf{x}_i^T \sum_{j=1}^n y_j \alpha_j \mathbf{x}_j,$$

so evaluating it from scratch costs $$O(nd)$$ per coordinate update (and at least $$O(n)$$ even if $$\mathbf{Q}$$ is pre-computed).


### Dual Coordinate Descent

Looping over $$i = 1, \ldots, n$$, a full pass of pure CD therefore costs $$O(n^2)$$ even with $$\mathbf{Q}$$ pre-computed, i.e., quadratic in the sample size, much like generic QP solvers:

• IPM
• ...

### Dual Coordinate Descent

The fix is the linear KKT condition

$$\pmb{\beta} = \sum_{i=1}^n \alpha_i y_i \mathbf{x}_i,$$

which gives

$$(\mathbf{Q} \pmb{\alpha})_{i} = y_i \mathbf{x}_i^T \sum_{j=1}^n y_j \alpha_j \mathbf{x}_j = y_i \mathbf{x}^T_i \pmb{\beta}.$$

If $$\pmb{\beta}$$ is maintained alongside $$\pmb{\alpha}$$, each evaluation costs only $$O(d)$$ rather than $$O(nd)$$.

### Dual Coordinate Descent

Pure CD. Each update recomputes $$(\mathbf{Q}\pmb{\alpha})_i$$:

$$\delta^*_i = \max \Big( -\alpha_i, \min\big( C_i - \alpha_i, \frac{1 - (\mathbf{Q}\pmb{\alpha})_i }{Q_{ii}} \big) \Big), \quad \alpha_i \leftarrow \alpha_i + \delta^*_i$$

Loop over $$i = 1, \ldots, n$$: $$O(n^2)$$.

Primal-dual CD. Each update uses (and maintains) $$\pmb{\beta}$$:

$$\delta^*_i = \max \Big( -\alpha_i, \min\big( C_i - \alpha_i, \frac{1 - y_i \pmb{\beta}^T\mathbf{x}_i }{Q_{ii}} \big) \Big)$$

$$\alpha_i \leftarrow \alpha_i + \delta^*_i, \quad \pmb{\beta} \leftarrow \pmb{\beta} + \delta^*_i y_i \mathbf{x}_i$$

Loop over $$i = 1, \ldots, n$$: $$O(nd)$$.
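To make the bookkeeping concrete, here is a minimal NumPy sketch of a primal-dual CD pass (an illustration with my own names and a simple stopping rule; LIBLINEAR's production code adds shrinking, random permutation, a bias term, and more):

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, n_epochs=100, tol=1e-6):
    """Primal-dual coordinate descent for the linear SVM dual.

    Solves min_alpha 0.5 * alpha^T Q alpha - 1^T alpha, 0 <= alpha_i <= C,
    with Q_ij = y_i y_j x_i^T x_j, while maintaining the KKT relation
    beta = sum_i alpha_i y_i x_i so each update costs O(d) instead of O(nd).
    """
    n, d = X.shape
    alpha = np.zeros(n)
    beta = np.zeros(d)                 # primal iterate, kept in sync via KKT
    Qii = np.einsum("ij,ij->i", X, X)  # diagonal of Q: ||x_i||^2

    for _ in range(n_epochs):
        max_step = 0.0
        for i in range(n):
            if Qii[i] == 0.0:
                continue
            grad = y[i] * (X[i] @ beta) - 1.0   # (Q alpha)_i - 1, in O(d)
            delta = np.clip(-grad / Qii[i], -alpha[i], C - alpha[i])
            alpha[i] += delta
            beta += delta * y[i] * X[i]         # keep beta consistent
            max_step = max(max_step, abs(delta))
        if max_step < tol:                      # simple stopping rule
            break
    return beta, alpha

# Tiny usage example on synthetic, near-separable data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=200) > 0, 1.0, -1.0)
beta, alpha = dual_cd_svm(X, y, C=1.0)
print("training accuracy:", np.mean(np.sign(X @ beta) == y))
```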

### LIBLINEAR

• What makes LIBLINEAR so fast?

• Analytic solution for each CD update
• $$O(n^2)$$ reduced to $$O(nd)$$ per pass of CD updates
• Linear convergence: $$O(\log(\epsilon^{-1}))$$ iterations
• CD is usually only sublinearly convergent
• The linear structure improves the convergence!

Source: Ryan Tibshirani, Convex Optimization, lecture notes

### LIBLINEAR

The key ingredient: combining the linear KKT conditions with the CD updates.

Extension. When can the idea of LIBLINEAR be applied?

$$L_P = \sum_{i=1}^{n} C_i \xi_i + \frac{1}{2} \| \pmb{\beta} \|^2 - \sum_{i=1}^n \alpha_i \big( y_i \mathbf{x}_i^T \pmb{\beta} - (1 - \xi_i) \big) - \sum_{i=1}^n \mu_i \xi_i$$

### ReHLine

Extension. When can the idea of LIBLINEAR be applied?

• Loss
  • hinge loss in SVMs
  • check loss in quantile regression
  • pieces of order > 2?
  • many piecewise linear / quadratic losses
  • a whole class of losses? convex PLQ
• Constraints
  • box constraints?
  • general linear constraints

What these share: linear KKT conditions.

### ReHLine

In this paper, we consider a general regularized ERM based on a convex PLQ loss with linear constraints:

$$\min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \quad \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0},$$

• $$L_i(\cdot) \geq 0$$ is the proposed composite ReLU-ReHU loss.

• $$\mathbf{x}_i \in \mathbb{R}^d$$ is the feature vector for the $$i$$-th observation.

• $$\mathbf{A} \in \mathbb{R}^{K \times d}$$ and $$\mathbf{b} \in \mathbb{R}^K$$ encode $$K$$ linear inequality constraints on $$\mathbf{\beta}$$.

• We focus on large-scale datasets, where the dimension of the coefficient vector and the number of constraints are much smaller than the sample size, that is, $$d \ll n$$ and $$K \ll n$$.
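For reference, the generic-solver route that ReHLine is benchmarked against can be sketched in a few lines of CVXPY. This is an illustrative SVM instance of the problem above (the data and constraint matrices are made up), not ReHLine itself:

```python
import cvxpy as cp
import numpy as np

# Synthetic data: n samples, d features, labels in {-1, +1}.
rng = np.random.default_rng(0)
n, d, K = 500, 10, 2
X = rng.normal(size=(n, d))
y = np.where(rng.normal(size=n) > 0, 1.0, -1.0)
A = rng.normal(size=(K, d))        # illustrative constraint matrix
b = np.ones(K)                     # illustrative offsets
C = 1.0

beta = cp.Variable(d)
# The hinge loss is one instance of a convex PLQ loss L_i(x_i^T beta).
loss = cp.sum(C * cp.pos(1 - cp.multiply(y, X @ beta)))
objective = cp.Minimize(loss + 0.5 * cp.sum_squares(beta))
constraints = [A @ beta + b >= 0]  # generic linear constraints
prob = cp.Problem(objective, constraints)
prob.solve()                       # generic solver: fine for small n, slow for large n
print("optimal objective:", prob.value)
```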

### ReHLine Loss

Definition 1 (Dai and Qiu, 2023). A function $$L(z)$$ is composite ReLU-ReHU if there exist $$\mathbf{u}, \mathbf{v} \in \mathbb{R}^{L}$$ and $$\pmb{\tau}, \mathbf{s}, \mathbf{t} \in \mathbb{R}^{H}$$ such that

$$L(z) = \sum_{l=1}^L \text{ReLU}( u_l z + v_l) + \sum_{h=1}^H \text{ReHU}_{\tau_h}( s_h z + t_h),$$

where $$\text{ReLU}(z) = \max\{z,0\}$$ and the rectified Huber unit is

$$\text{ReHU}_{\tau}(z) = \begin{cases} 0, & z \leq 0, \\ z^2/2, & 0 < z \leq \tau, \\ \tau(z - \tau/2), & z > \tau. \end{cases}$$

Theorem 1 (Dai and Qiu, 2023). A loss function $$L:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}$$ is convex PLQ if and only if it is composite ReLU-ReHU.
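As a sanity check of Definition 1 (my own illustration, not the authors' code), the following NumPy snippet verifies that the hinge, check, and Huber losses are all composite ReLU-ReHU:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def rehu(z, tau):
    """Rectified Huber unit ReHU_tau(z)."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.0,
                    np.where(z <= tau, 0.5 * z**2, tau * (z - tau / 2)))

z = np.linspace(-3, 3, 1001)

# Hinge loss C*(1 - y*z)_+ as a single ReLU term: u = -C*y, v = C.
C, y = 2.0, 1.0
assert np.allclose(C * np.maximum(1 - y * z, 0.0),
                   relu(-C * y * z + C))

# Check (pinball) loss rho_kappa(y0 - z) as two ReLU terms.
kappa, y0 = 0.3, 0.5
check_direct = np.maximum(kappa * (y0 - z), (kappa - 1) * (y0 - z))
check_relu = relu(-kappa * z + kappa * y0) + relu((1 - kappa) * z - (1 - kappa) * y0)
assert np.allclose(check_direct, check_relu)

# The Huber loss needs ReHU terms (H > 0): Huber_tau(z) = ReHU_tau(z) + ReHU_tau(-z).
tau = 1.0
huber_direct = np.where(np.abs(z) <= tau, 0.5 * z**2, tau * (np.abs(z) - tau / 2))
assert np.allclose(huber_direct, rehu(z, tau) + rehu(-z, tau))
print("hinge, check, and Huber verified as composite ReLU-ReHU")
```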

### ReHLine Formulation

$$\min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \quad \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0},$$

The formulation can also handle the elastic-net penalty.

### ReHLine Results

A broad range of problems. ReHLine applies to any convex piecewise linear-quadratic (possibly non-smooth) loss function with arbitrary linear constraints, including the hinge loss, the check loss, the Huber loss, etc.

Super efficient. ReHLine enjoys a linear convergence rate, and its per-iteration computational complexity is linear in the sample size.

### ReHLine Algo

• Inspired by CD and LIBLINEAR

The linear relationship between primal and dual variables greatly simplifies the CD computation.

### ReHLine Algo

Software. Generic vs. specialized solvers:

• cvx / cvxpy (modeling languages)
• mosek (IPM)
• ecos (IPM)
• dccp (DCP)
• liblinear -> SVM
• hqreg -> Huber regression
• lightning -> sSVM (smoothed SVM)

### Experiments

• ~1000x speed-up over generic solvers
• no slower than specialized solvers

### ReHLine Universe

LIBLINEAR -> ReHLine

• Powerful algorithm
  • We bring the computing power of a large class of regularized empirical risk minimization problems up to the level of LIBLINEAR (linear convergence + linear per-iteration computation).
• Powerful software
  • Efficient C++ implementation: on SVMs, ReHLine is equivalent to LIBLINEAR, and our present implementation can be even faster.
  • Flexible Python/R APIs for general losses and constraints, intended to tackle a vast array of ML and statistics problems (e.g., FairSVM).

## Thank you!

If you like ReHLine, please star 🌟 our GitHub repository. Thank you!
