Equal Contribution
1 Chinese University of Hong Kong (CUHK),
2 Shanghai University of Finance and Economics (SUFE)
As noted on the official Liblinear website, thanks to contributions from researchers and developers worldwide, Liblinear provides interfaces for many languages:
{ R, Python, Matlab, Java, Perl, Ruby, and even PHP }
The popularity of Liblinear is thus evident.
SVM formulation
The primal is a QP with 2n linear constraints
The dual is a box-constrained QP with n box constraints
Coordinate Descent
KKT Condition
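For reference, the slide labels above can be written out for the standard bias-free L1-loss linear SVM that Liblinear's dual coordinate descent solver targets. This is a sketch of the textbook formulation; the penalty parameter \( C \) and slack variables \( \xi_i \) are our notation and do not appear on the slides. Primal (2n linear constraints), dual (n box constraints), KKT condition, and the CD update of one coordinate \( \alpha_i \):

$$ \min_{\mathbf{\beta}, \mathbf{\xi}} \frac{1}{2}\|\mathbf{\beta}\|_2^2 + C\sum_{i=1}^n \xi_i \quad \text{s.t.} \quad y_i \mathbf{x}_i^T\mathbf{\beta} \geq 1 - \xi_i, \ \ \xi_i \geq 0, \ \ i = 1,\dots,n $$
$$ \min_{\mathbf{\alpha}} \frac{1}{2}\mathbf{\alpha}^T\mathbf{Q}\mathbf{\alpha} - \mathbf{1}^T\mathbf{\alpha} \quad \text{s.t.} \quad 0 \leq \alpha_i \leq C, \ \ i = 1,\dots,n $$
$$ \mathbf{\beta} = \sum_{i=1}^n y_i \alpha_i \mathbf{x}_i $$
$$ \alpha_i \leftarrow \min\Big\{\max\Big\{\alpha_i - \frac{(\mathbf{Q}\mathbf{\alpha})_i - 1}{\mathbf{Q}_{ii}}, \ 0\Big\}, \ C\Big\} $$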
where \( \mathbf{Q}_{ij} = y_i y_j \mathbf{x}^T_i \mathbf{x}_j \).
Note that \( (\mathbf{Q} \mathbf{\alpha})_{i} = y_i \mathbf{x}_i^T \sum_{j=1}^n y_j \mathbf{x}_j \alpha_j \), so evaluating it directly costs \( O(nd) \) (and at least \( O(n) \) even if \( \mathbf{Q} \) is precomputed).
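$$ (\mathbf{Q} \mathbf{\alpha})_{i} = y_i \mathbf{x}_i^T \sum_{j=1}^n y_j \mathbf{x}_j \alpha_j = y_i \mathbf{x}^T_i \mathbf{\beta} $$
where \( \mathbf{\beta} = \sum_{j=1}^n y_j \alpha_j \mathbf{x}_j \) is kept up to date as \( \mathbf{\alpha} \) changes, so each \( (\mathbf{Q} \mathbf{\alpha})_{i} \) costs only \( O(d) \).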
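The three ways of obtaining \( (\mathbf{Q} \mathbf{\alpha})_{i} \) can be compared numerically. The sketch below uses random toy data (sizes and seed are arbitrary choices) and checks that rebuilding the sum from scratch (\( O(nd) \)), reading a row of a precomputed \( \mathbf{Q} \) (\( O(n) \)), and reusing a maintained \( \mathbf{\beta} \) (\( O(d) \)) all give the same value; the function names are our own, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)
alpha = rng.uniform(0.0, 1.0, size=n)

def q_alpha_from_scratch(X, y, alpha, i):
    # O(nd): rebuild sum_j y_j x_j alpha_j on every call
    return y[i] * X[i] @ (X.T @ (y * alpha))

def q_alpha_precomputed(Q, alpha, i):
    # O(n) per call, but forming Q costs O(n^2 d) time and O(n^2) memory
    return Q[i] @ alpha

def q_alpha_linear_kkt(X, y, beta, i):
    # O(d): reuse beta = sum_j y_j alpha_j x_j, kept in sync with alpha
    return y[i] * X[i] @ beta

Yx = y[:, None] * X          # rows y_i x_i
Q = Yx @ Yx.T                # Q_ij = y_i y_j x_i^T x_j
beta = X.T @ (y * alpha)     # linear KKT relation
i = 3
vals = [
    q_alpha_from_scratch(X, y, alpha, i),
    q_alpha_precomputed(Q, alpha, i),
    q_alpha_linear_kkt(X, y, beta, i),
]
```

In a CD loop only the last variant is used, which is what makes each coordinate update independent of \( n \).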
What makes Liblinear so fast?
It combines the linear KKT condition with the CD updates.
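A minimal sketch of this idea, assuming the bias-free L1-loss linear SVM with penalty parameter \( C \) (our notation): each dual coordinate is updated by a clipped Newton step, and \( \mathbf{\beta} \) is updated in \( O(d) \) so the KKT relation \( \mathbf{\beta} = \sum_i y_i \alpha_i \mathbf{x}_i \) always holds. This is an illustrative re-implementation, not Liblinear's actual code (which adds shrinking and other heuristics); the function name and toy data are hypothetical.

```python
import numpy as np

def linear_svm_dual_cd(X, y, C=1.0, n_epochs=50, seed=0):
    """Dual CD for the bias-free L1-loss linear SVM (illustrative sketch)."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    alpha = np.zeros(n)
    beta = np.zeros(d)                     # linear KKT: beta = sum_i y_i alpha_i x_i
    Q_diag = np.einsum("ij,ij->i", X, X)   # Q_ii = ||x_i||^2 because y_i^2 = 1
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            if Q_diag[i] == 0.0:
                continue
            grad = y[i] * (X[i] @ beta) - 1.0          # (Q alpha)_i - 1, in O(d)
            new_alpha = min(max(alpha[i] - grad / Q_diag[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                beta += delta * y[i] * X[i]            # keep KKT relation exact, O(d)
                alpha[i] = new_alpha
    return beta, alpha

# Toy data, linearly separable through the origin
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] >= 0, 1.0, -1.0)
beta, alpha = linear_svm_dual_cd(X, y, C=1.0)
train_acc = np.mean(np.sign(X @ beta) == y)
```

Each inner step touches one row of `X` and never forms \( \mathbf{Q} \), so an epoch costs \( O(nd) \) in total.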
Extension: when can the idea of "LibLinear" be applied?
Linear KKT Conditions
In this paper, we consider a general regularized empirical risk minimization (ERM) problem with a convex piecewise linear-quadratic (PLQ) loss and linear constraints:
\( \min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \quad \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0}, \)
\( L_i(\cdot) \geq 0\) is the proposed composite ReLU-ReHU loss.
\( \mathbf{x}_i \in \mathbb{R}^d\) is the feature vector for the \(i\)-th observation.
\(\mathbf{A} \in \mathbb{R}^{K \times d}\) and \(\mathbf{b} \in \mathbb{R}^K\) define \(K\) linear inequality constraints on \(\mathbf{\beta}\).
We focus on large-scale datasets, where the dimension of the coefficient vector and the number of constraints are much smaller than the sample size, that is, \(d \ll n\) and \(K \ll n\).
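To make the problem statement concrete, the sketch below evaluates the objective \( \sum_i L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2}\|\mathbf{\beta}\|_2^2 \) and checks feasibility of \( \mathbf{A}\mathbf{\beta} + \mathbf{b} \geq \mathbf{0} \). The helper names and the toy instance (hinge losses with the single constraint \( \beta_1 \geq 0 \)) are hypothetical and only illustrate the notation.

```python
import numpy as np

def erm_objective(beta, X, losses):
    """sum_i L_i(x_i^T beta) + 0.5 * ||beta||_2^2 for a list of per-sample losses."""
    z = X @ beta
    return float(sum(L(zi) for L, zi in zip(losses, z)) + 0.5 * beta @ beta)

def is_feasible(beta, A, b, tol=1e-10):
    """True if A beta + b >= 0 holds componentwise (up to tol)."""
    return bool(np.all(A @ beta + b >= -tol))

# Hypothetical toy instance: hinge losses L_i(z) = max(1 - y_i z, 0), constraint beta_1 >= 0
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, -1.0])
losses = [lambda z, yi=yi: max(1.0 - yi * z, 0.0) for yi in y]
A = np.array([[1.0, 0.0]])
b = np.array([0.0])

beta = np.array([1.0, -1.0])
print(erm_objective(beta, X, losses), is_feasible(beta, A, b))  # 1.0 True
```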
Definition 1 (Dai and Qiu, 2023). A function \(L(z)\) is composite ReLU-ReHU, if there exist \( \mathbf{u}, \mathbf{v} \in \mathbb{R}^{L}\) and \(\mathbf{\tau}, \mathbf{s}, \mathbf{t} \in \mathbb{R}^{H}\) such that
\( L(z) = \sum_{l=1}^L \text{ReLU}( u_l z + v_l) + \sum_{h=1}^H \text{ReHU}_{\tau_h}( s_h z + t_h)\)
where \( \text{ReLU}(z) = \max\{z,0\}\), and \( \text{ReHU}_{\tau_h}(z)\) is the rectified Huber unit: \( \text{ReHU}_{\tau}(z) = 0 \) for \( z \leq 0 \), \( z^2/2 \) for \( 0 < z \leq \tau \), and \( \tau(z - \tau/2) \) for \( z > \tau \).
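A direct, vectorized transcription of Definition 1, as a sketch: it assumes the piecewise ReHU definition of Dai and Qiu (2023) stated in the code comment, and the function names and example parameters are ours.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def rehu(z, tau):
    # ReHU_tau(z) = 0 for z <= 0, z^2/2 for 0 < z <= tau, tau*(z - tau/2) for z > tau
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.0, np.where(z <= tau, 0.5 * z**2, tau * (z - 0.5 * tau)))

def composite_relu_rehu(z, u, v, tau, s, t):
    """L(z) = sum_l ReLU(u_l z + v_l) + sum_h ReHU_{tau_h}(s_h z + t_h)."""
    z = np.asarray(z, dtype=float)[..., None]      # trailing axis indexes the L or H terms
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    tau, s, t = (np.asarray(a, dtype=float) for a in (tau, s, t))
    relu_part = relu(z * u + v).sum(axis=-1)
    rehu_part = rehu(z * s + t, tau).sum(axis=-1)
    return relu_part + rehu_part

# Hypothetical example: L(z) = ReLU(1 - z) + ReHU_1(z - 2)
vals = composite_relu_rehu([0.0, 2.5, 4.0], u=[-1.0], v=[1.0], tau=[1.0], s=[1.0], t=[-2.0])
```

Here `vals` is `[1.0, 0.125, 1.5]`: the ReLU term is active only at \( z = 0 \), and the ReHU term is quadratic at \( z = 2.5 \) and linear at \( z = 4 \).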
Theorem 1 (Dai and Qiu, 2023). A loss function \(L:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}\) is convex PLQ if and only if it is composite ReLU-ReHU.
ReHLine applies to any convex piecewise linear-quadratic loss function (including non-smooth ones), such as the hinge loss, the check loss, and the Huber loss.
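As a numerical illustration of Theorem 1 for the three losses just listed, the sketch below writes each one in composite ReLU-ReHU form and compares it with the usual closed form on a grid of \( z \). The decompositions are our own worked examples (response \( y \), quantile level \( \kappa \in (0,1) \), and Huber threshold \( \tau \) are arbitrary choices): hinge is one ReLU, the check loss is two ReLUs, and the Huber loss is two ReHUs.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def rehu(z, tau):
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.0, np.where(z <= tau, 0.5 * z**2, tau * (z - 0.5 * tau)))

y, kappa, tau = 1.0, 0.3, 0.5
z = np.linspace(-3.0, 3.0, 601)
r = y - z  # residual

# Hinge loss max(1 - y z, 0): one ReLU with u = -y, v = 1
hinge = np.maximum(1.0 - y * z, 0.0)
hinge_rr = relu(-y * z + 1.0)

# Check loss rho_kappa(r) = kappa*ReLU(r) + (1 - kappa)*ReLU(-r): two ReLUs
check = np.where(r >= 0, kappa * r, (kappa - 1.0) * r)
check_rr = relu(-kappa * z + kappa * y) + relu((1.0 - kappa) * z - (1.0 - kappa) * y)

# Huber loss with threshold tau: ReHU_tau(r) + ReHU_tau(-r), two ReHUs
huber = np.where(np.abs(r) <= tau, 0.5 * r**2, tau * (np.abs(r) - 0.5 * tau))
huber_rr = rehu(-z + y, tau) + rehu(z - y, tau)
```

The check-loss decomposition uses \( \text{ReLU}(c\,a) = c\,\text{ReLU}(a) \) for \( c > 0 \) to fold the weights \( \kappa \) and \( 1 - \kappa \) into \( u_l, v_l \).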
ReHLine has a provable linear convergence rate. The per-iteration computational complexity is linear in the sample size.
The linear relationship between primal and dual variables greatly simplifies the computation of CD.
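To see why, consider only the special case of ReLU terms with no ReHU terms and no linear constraints, which we derived for illustration (this is not the full ReHLine algorithm). Writing \( \text{ReLU}(a) = \max_{0 \leq \lambda \leq 1} \lambda a \) gives dual variables \( \lambda_{li} \in [0,1] \) and the linear relation \( \mathbf{\beta} = -\sum_{i,l} \lambda_{li} u_{li} \mathbf{x}_i \), so each CD step is a clipped Newton step that costs \( O(d) \), just as in Liblinear. The function name and the hinge-loss usage are hypothetical.

```python
import numpy as np

def relu_cd(X, U, V, n_epochs=100):
    """Dual CD sketch for min_beta sum_{l,i} ReLU(U[l,i] x_i^T beta + V[l,i]) + 0.5||beta||^2.

    Special case only: no ReHU terms and no linear constraints.
    """
    n, d = X.shape
    Lam = np.zeros_like(U)                 # dual variables, 0 <= Lam[l, i] <= 1
    beta = np.zeros(d)                     # linear KKT: beta = -sum_{l,i} Lam[l,i] U[l,i] x_i
    sq_norms = np.einsum("ij,ij->i", X, X)
    for _ in range(n_epochs):
        for i in range(n):
            for l in range(U.shape[0]):
                curv = U[l, i] ** 2 * sq_norms[i]
                if curv == 0.0:
                    continue
                # the dual gradient w.r.t. Lam[l, i] is -(U[l,i] x_i^T beta + V[l,i]), O(d)
                step = (U[l, i] * (X[i] @ beta) + V[l, i]) / curv
                new = min(max(Lam[l, i] + step, 0.0), 1.0)
                delta = new - Lam[l, i]
                if delta != 0.0:
                    beta -= delta * U[l, i] * X[i]   # keep the KKT relation exact
                    Lam[l, i] = new
    return beta, Lam

# Usage: the SVM hinge loss C * ReLU(1 - y_i x_i^T beta) has U = -C y, V = C
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))
y = np.where(X[:, 0] - X[:, 1] >= 0, 1.0, -1.0)
C = 1.0
U = -C * y[None, :]
V = C * np.ones((1, len(y)))
beta, Lam = relu_cd(X, U, V)
```

With \( \alpha_i = C \lambda_{i} \), this reduces to the Liblinear SVM update, which is the sense in which the "LibLinear" idea extends to composite ReLU losses.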
Software: generic / specialized software
If you like ReHLine, please star 🌟 our GitHub repository. Thank you!