Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence
Ben Dai 1, Yixuan Qiu 2
Equal Contribution
1 Chinese University of Hong Kong (CUHK), 2 Shanghai University of Finance and Economics
Motivation
 Empirical risk minimization (ERM) is a fundamental framework in machine learning
 Many different loss functions
 Efficient solvers exist for specific problems
 E.g., Liblinear for hinge loss SVM
Motivation
 Can we develop optimization algorithms for general ERM loss functions?
 Can we achieve provable fast convergence rates?
 Can we transfer the empirical success of Liblinear to general ERM problems?
Model
In this paper, we consider a general regularized ERM based on a convex PLQ loss with linear constraints:
\( \min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2} \|\mathbf{\beta}\|_2^2, \quad \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0}, \)

where \( L_i(\cdot) \geq 0\) is the proposed composite ReLU-ReHU loss,

\( \mathbf{x}_i \in \mathbb{R}^d\) is the feature vector for the \(i\)th observation.

\(\mathbf{A} \in \mathbb{R}^{K \times d}\) and \(\mathbf{b} \in \mathbb{R}^K\) are linear inequality constraints for \(\mathbf{\beta}\).

We focus on large-scale datasets, where the dimension of the coefficient vector and the total number of constraints are much smaller than the sample size, that is, \(d \ll n\) and \(K \ll n\).
Composite ReLU-ReHU Loss
Definition 1 (Dai and Qiu, 2023). A function \(L(z)\) is composite ReLU-ReHU if there exist \( \mathbf{u}, \mathbf{v} \in \mathbb{R}^{L}\) and \(\mathbf{\tau}, \mathbf{s}, \mathbf{t} \in \mathbb{R}^{H}\) such that
\( L(z) = \sum_{l=1}^L \text{ReLU}( u_l z + v_l) + \sum_{h=1}^H \text{ReHU}_{\tau_h}( s_h z + t_h), \)
where \( \text{ReLU}(z) = \max\{z,0\}\), and the rectified Huber unit is
\( \text{ReHU}_{\tau}(z) = \begin{cases} 0, & z \leq 0, \\ z^2/2, & 0 < z \leq \tau, \\ \tau(z - \tau/2), & z > \tau. \end{cases} \)
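The two building blocks and the composite loss of Definition 1 can be sketched directly in NumPy; the function names here are illustrative, and `rehu` follows the standard piecewise definition of the rectified Huber unit (zero below 0, quadratic up to \(\tau\), linear beyond).

```python
import numpy as np

def relu(z):
    """ReLU(z) = max(z, 0)."""
    return np.maximum(z, 0.0)

def rehu(z, tau):
    """Rectified Huber unit: 0 for z<=0, z^2/2 for 0<z<=tau, tau*(z - tau/2) beyond."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.0,
                    np.where(z <= tau, 0.5 * z**2, tau * (z - 0.5 * tau)))

def composite_loss(z, u, v, s, t, tau):
    """Composite ReLU-ReHU loss L(z) from Definition 1."""
    lin = sum(relu(ul * z + vl) for ul, vl in zip(u, v))
    quad = sum(rehu(sh * z + th, tauh) for sh, th, tauh in zip(s, t, tau))
    return lin + quad
```

For example, the hinge loss \(\max(0, 1-z)\) corresponds to a single ReLU term with \(u_1 = -1\), \(v_1 = 1\) and no ReHU terms.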
Composite ReLU-ReHU Loss
Theorem 1 (Dai and Qiu, 2023). A loss function \(L:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}\) is convex PLQ if and only if it is composite ReLU-ReHU.
Composite ReLU-ReHU Loss
ReHLine applies to any convex piecewise linear-quadratic (PLQ) loss function, including nonsmooth ones, such as the hinge loss, the check loss, the Huber loss, etc.
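The decompositions of these classic losses into ReLU/ReHU terms can be checked in a few lines. This sketch uses the standard parameterizations (hinge as one ReLU; check loss as two ReLUs; Huber with threshold \(\kappa\) as two ReHU terms); the helper names are illustrative.

```python
def relu(z):
    """ReLU(z) = max(z, 0)."""
    return max(z, 0.0)

def rehu(z, tau):
    """Rectified Huber unit: 0, quadratic, then linear."""
    if z <= 0:
        return 0.0
    return 0.5 * z * z if z <= tau else tau * (z - 0.5 * tau)

# Hinge loss max(0, 1 - z): a single ReLU term with u = -1, v = 1
def hinge_via_relu(z):
    return relu(-z + 1.0)

# Check (quantile) loss rho_q(z): two ReLU terms with slopes q and q - 1
def check_via_relu(z, q):
    return relu(q * z) + relu((q - 1.0) * z)

# Huber loss with threshold kappa: two ReHU terms with s = +1/-1, t = 0, tau = kappa
def huber_via_rehu(z, kappa):
    return rehu(z, kappa) + rehu(-z, kappa)
```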
Main Results
ReHLine has a provable linear convergence rate, and its per-iteration computational complexity is linear in the sample size \(n\).
ReHLine
 Inspired by Coordinate Descent (CD) and Liblinear
 The linear relationship between the primal and dual variables greatly simplifies the CD updates.
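To illustrate this primal-dual trick (not the full ReHLine algorithm), here is a Liblinear-style dual coordinate descent sketch for the plain hinge-loss SVM. Because the primal vector \(\mathbf{\beta} = \sum_i \alpha_i y_i \mathbf{x}_i\) is maintained explicitly, each coordinate update costs only O(d); the function name and toy data are illustrative.

```python
import numpy as np

def dual_cd_svm(X, y, C=1.0, n_epochs=50):
    """Dual coordinate descent for min 0.5*||beta||^2 + C * sum_i hinge(y_i x_i' beta).

    Maintains beta = sum_i alpha_i y_i x_i so that each coordinate update
    is O(d) rather than O(n d).
    """
    n, d = X.shape
    alpha = np.zeros(n)
    beta = np.zeros(d)
    sq_norms = np.einsum("ij,ij->i", X, X)  # diagonal of the dual Hessian, Q_ii = ||x_i||^2
    for _ in range(n_epochs):
        for i in range(n):
            grad = y[i] * (X[i] @ beta) - 1.0             # partial gradient of the dual
            new_alpha = np.clip(alpha[i] - grad / sq_norms[i], 0.0, C)
            beta += (new_alpha - alpha[i]) * y[i] * X[i]  # O(d) primal update
            alpha[i] = new_alpha
    return beta
```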
Experiments
Software: generic and specialized solvers
 cvx/cvxpy
 mosek (interior-point method)
 ecos (interior-point method)
 scs (ADMM)
 dccp (disciplined convex-concave programming)
 liblinear → SVM
 hqreg → Huber regression
 lightning → smoothed SVM (sSVM)
Thank you!
[NeurIPS 2023] Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence