## Regularized Composite ReLU-ReHU Loss Minimization with Linear Computation and Linear Convergence

### Ben Dai¹, Yixuan Qiu² (equal contribution)

¹ Chinese University of Hong Kong (CUHK), ² Shanghai University of Finance and Economics

## Motivation

• Empirical risk minimization (ERM) is a fundamental framework in machine learning
• Many different loss functions
• Efficient solvers exist for specific problems
• E.g., Liblinear for hinge loss SVM


• Can we develop optimization algorithms for general ERM loss functions?
• Can we achieve provable fast convergence rates?
• Can we transfer the empirical success of Liblinear to general ERM problems?

## Model

In this paper, we consider a general regularized ERM based on a convex PLQ loss with linear constraints:

$$\min_{\mathbf{\beta} \in \mathbb{R}^d} \sum_{i=1}^n L_i(\mathbf{x}_i^\intercal \mathbf{\beta}) + \frac{1}{2} \| \mathbf{\beta} \|_2^2, \quad \text{ s.t. } \mathbf{A} \mathbf{\beta} + \mathbf{b} \geq \mathbf{0},$$

• $$L_i(\cdot) \geq 0$$ is the proposed composite ReLU-ReHU loss.

• $$\mathbf{x}_i \in \mathbb{R}^d$$ is the feature vector for the $$i$$-th observation.

• $$\mathbf{A} \in \mathbb{R}^{K \times d}$$ and $$\mathbf{b} \in \mathbb{R}^K$$ are linear inequality constraints for $$\mathbf{\beta}$$.

• We focus on large-scale datasets, where the dimension of the coefficient vector and the total number of constraints are much smaller than the sample size, that is, $$d \ll n$$ and $$K \ll n$$.
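As a concrete instance, the hinge-loss SVM fits this template with $$L_i(z) = \max(1 - y_i z, 0)$$. The following NumPy sketch (function and variable names are illustrative, not from the ReHLine package) evaluates the regularized objective and checks feasibility of the linear constraints:

```python
import numpy as np

def erm_objective(beta, X, y, A=None, b=None):
    """Evaluate the regularized ERM objective for the hinge-loss case
    L_i(z) = max(1 - y_i * z, 0), and feasibility of A beta + b >= 0."""
    z = X @ beta                                   # x_i' beta for all i
    loss = np.maximum(1.0 - y * z, 0.0).sum()      # sum_i L_i(x_i' beta)
    reg = 0.5 * beta @ beta                        # (1/2) ||beta||_2^2
    feasible = True if A is None else bool(np.all(A @ beta + b >= 0))
    return loss + reg, feasible
```

Swapping in another composite ReLU-ReHU loss only changes the `loss` line; the regularizer and constraint check stay the same.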

## Composite ReLU-ReHU Loss

Definition 1 (Dai and Qiu, 2023). A function $$L(z)$$ is composite ReLU-ReHU if there exist $$\mathbf{u}, \mathbf{v} \in \mathbb{R}^{L}$$ and $$\mathbf{\tau}, \mathbf{s}, \mathbf{t} \in \mathbb{R}^{H}$$ such that

$$L(z) = \sum_{l=1}^L \text{ReLU}( u_l z + v_l) + \sum_{h=1}^H \text{ReHU}_{\tau_h}( s_h z + t_h),$$

where $$\text{ReLU}(z) = \max\{z,0\}$$ and

$$\text{ReHU}_{\tau}(z) = \begin{cases} 0, & z \leq 0, \\ z^2/2, & 0 < z \leq \tau, \\ \tau(z - \tau/2), & z > \tau. \end{cases}$$
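Both building blocks, and the composite loss itself, are straightforward to evaluate. A minimal sketch (helper names are ours) of Definition 1 for given parameter vectors:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def rehu(z, tau):
    """Rectified Huber: 0 for z <= 0, z^2/2 on (0, tau], linear beyond tau."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0.0, 0.0,
                    np.where(z <= tau, 0.5 * z ** 2, tau * (z - 0.5 * tau)))

def composite_relu_rehu(z, u, v, s, t, tau):
    """L(z) = sum_l ReLU(u_l z + v_l) + sum_h ReHU_{tau_h}(s_h z + t_h)."""
    total = sum(relu(ul * z + vl) for ul, vl in zip(u, v))
    total += sum(rehu(sh * z + th, tauh) for sh, th, tauh in zip(s, t, tau))
    return total
```

For example, the hinge loss $$\max(1 - z, 0)$$ is recovered with $$u = (-1)$$, $$v = (1)$$ and no ReHU terms.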


Theorem 1 (Dai and Qiu, 2023). A loss function $$L:\mathbb{R}\rightarrow\mathbb{R}_{\geq 0}$$ is convex PLQ if and only if it is composite ReLU-ReHU.

ReHLine therefore applies to any convex piecewise linear-quadratic (PLQ) loss function, smooth or not, including the hinge loss, the check (quantile) loss, and the Huber loss.
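Theorem 1 guarantees such a decomposition exists for every convex PLQ loss. As one worked example (the parameter choice below is our own, following the ReLU part of Definition 1), the check loss at quantile level $$\kappa$$ splits into two ReLU terms:

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)

def check_loss(z, y, kappa):
    """Quantile (check) loss rho_kappa(r) on the residual r = y - z."""
    r = y - z
    return r * (kappa - (r < 0.0))

def check_as_relu(z, y, kappa):
    """Same loss in composite form ReLU(u1*z + v1) + ReLU(u2*z + v2),
    with u = (-kappa, 1 - kappa), v = (kappa*y, -(1 - kappa)*y)."""
    return relu(-kappa * z + kappa * y) + relu((1.0 - kappa) * z - (1.0 - kappa) * y)
```

Analogous decompositions use pure ReLU terms for the hinge loss and pure ReHU terms for the Huber loss.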

## Main Results

ReHLine has a provable linear convergence rate, and its per-iteration computational complexity is linear in the sample size $$n$$.

## ReHLine

• Inspired by coordinate descent (CD) and Liblinear
• The linear relationship between the primal and dual variables greatly simplifies each CD update
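The primal-dual link means each dual coordinate step translates into an $$O(d)$$ primal update. The sketch below shows that core idea for the plain hinge-loss SVM special case (no constraints, unit loss weight); this is Liblinear-style dual coordinate descent, not the full ReHLine algorithm:

```python
import numpy as np

def dual_cd_svm(X, y, n_iter=50):
    """Liblinear-style dual CD for
    min_beta sum_i max(1 - y_i x_i' beta, 0) + 0.5 ||beta||^2."""
    n, d = X.shape
    alpha = np.zeros(n)                 # dual variables, one per sample
    beta = np.zeros(d)                  # primal iterate: beta = sum_i alpha_i y_i x_i
    sq = np.einsum('ij,ij->i', X, X)    # precomputed ||x_i||^2
    for _ in range(n_iter):
        for i in range(n):
            if sq[i] == 0.0:
                continue
            g = y[i] * (X[i] @ beta) - 1.0              # dual gradient wrt alpha_i
            new = np.clip(alpha[i] - g / sq[i], 0.0, 1.0)
            beta += (new - alpha[i]) * y[i] * X[i]      # O(d) primal update
            alpha[i] = new
    return beta
```

Because `beta` is maintained incrementally from `alpha`, no $$O(nd)$$ recomputation is ever needed inside the loop; ReHLine generalizes this scheme to all ReLU-ReHU terms and to the linear constraints.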

## Experiments

Software. Baselines include both generic and specialized solvers:

• cvx/cvxpy
• mosek (interior-point method)
• ecos (interior-point method)
• dccp (disciplined convex-concave programming)
• liblinear -> SVM
• hqreg -> Huber regression
• lightning -> smoothed SVM

Published at NeurIPS 2023; code is available in the `rehline` package by statmlben on GitHub.