TY - JOUR
T1 - Linear convergence of proximal gradient method for linear sparse SVM
AU - Jiao, Xiaoqi
AU - Lian, Heng
AU - Liu, Jiamin
AU - Zhang, Yingying
N1 - Publisher Copyright:
© 2025 Elsevier Ltd
PY - 2026/2
Y1 - 2026/2
AB - Despite the hinge loss function being neither strongly convex nor strongly smooth, we establish a linear rate of convergence for sparse linear support vector machines (SVM) up to its statistical accuracy. The algorithm we use is the proximal gradient method for composite functions, applied to a sequence of regularization parameters to compute the approximate solution path on a grid. Unlike works on loss functions that are strongly convex and strongly smooth, here we do not have linear convergence to the exact solution, but we can demonstrate linear convergence to the population truth up to the statistical error (in particular, we simultaneously consider numerical convergence and statistical convergence). For any regularization parameter in the chosen decreasing sequence, we show that the estimator is in a small neighborhood of the exact solution after O(log s*) iterations, where s* is the sparsity of the true coefficient in the model, and a total of O(log n) stages (i.e., a sequence of regularization parameters of length O(log n)) is required to achieve the near-oracle statistical rate, with n the sample size.
KW - Geometric/Linear convergence
KW - Proximal gradient descent
KW - Sparsity
KW - Support vector machines
UR - https://www.scopus.com/pages/publications/105017846621
DO - 10.1016/j.neunet.2025.108162
M3 - Article
AN - SCOPUS:105017846621
SN - 0893-6080
VL - 194
JO - Neural Networks
JF - Neural Networks
M1 - 108162
ER -