Abstract
The principal support vector machines method is a powerful tool for sufficient dimension reduction that replaces the original predictors with low-dimensional linear combinations while preserving the information needed for regression and classification. However, the computational burden of the principal support vector machines method constrains its use for massive data. To address this issue, we propose a naive and a refined distributed estimation algorithm for fast implementation when the sample size is large. Both distributed sufficient dimension reduction estimators achieve the same statistical efficiency as the estimator computed on the pooled data, which provides rigorous statistical guarantees for their application to large-scale datasets; the refined method requires smaller batch sample sizes and is therefore more advantageous when the distributed machines have memory limitations. The two distributed algorithms are further adapted to principal weighted support vector machines for sufficient dimension reduction in binary classification. The statistical accuracy and computational complexity of the proposed methods are examined through comprehensive simulation studies and in a real data application with more than 600,000 samples.
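To illustrate the divide-and-conquer idea behind the naive distributed estimator, here is a minimal sketch using sliced inverse regression (one of the keywords below) rather than the paper's principal support vector machines: each batch computes a local candidate matrix, the matrices are averaged, and the leading eigenvectors of the average give the estimated dimension-reduction directions. All function names, batch/slice counts, and the synthetic data are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def sir_candidate_matrix(X, y, n_slices=5):
    """Per-batch SIR kernel: estimate Cov(E[Z | Y]) on
    within-batch standardized predictors Z (illustrative helper)."""
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Whitening transform: Z = (X - mu) L^{-T}, with cov = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(cov))
    Z = (X - mu) @ L_inv.T
    # Slice on the sorted response; accumulate weighted outer
    # products of within-slice means of Z.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    return M

def distributed_sir(X, y, d=1, n_batches=10, n_slices=5):
    """Naive divide-and-conquer: average per-batch kernel matrices,
    then take the top-d eigenvectors of the average."""
    Ms = [sir_candidate_matrix(Xb, yb, n_slices)
          for Xb, yb in zip(np.array_split(X, n_batches),
                            np.array_split(y, n_batches))]
    M_bar = np.mean(Ms, axis=0)
    _, eigvecs = np.linalg.eigh(M_bar)  # ascending eigenvalues
    return eigvecs[:, -d:]              # directions, whitened scale

# Synthetic single-index example: only the first predictor matters.
rng = np.random.default_rng(0)
n, p = 20000, 6
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0] = 1.0
y = X @ beta + 0.1 * rng.standard_normal(n)

B = distributed_sir(X, y, d=1)
```

Because the predictors here are standard normal, the whitening step is close to the identity and the recovered unit vector should align (up to sign) with the true direction e_1. The refined estimator described in the abstract improves on this one-shot averaging so that smaller per-batch sample sizes suffice, but its details are beyond this sketch.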
| Original language | English |
|---|---|
| Pages (from-to) | 254-266 |
| Number of pages | 13 |
| Journal | Technometrics |
| Volume | 67 |
| Issue number | 2 |
| DOIs | |
| State | Published - 2025 |
Keywords
- Distributed estimation
- Principal support vector machine
- Sliced inverse regression
- Sufficient dimension reduction