Optimal subsampling for large-sample quantile regression with massive data

  • Li Shao
  • Shanshan Song
  • Yong Zhou*
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

To balance the explosive growth of data volume and limited budgets for computational resources, one of the popular methods is downscaling the data volume by subsampling a subdataset that inherits the relevant property of the full data. As an alternative to the mean regression model, the quantile regression model has been studied extensively when the data are independent and the data scale is medium. This article focuses on quantile regression with massive data where the sample size n (greater than (Formula presented.) in general) is extraordinarily large but the dimension d (smaller than 20 in general) is small. We first formulate the general subsampling procedure and establish the asymptotic property of the resultant estimator. Then, with the help of optimality criteria in experimental design, we derive two subsampling probabilities that are optimal in the sense of smallest asymptotic mean square error. Since the optimal subsampling probabilities depend on the full data estimator, we develop a two-step optimal subsampling algorithm and study the consistency and asymptotic normality of the resultant estimator. The empirical performance of the optimal subsampling algorithm is evaluated with synthetic and real datasets.
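The two-step procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not state the optimal probabilities, so the form used here, proportional to |τ − 1{residual < 0}| · ‖x_i‖, is an assumption of the norm-based kind that appears in the optimal-subsampling literature, not necessarily the article's formula. The function names `weighted_quantreg` and `two_step_subsample_qr`, the pilot size `r0`, and the subsample size `r` are hypothetical. The weighted quantile fit uses iteratively reweighted least squares as a simple approximate stand-in for an exact solver, and the pilot sample is not pooled into the final estimate, which a full algorithm might do.

```python
import numpy as np


def weighted_quantreg(X, y, tau, w, n_iter=50, eps=1e-6):
    """Approximately minimise sum_i w_i * rho_tau(y_i - x_i' b) by IRLS.

    rho_tau(u) = |u| * |tau - 1{u < 0}| is the check loss. Each iteration
    replaces |u| with u**2 / max(|u_prev|, eps) and solves the resulting
    weighted least-squares problem. This is an approximation, not an exact
    linear-programming solution.
    """
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    for _ in range(n_iter):
        u = y - X @ beta
        a = w * np.abs(tau - (u < 0).astype(float)) / np.maximum(np.abs(u), eps)
        sa = np.sqrt(a)
        beta = np.linalg.lstsq(X * sa[:, None], y * sa, rcond=None)[0]
    return beta


def two_step_subsample_qr(X, y, tau, r0, r, rng):
    """Two-step subsampling sketch (assumed form, see the text above)."""
    n = X.shape[0]

    # Step 1: uniform pilot subsample gives a rough estimate of beta.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta0 = weighted_quantreg(X[idx0], y[idx0], tau, np.ones(r0))

    # Step 2: plug the pilot estimate into the (assumed) norm-based
    # probabilities, then draw the second-stage subsample with replacement.
    resid = y - X @ beta0
    score = np.abs(tau - (resid < 0).astype(float)) * np.linalg.norm(X, axis=1)
    pi = score / score.sum()
    idx = rng.choice(n, size=r, replace=True, p=pi)

    # Inverse-probability weights correct for the non-uniform sampling.
    # Rescaling by the mean leaves the minimiser unchanged.
    w = 1.0 / pi[idx]
    w = w / w.mean()
    beta = weighted_quantreg(X[idx], y[idx], tau, w)
    return beta, pi
```

Calling `two_step_subsample_qr(X, y, 0.5, 500, 2000, np.random.default_rng(0))` on an `(n, d)` design matrix `X` and a response vector `y` returns the subsample estimate together with the full vector of sampling probabilities. Only the second-stage fit touches `r` rows, which is where the computational saving comes from. Computing the probabilities still requires one pass over all `n` rows.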

Original language: English
Pages (from-to): 420-443
Number of pages: 24
Journal: Canadian Journal of Statistics
Volume: 51
Issue number: 2
State: Published - Jun 2023

Keywords

  • Massive data
  • optimal subsampling
  • quantile regression
