Quantile regression in big data: A divide and conquer based strategy

  • Lanjue Chen
  • Yong Zhou*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

54 Scopus citations

Abstract

Quantile regression, which analyzes the conditional distribution of outcomes given a set of covariates, has been widely used in many fields. However, the volume and velocity of big data make estimating quantile regression models extremely difficult because of intensive computation and limited storage. Based on a divide and conquer strategy, a simple and efficient method is proposed to address this problem. The proposed approach keeps only summary statistics of each data block and can then use them to reconstruct the estimator for the entire data set with asymptotically negligible approximation error. This property makes the proposed method particularly appealing when data blocks are stored on multiple servers or arrive as a data stream. Furthermore, under certain conditions, the proposed estimator is shown to be consistent and asymptotically as efficient as the estimating equation estimator computed from the entire data set at once. The merits of the proposed method are illustrated using both simulation studies and real data analysis.
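
As a rough illustration of the divide and conquer idea described in the abstract, the sketch below fits a quantile regression on each data block, keeps only a small summary per block (the coefficient vector and the block size), and combines the summaries into a single estimate. It is not the authors' estimator: the iteratively reweighted least squares (IRLS) solver, the sample-size-weighted average used for aggregation, the simulated data, and all function names are assumptions made for illustration.

```python
import numpy as np


def fit_quantile_irls(X, y, tau, n_iter=50, eps=1e-4):
    """Approximate quantile regression with iteratively reweighted least squares.

    Assumption: IRLS is used here only because it needs nothing beyond NumPy;
    it approximates the minimizer of the check loss and is not taken from the paper.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        # Weight tau for positive residuals, 1 - tau otherwise, divided by |r|
        # (floored at eps) so that w * r**2 mimics the check loss.
        w = np.where(r > 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return beta


def block_summary(X, y, tau):
    """Summary kept for one block: (coefficient estimate, block size)."""
    return fit_quantile_irls(X, y, tau), len(y)


def combine(summaries):
    """Hypothetical aggregation rule: sample-size-weighted average of block estimates."""
    total = sum(n for _, n in summaries)
    return sum(n * beta for beta, n in summaries) / total


# Simulated data (assumption): y = 1 + 2 * x + standard normal noise,
# so the median regression coefficients are (1, 2).
rng = np.random.default_rng(0)
n, n_blocks, tau = 20000, 10, 0.5
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Process the blocks one at a time; only the summaries are retained.
summaries = [
    block_summary(Xb, yb, tau)
    for Xb, yb in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks))
]
beta_dc = combine(summaries)
beta_full = fit_quantile_irls(X, y, tau)  # full-data fit, for comparison only

print("divide and conquer:", beta_dc)
print("full data:         ", beta_full)
```

Because each block contributes only a coefficient vector and a count, the same loop would work if the blocks lived on different servers or arrived one after another in a stream. A weighted aggregation that accounts for each block's estimated precision would be closer in spirit to an estimating equation approach, but the form of such weights is not given in the abstract and is not attempted here.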

Original language: English
Article number: 106892
Journal: Computational Statistics and Data Analysis
Volume: 144
State: Published - Apr 2020

Keywords

  • Data stream
  • Divide and conquer
  • Estimating equation
  • Massive data sets
  • Quantile regression
