
Communication-efficient Distributed Statistical Inference for Massive Data with Heterogeneous Auxiliary Information

  • CAS - Academy of Mathematics and System Sciences
  • Chengdu No.7 High School

Research output: Contribution to journal › Article › peer-review

Abstract

Heterogeneous auxiliary information commonly arises in big data because of diverse study settings and privacy constraints. Ignoring such indirect evidence often results in a substantial loss of efficiency in statistical inference. This article proposes a novel framework for integrating a mixture of individual-level data and multiple external heterogeneous summary statistics by multiplying likelihood functions and confidence densities. Theoretically, we show that the proposed method possesses desirable properties and can achieve statistical efficiency comparable to that of the individual participant data (IPD) estimator, which uses all available individual-level data. Furthermore, we develop a communication-efficient distributed inference procedure for massive datasets with heterogeneous auxiliary information. We demonstrate that the proposed iterative algorithm achieves linear convergence under general conditions or for generalized linear models. Finally, extensive simulations and real-data applications illustrate the performance of the proposed methods.
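The two ideas in the abstract (multiplying a likelihood by confidence densities, then fitting the result with few communication rounds) can be illustrated with a toy sketch. This is not the authors' algorithm; it rests on several assumptions: a Gaussian linear model with unit noise variance, external summaries treated as normal confidence densities for different sub-vectors of the parameter (one simple form of heterogeneity), and a gradient-corrected surrogate loss on a central machine in the spirit of communication-efficient surrogate-likelihood methods. The function names `local_gradient`, `penalty_terms` and `distributed_fit`, and all numbers in the usage part, are hypothetical.

```python
import numpy as np

def local_gradient(X, y, theta):
    """Gradient of the local average loss (1/(2n)) * ||y - X theta||^2."""
    return X.T @ (X @ theta - y) / X.shape[0]

def penalty_terms(aux, p, N):
    """Hessian and linear term of the negative log confidence densities, scaled by 1/N.

    Each entry of `aux` is (idx, b, V): an external estimate b of theta[idx]
    with covariance V, treated as a normal confidence density (assumption).
    """
    H = np.zeros((p, p))
    c = np.zeros(p)
    for idx, b, V in aux:
        A = np.zeros((len(idx), p))
        A[np.arange(len(idx)), idx] = 1.0   # A @ theta selects theta[idx]
        W = np.linalg.inv(V)
        H += A.T @ W @ A / N
        c += A.T @ W @ b / N
    return H, c

def distributed_fit(blocks, aux, n_iter=30):
    """Minimise L_N(theta) + penalty(theta) using one p-vector per machine per round."""
    p = blocks[0][0].shape[1]
    N = sum(X.shape[0] for X, _ in blocks)
    H_P, c_P = penalty_terms(aux, p, N)
    X1, y1 = blocks[0]                      # machine 1 acts as the central node
    n1 = X1.shape[0]
    H1 = X1.T @ X1 / n1
    # Initial estimate: local data on machine 1 combined with the external summaries.
    theta = np.linalg.solve(H1 + H_P, X1.T @ y1 / n1 + c_P)
    for _ in range(n_iter):
        # Communication round: each machine sends its local gradient at theta.
        g = sum(X.shape[0] * local_gradient(X, y, theta) for X, y in blocks) / N
        # Minimise L_1(t) - <grad L_1(theta) - g, t> + penalty(t);
        # for least squares the minimiser has this closed form.
        theta = np.linalg.solve(H1 + H_P, H1 @ theta - g + c_P)
    return theta

rng = np.random.default_rng(0)
p, theta_true = 4, np.array([1.0, -0.5, 0.25, 2.0])
blocks = []
for _ in range(5):                          # 5 machines with 200 rows each
    X = rng.normal(size=(200, p))
    blocks.append((X, X @ theta_true + rng.normal(size=200)))
# Two hypothetical external studies reporting different sub-vectors of theta.
aux = [([0, 1], np.array([1.02, -0.48]), 0.01 * np.eye(2)),
       ([2, 3], np.array([0.27, 1.97]), 0.02 * np.eye(2))]

theta_dist = distributed_fit(blocks, aux)

# Pooled single-machine minimiser of the same combined objective, for comparison.
X_all = np.vstack([X for X, _ in blocks])
y_all = np.concatenate([y for _, y in blocks])
N = X_all.shape[0]
H_P, c_P = penalty_terms(aux, p, N)
theta_pooled = np.linalg.solve(X_all.T @ X_all / N + H_P, X_all.T @ y_all / N + c_P)
print(np.max(np.abs(theta_dist - theta_pooled)))  # should be very small
```

In this quadratic toy case the error after each round is multiplied by (H1 + H_P)^(-1)(H1 - H_N), where H_N is the full-data Hessian. The iteration therefore converges linearly whenever the central machine's Hessian is close to the full-data one. This mirrors, but does not reproduce, the linear-convergence result stated above.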

Original language: English
Article number: 28
Journal: Journal of Machine Learning Research
Volume: 27
State: Published - 2026

Keywords

  • communication efficiency
  • confidence density
  • distributed inference
  • heterogeneous auxiliary information
  • massive data

