TY - JOUR
T1 - Enhancements of communication-efficient distributed statistical inference and its privacy preservation
AU - Yu, Miaomiao
AU - Li, Jiaxuan
AU - Zhou, Yong
N1 - Publisher Copyright: © 2025
PY - 2026/1
Y1 - 2026/1
AB - In the modern era of big data, the vast amount of available data has opened new ways to analyze important economic and financial issues. For example, predictions of the probability of individual default have become more accurate: as data volume grows, the number of recorded defaulted individuals has increased year on year, allowing a more detailed characterization of the defaulted population. However, big data also presents new challenges. One of them is that samples are stored separately on different machines and cannot be transferred directly because of privacy considerations and limited data storage capacity. This paper develops an improved communication-efficient distributed algorithm that uses more locally summarized information to estimate the high-order derivatives of the loss function at a lower communication cost. Furthermore, to protect the privacy of the vector exchanged between machines, we design a privacy-preserving algorithm that satisfies a differential privacy constraint by adding Laplace-distributed noise to the parameters; this algorithm can be extended to settings beyond distributed architectures. In both the non-private and the private schemes, only local estimators are passed from the local machines to the central machine, and both schemes are more accurate and efficient, in theory and in practice, than their counterparts. We then propose a bootstrap scheme to estimate the covariance matrix of the parameter estimators, which supports effective inference. Finally, we find that the proposed method effectively handles two practical applications: accurate probabilistic prediction of default risk and of climate activity.
KW - Communication efficiency
KW - Differential data privacy
KW - Distributed algorithm
KW - Laplace mechanism
KW - M-estimation
UR - https://www.scopus.com/pages/publications/105021471663
U2 - 10.1016/j.jeconom.2025.106125
DO - 10.1016/j.jeconom.2025.106125
M3 - Article
AN - SCOPUS:105021471663
SN - 0304-4076
VL - 253
JO - Journal of Econometrics
JF - Journal of Econometrics
M1 - 106125
ER -