TY - JOUR
T1 - Semi-supervised learning for various comparison functions across two populations
AU - Zhang, Menghua
AU - Peng, Mengjiao
AU - Zhou, Yong
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
PY - 2025/2
Y1 - 2025/2
N2 - Estimating comparison functions is crucial in numerous domains, such as econometrics, clinical medicine, and public health, where evaluating the effectiveness of interventions or treatment effects is a central concern. In many scenarios, response variables are much more expensive to collect than covariates, so labeled data are limited. To tackle this challenge, we present a unified semi-supervised learning (SSL) framework for estimating comparison functions, such as the difference between two independent samples in means or in event probabilities and the survival competition probability, that leverages unlabeled data containing only covariate observations to improve estimation accuracy. Specifically, we propose a class of efficient and adaptive estimators for comparison functions that use both the labeled and unlabeled data under the semi-supervised (SS) framework. We establish the consistency and asymptotic normality of the proposed estimators and provide the optimal weight yielding the most efficient estimator. Furthermore, the resulting estimator is shown to be semiparametric efficient if the working model is correctly specified. Extensive numerical simulations confirm the consistency and efficiency of the proposed estimators. An application to a real dataset extracted from the 2001 Medical Expenditure Panel Survey (MEPS) is also included.
AB - Estimating comparison functions is crucial in numerous domains, such as econometrics, clinical medicine, and public health, where evaluating the effectiveness of interventions or treatment effects is a central concern. In many scenarios, response variables are much more expensive to collect than covariates, so labeled data are limited. To tackle this challenge, we present a unified semi-supervised learning (SSL) framework for estimating comparison functions, such as the difference between two independent samples in means or in event probabilities and the survival competition probability, that leverages unlabeled data containing only covariate observations to improve estimation accuracy. Specifically, we propose a class of efficient and adaptive estimators for comparison functions that use both the labeled and unlabeled data under the semi-supervised (SS) framework. We establish the consistency and asymptotic normality of the proposed estimators and provide the optimal weight yielding the most efficient estimator. Furthermore, the resulting estimator is shown to be semiparametric efficient if the working model is correctly specified. Extensive numerical simulations confirm the consistency and efficiency of the proposed estimators. An application to a real dataset extracted from the 2001 Medical Expenditure Panel Survey (MEPS) is also included.
KW - Adaptivity
KW - Comparison functions
KW - Model-free estimator
KW - Semi-supervised learning
KW - Semiparametric efficiency
UR - https://www.scopus.com/pages/publications/85213064208
U2 - 10.1007/s00362-024-01632-3
DO - 10.1007/s00362-024-01632-3
M3 - Article
AN - SCOPUS:85213064208
SN - 0932-5026
VL - 66
JO - Statistical Papers
JF - Statistical Papers
IS - 1
M1 - 18
ER -