TY - JOUR
T1 - Leveraging Unlabeled Data for Superior ROC Curve Estimation via a Semiparametric Approach
AU - Zhang, Menghua
AU - Peng, Mengjiao
AU - Zhou, Yong
N1 - Publisher Copyright: © 2025 American Statistical Association.
PY - 2025
Y1 - 2025
N2 - The receiver operating characteristic (ROC) curve is a widely used tool in various fields, including economics, medicine, and machine learning, for evaluating classification performance and comparing treatment effects. Clear and readily available labels are frequently absent when estimating the ROC curve, for reasons such as labeling cost, time constraints, data privacy, and information asymmetry. Traditional supervised estimators commonly rely solely on labeled data, where each sample is associated with a fully observed response variable. We propose a new set of semi-supervised (SS) estimators that exploit available unlabeled data (samples lacking observations of the response) to enhance estimation precision in a semiparametric setting, assuming that the distribution of the response variable for one group is known up to unknown parameters. The newly proposed SS estimators have attractive properties such as adaptability and efficiency, achieved by leveraging the flexibility of the kernel smoothing method. We establish the large-sample properties of the SS estimators, which demonstrate that the SS estimators consistently outperform the supervised estimator under mild assumptions. Numerical experiments provide empirical evidence to support our theoretical findings. Finally, we showcase the practical applicability of our proposed methodology by applying it to two real datasets.
AB - The receiver operating characteristic (ROC) curve is a widely used tool in various fields, including economics, medicine, and machine learning, for evaluating classification performance and comparing treatment effects. Clear and readily available labels are frequently absent when estimating the ROC curve, for reasons such as labeling cost, time constraints, data privacy, and information asymmetry. Traditional supervised estimators commonly rely solely on labeled data, where each sample is associated with a fully observed response variable. We propose a new set of semi-supervised (SS) estimators that exploit available unlabeled data (samples lacking observations of the response) to enhance estimation precision in a semiparametric setting, assuming that the distribution of the response variable for one group is known up to unknown parameters. The newly proposed SS estimators have attractive properties such as adaptability and efficiency, achieved by leveraging the flexibility of the kernel smoothing method. We establish the large-sample properties of the SS estimators, which demonstrate that the SS estimators consistently outperform the supervised estimator under mild assumptions. Numerical experiments provide empirical evidence to support our theoretical findings. Finally, we showcase the practical applicability of our proposed methodology by applying it to two real datasets.
KW - Adaptability
KW - Comparison problem
KW - ROC curve
KW - Semi-supervised learning
KW - Semiparametric efficiency
UR - https://www.scopus.com/pages/publications/86000248585
U2 - 10.1080/07350015.2025.2450495
DO - 10.1080/07350015.2025.2450495
M3 - Article
AN - SCOPUS:86000248585
SN - 0735-0015
VL - 43
SP - 979
EP - 991
JO - Journal of Business and Economic Statistics
JF - Journal of Business and Economic Statistics
IS - 4
ER -