TY - JOUR
T1 - Correlation-based joint feature screening for semi-competing risks outcomes with application to breast cancer data
AU - Peng, Mengjiao
AU - Xiang, Liming
N1 - Publisher Copyright: © The Author(s) 2021.
PY - 2021/11
Y1 - 2021/11
N2 - Ultrahigh-dimensional gene features are often collected in modern cancer studies in which the number of gene features is far larger than the sample size. While gene expression patterns have been shown to be related to patients’ survival in microarray-based gene expression studies, one has to deal with the challenges that ultrahigh-dimensional genetic predictors pose for survival prediction and for genetic understanding of the disease in precision medicine. The problem becomes more complicated when two types of survival endpoints, distant metastasis-free survival and overall survival, are of interest in the study and the outcome data can be subject to semi-competing risks, because distant metastasis-free survival may be censored by overall survival but not vice versa. Our focus in this paper is to extract important features that jointly have a strong impact on both distant metastasis-free survival and overall survival from massive gene expression data in the semi-competing risks setting. We propose a model-free screening method based on ranking the correlation between gene features and the joint survival function of the two endpoints. The method accounts for the relationship between the two endpoints through a simply defined utility measure that is easy to understand and calculate. We establish its favorable theoretical properties, such as the sure screening property and ranking consistency, and evaluate its finite-sample performance through extensive simulation studies. Finally, an application to classifying breast cancer data demonstrates the utility of the proposed method in practice.
AB - Ultrahigh-dimensional gene features are often collected in modern cancer studies in which the number of gene features is far larger than the sample size. While gene expression patterns have been shown to be related to patients’ survival in microarray-based gene expression studies, one has to deal with the challenges that ultrahigh-dimensional genetic predictors pose for survival prediction and for genetic understanding of the disease in precision medicine. The problem becomes more complicated when two types of survival endpoints, distant metastasis-free survival and overall survival, are of interest in the study and the outcome data can be subject to semi-competing risks, because distant metastasis-free survival may be censored by overall survival but not vice versa. Our focus in this paper is to extract important features that jointly have a strong impact on both distant metastasis-free survival and overall survival from massive gene expression data in the semi-competing risks setting. We propose a model-free screening method based on ranking the correlation between gene features and the joint survival function of the two endpoints. The method accounts for the relationship between the two endpoints through a simply defined utility measure that is easy to understand and calculate. We establish its favorable theoretical properties, such as the sure screening property and ranking consistency, and evaluate its finite-sample performance through extensive simulation studies. Finally, an application to classifying breast cancer data demonstrates the utility of the proposed method in practice.
KW - Gene expression data
KW - joint survival function
KW - nonparametric estimation
KW - semi-competing risks
KW - ultrahigh-dimensionality
UR - https://www.scopus.com/pages/publications/85114854933
U2 - 10.1177/09622802211037071
DO - 10.1177/09622802211037071
M3 - Article
C2 - 34519231
AN - SCOPUS:85114854933
SN - 0962-2802
VL - 30
SP - 2428
EP - 2446
JO - Statistical Methods in Medical Research
JF - Statistical Methods in Medical Research
IS - 11
ER -