TY - JOUR
T1 - A semi-supervised deep learning method based on stacked sparse auto-encoder for cancer prediction using RNA-seq data
AU - Xiao, Yawen
AU - Wu, Jun
AU - Lin, Zongli
AU - Zhao, Xiaodong
N1 - Publisher Copyright: © 2018 Elsevier B.V.
PY - 2018/11
Y1 - 2018/11
N2 - Background and objective: Cancer has become a complex health problem due to its high mortality. Over the past few decades, with the rapid development of high-throughput sequencing technology and the application of various machine learning methods, remarkable progress in cancer research has been made based on gene expression data. At the same time, growing amounts of high-dimensional data, such as RNA-seq data, have been generated; this calls for superior machine learning methods that can handle large volumes of data effectively in order to support accurate treatment decisions. Methods: In this paper, we present a semi-supervised deep learning strategy, stacked sparse auto-encoder (SSAE) based classification, for cancer prediction using RNA-seq data. The proposed SSAE-based method employs greedy layer-wise pre-training and a sparsity penalty term to help capture and extract important information from the high-dimensional data, and then classifies the samples. Results: We tested the proposed SSAE model on three public RNA-seq data sets covering three types of cancer and compared its prediction performance with that of several commonly used classification methods. The results indicate that our approach outperforms the other methods on all three cancer data sets across various metrics. Conclusions: The proposed SSAE-based semi-supervised deep learning model shows a promising ability to process high-dimensional gene expression data and proves effective and accurate for cancer prediction.
KW - Cancer prediction
KW - Deep learning
KW - Gene expression data
KW - Semi-supervised learning
KW - Stacked sparse auto-encoder
UR - https://www.scopus.com/pages/publications/85054745104
U2 - 10.1016/j.cmpb.2018.10.004
DO - 10.1016/j.cmpb.2018.10.004
M3 - Article
C2 - 30415723
AN - SCOPUS:85054745104
SN - 0169-2607
VL - 166
SP - 99
EP - 105
JO - Computer Methods and Programs in Biomedicine
JF - Computer Methods and Programs in Biomedicine
ER -