TY - JOUR
T1 - PFmulDL: a novel strategy enabling multi-class and multi-label protein function annotation by integrating diverse deep learning methods
AU - Xia, Weiqi
AU - Zheng, Lingyan
AU - Fang, Jiebin
AU - Li, Fengcheng
AU - Zhou, Ying
AU - Zeng, Zhenyu
AU - Zhang, Bing
AU - Li, Zhaorong
AU - Li, Honglin
AU - Zhu, Feng
N1 - Publisher Copyright: © 2022
PY - 2022/6
Y1 - 2022/6
AB - Bioinformatic annotation of protein function is essential but highly complex, and developing effective prediction methods requires extensive effort. However, existing methods tend to over-represent families containing many proteins by misclassifying proteins from families containing few. In other words, the ability of existing methods to annotate proteins in the ‘rare classes’ remains limited. Herein, a new protein function annotation strategy, PFmulDL, was constructed by integrating multiple deep learning methods. First, a recurrent neural network was integrated, for the first time, with a convolutional neural network to facilitate function annotation. Second, a transfer learning method was introduced into model construction to further improve prediction performance. Third, based on the latest Gene Ontology data, the newly constructed model could annotate the largest number of protein families compared with existing methods. Finally, the model was found capable of significantly improving prediction performance for the ‘rare classes’ without sacrificing that for the ‘major classes’. Given the emerging need to improve prediction performance for proteins in ‘rare classes’, this new strategy is expected to become an essential complement to existing methods for protein function prediction. All models and source code are freely available to all users at: https://github.com/idrblab/PFmulDL.
KW - Convolutional neural network
KW - Deep learning
KW - Gene ontology
KW - Protein function prediction
KW - Recurrent neural network
UR - https://www.scopus.com/pages/publications/85127327692
U2 - 10.1016/j.compbiomed.2022.105465
DO - 10.1016/j.compbiomed.2022.105465
M3 - Article
C2 - 35366467
AN - SCOPUS:85127327692
SN - 0010-4825
VL - 145
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 105465
ER -