TY - GEN
T1 - Improving prediction of the contact numbers of residues in proteins from primary sequences
AU - Dong, Qiwen
AU - Zhou, Shuigeng
AU - Guan, Jihong
PY - 2009
Y1 - 2009
N2 - Contact number is one kind of one-dimensional feature of proteins. Knowing the number of residue contacts in a protein is crucial for deriving constraints useful in protein structure prediction. In this study, we evaluate and compare several methods and different features for contact number prediction. The experiments are performed on a non-redundant dataset containing 1109 proteins. Contact number prediction is formulated as a multi-class classification problem. Three-fold cross-validation is used to measure the performance of the various methods with different combinations of features as input. The experimental results show that the profile feature, which contains evolutionary information about proteins, achieves better performance than simple amino acid sequences. Further improvement is achieved by including the predicted secondary structure and relative solvent accessibility as additional features. In all experiments, each tested method improves performance by more than 10 percent in comparison with the baseline method. The best Q score for two-class classification is 79.7%, which is higher than the best results reported in the literature by 2 percent. The results obtained here can provide valuable information for protein structure reconstruction, model quality assessment, etc.
AB - Contact number is one kind of one-dimensional feature of proteins. Knowing the number of residue contacts in a protein is crucial for deriving constraints useful in protein structure prediction. In this study, we evaluate and compare several methods and different features for contact number prediction. The experiments are performed on a non-redundant dataset containing 1109 proteins. Contact number prediction is formulated as a multi-class classification problem. Three-fold cross-validation is used to measure the performance of the various methods with different combinations of features as input. The experimental results show that the profile feature, which contains evolutionary information about proteins, achieves better performance than simple amino acid sequences. Further improvement is achieved by including the predicted secondary structure and relative solvent accessibility as additional features. In all experiments, each tested method improves performance by more than 10 percent in comparison with the baseline method. The best Q score for two-class classification is 79.7%, which is higher than the best results reported in the literature by 2 percent. The results obtained here can provide valuable information for protein structure reconstruction, model quality assessment, etc.
KW - Conditional random field
KW - Contact number prediction
KW - Maximum entropy model
KW - Support vector machine
UR - https://www.scopus.com/pages/publications/70450180542
U2 - 10.1109/IJCBS.2009.39
DO - 10.1109/IJCBS.2009.39
M3 - Conference contribution
AN - SCOPUS:70450180542
SN - 9780769537399
T3 - Proceedings - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009
SP - 251
EP - 254
BT - Proceedings - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009
T2 - 2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, IJCBS 2009
Y2 - 3 August 2009 through 5 August 2009
ER -