A seqlet-based maximum entropy Markov approach for protein secondary structure prediction

Qiwen Dong, Xiaolong Wang, Lei Lin, Yi Guan

Research output: Contribution to journal, Article, peer-review

8 Scopus citations

Abstract

A novel method for predicting the secondary structures of proteins from amino acid sequence is presented. Protein secondary structure seqlets, which are analogous to words in natural language, are extracted. These seqlets capture the relationship between amino acid sequence and protein secondary structure, and together they form a protein secondary structure dictionary. More specifically, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word-lattice represents the results of word segmentation, and a maximum entropy model calculates the probability that a seqlet is tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentation and its tags are taken as the result of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins from four organisms separately and is compared with the PHD method. The results show that it outperforms PHD by about 3.9% in Q3 accuracy and 4.6% in SOV accuracy. Combining the method with locally similar protein sequences obtained by BLAST gives better predictions. The method is also tested on the 50 CASP5 target proteins, achieving a Q3 accuracy of 78.9% and an SOV accuracy of 77.1%. A web server for protein secondary structure prediction has been constructed and is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html. Copyright by Science in China Press 2005.

Original language: English
Pages (from-to): 394-405
Number of pages: 12
Journal: Science in China, Series C: Life Sciences
Volume: 48
Issue number: 4
DOIs
State: Published - Aug 2005
Externally published: Yes

Keywords

  • Maximum entropy Markov model
  • Protein secondary structure prediction
  • Protein secondary structure seqlets
  • Word-lattice
