Protein remote homology detection based on binary profiles

  • Qiwen Dong*
  • Lei Lin
  • Xiaolong Wang
  • *Corresponding author for this work
  • Harbin Inst. of Technol.

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Remote homology detection is a key element of protein structure and function analysis in computational and experimental biology. This paper presents a simple representation of protein sequences that uses the evolutionary information in profiles for efficient remote homology detection. Frequency profiles are calculated directly from the multiple sequence alignments output by PSI-BLAST and converted into binary profiles using a probability threshold. These binary profiles serve as a new building block for protein sequences. Each protein sequence is mapped to a high-dimensional vector whose entries are the occurrence counts of each binary profile. The resulting vectors are used to train support vector machine classifiers, which then classify the test protein sequences. The method is further improved by applying an efficient feature extraction algorithm from natural language processing, namely the latent semantic analysis model. Testing on the SCOP 1.53 database shows that the method based on binary profiles outperforms those based on many other basic building blocks, including N-grams, patterns and motifs. The ROC50 score is 0.698, which is higher than other methods by nearly 10 percent.
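
The representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The amino-acid ordering, the strict `>` comparison, the threshold value of 0.15, the toy frequencies, and the helper names `column`, `binarize` and `occurrence_vector` are all assumptions made for illustration. In the paper the frequencies come from PSI-BLAST alignments, and the resulting vectors are passed on to latent semantic analysis and a support vector machine, which this sketch leaves out.

```python
from collections import Counter

# Assumed ordering of the 20 standard amino acids (illustrative only).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column(**freqs):
    """Hypothetical helper: build one profile position from named frequencies."""
    return [freqs.get(aa, 0.0) for aa in AMINO_ACIDS]


def binarize(frequency_profile, threshold):
    """Convert each position's 20 frequencies into a 20-character 0/1 string.

    A residue is marked 1 when its frequency exceeds the threshold
    (the strict comparison is an assumption).
    """
    return [
        "".join("1" if freq > threshold else "0" for freq in position)
        for position in frequency_profile
    ]


def occurrence_vector(binary_profiles, vocabulary):
    """Count how often each binary profile in the vocabulary occurs."""
    counts = Counter(binary_profiles)
    return [counts[word] for word in vocabulary]


# Toy frequency profile for a three-residue sequence (made-up numbers).
profile = [
    column(A=0.6, G=0.3, S=0.1),
    column(L=0.5, I=0.4, V=0.1),
    column(A=0.7, G=0.2, S=0.1),
]

words = binarize(profile, threshold=0.15)
vocabulary = sorted(set(words))  # in practice built from the whole training set
vector = occurrence_vector(words, vocabulary)
print(vector)
```

Positions 1 and 3 have different frequencies but map to the same binary profile, so they are counted as the same building block. That grouping is what lets a sequence be described by occurrence counts over a finite vocabulary.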

Original language: English
Title of host publication: Bioinformatics Research and Development - First International Conference, BIRD 2007 Proceedings
Publisher: Springer Verlag
Pages: 212-223
Number of pages: 12
ISBN (Print): 3540712321, 9783540712329
State: Published - 2007
Externally published: Yes
Event: 1st International Conference on Bioinformatics Research and Development, BIRD 2007 - Berlin, Germany
Duration: 12 Mar 2007 to 14 Mar 2007

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4414 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 1st International Conference on Bioinformatics Research and Development, BIRD 2007
Country/Territory: Germany
City: Berlin
Period: 12/03/07 to 14/03/07

Keywords

  • Binary profile
  • Latent semantic analysis
  • Remote homology detection
