A pattern-based SVM for protein remote homology detection

  • Qi Wen Dong*
  • Lei Lin
  • Xiao Long Wang
  • Ming Hui Li

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

16 Scopus citations

Abstract

One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence homology with one or more proteins whose structure or function is already known. This paper presents a novel method for protein remote homology detection that applies text categorization techniques from natural language processing to protein classification. Patterns are discovered by the TEIRESIAS algorithm and can be viewed as the "words" of a "protein sequence language". The patterns are then filtered by an efficient feature selection algorithm, the chi-square algorithm. Each protein sequence is mapped into a high-dimensional vector of the occurrence counts of the selected patterns. This representation, combined with a discriminative classification algorithm known as the Support Vector Machine (SVM), provides a powerful means for protein remote homology detection. The method, called SVM-pattern, is tested on the SCOP database and compared with other state-of-the-art methods. SVM-pattern performs better than the BLAST method and comparably to other SVM-based methods such as SVM-k-spectrum and SVM-pairwise.
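
The feature pipeline described in the abstract (patterns as "words", chi-square filtering, occurrence-count vectors) can be illustrated with a minimal sketch. It is not the authors' implementation: TEIRESIAS is not reimplemented, so the candidate patterns are assumed to be given, with `.` as a single-residue wildcard. The chi-square score uses the standard 2x2 presence/absence form from text categorization, which is an assumption, since the abstract does not state the exact formula. The function names are hypothetical.

```python
import re
from typing import List, Sequence


def count_occurrences(pattern: str, sequence: str) -> int:
    """Count matches of a pattern in a sequence, allowing overlaps.

    '.' is treated as a wildcard for any single residue, which
    coincides with regular-expression syntax.
    """
    # A zero-width lookahead yields one match per start position.
    return len(re.findall(f"(?={pattern})", sequence))


def chi_square(pattern: str, sequences: Sequence[str],
               labels: Sequence[bool]) -> float:
    """Chi-square score of a pattern against a binary family label.

    Assumed form: the 2x2 presence/absence statistic commonly used
    for feature selection in text categorization.
    """
    a = b = c = d = 0
    for seq, is_member in zip(sequences, labels):
        present = count_occurrences(pattern, seq) > 0
        if present and is_member:
            a += 1  # family member containing the pattern
        elif present:
            b += 1  # non-member containing the pattern
        elif is_member:
            c += 1  # family member without the pattern
        else:
            d += 1  # non-member without the pattern
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_patterns(candidates: Sequence[str], sequences: Sequence[str],
                    labels: Sequence[bool], k: int) -> List[str]:
    """Keep the k candidate patterns with the highest chi-square score."""
    ranked = sorted(candidates,
                    key=lambda p: chi_square(p, sequences, labels),
                    reverse=True)
    return ranked[:k]


def to_vector(sequence: str, patterns: Sequence[str]) -> List[int]:
    """Map a sequence to the occurrence counts of the selected patterns."""
    return [count_occurrences(p, sequence) for p in patterns]


# Toy data, invented for illustration only.
toy_sequences = ["ACDAGD", "AKDWAMD", "GGHKL", "LKHGG"]
toy_labels = [True, True, False, False]
toy_candidates = ["A.D", "G", "K"]

selected = select_patterns(toy_candidates, toy_sequences, toy_labels, k=2)
vectors = [to_vector(s, selected) for s in toy_sequences]
```

The resulting `vectors` and `toy_labels` would then be passed to any SVM implementation for training; that step is omitted here because the abstract does not specify the kernel or its parameters.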

Original language: English
Title of host publication: 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005
Pages: 3363-3368
Number of pages: 6
State: Published - 2005
Externally published: Yes
Event: International Conference on Machine Learning and Cybernetics, ICMLC 2005 - Guangzhou, China
Duration: 18 Aug 2005 to 21 Aug 2005

Publication series

Name: 2005 International Conference on Machine Learning and Cybernetics, ICMLC 2005

Conference

Conference: International Conference on Machine Learning and Cybernetics, ICMLC 2005
Country/Territory: China
City: Guangzhou
Period: 18/08/05 to 21/08/05

Keywords

  • Pattern
  • Protein
  • Remote homology
  • Text categorization
