HiFun: Homology independent protein function prediction by a novel protein-language self-Attention model

  • Jun Wu
  • Haipeng Qing
  • Jian Ouyang
  • Jiajia Zhou
  • Zihao Gao
  • Christopher E. Mason
  • Zhichao Liu
  • Tieliu Shi*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Protein function prediction based on amino acid sequence alone is an extremely challenging but important task, especially in the metagenomics/metatranscriptomics field, in which novel proteins from new microorganisms are being uncovered at an exponential rate. Many of these proteins have extremely low homology to known proteins and cannot be annotated with homology-based or information-integrative methods. To overcome this problem, we proposed a Homology Independent protein Function annotation method (HiFun) based on a unified deep-learning model that reassembles the sequence as a protein language. The robustness of HiFun was evaluated using the benchmark datasets and metrics of the CAFA3 challenge. To demonstrate the utility of HiFun, we annotated 2 212 663 unknown proteins and discovered novel motifs in the UHGP-50 catalog. We showed that HiFun can extract latent function-related structural features, which enables it to annotate the function of non-homologous proteins. HiFun can substantially improve the annotation of new proteins and expand our understanding of how microorganisms adapt to various ecological niches. Moreover, we provided a free and accessible web service at http://www.unimd.org/HiFun that requires only protein sequences as input, offering researchers an efficient and practical platform for predicting protein functions.

Original language: English
Article number: bbad311
Journal: Briefings in Bioinformatics
Volume: 24
Issue number: 5
State: Published - 1 Sep 2023

Keywords

  • deep-learning
  • homology-independent
  • metagenome
  • protein function prediction
  • protein structure
  • self-Attention
