
AnnoPRO: a strategy for protein function annotation based on multi-scale protein representation and a hybrid deep learning of dual-path encoding

  • Lingyan Zheng
  • Shuiyang Shi
  • Mingkun Lu
  • Pan Fang
  • Ziqi Pan
  • Hongning Zhang
  • Zhimeng Zhou
  • Hanyu Zhang
  • Minjie Mou
  • Shijie Huang
  • Lin Tao
  • Weiqi Xia
  • Honglin Li
  • Zhenyu Zeng
  • Shun Zhang
  • Yuzong Chen
  • Zhaorong Li*
  • Feng Zhu*
  • *Corresponding author for this work
  • The Second Affiliated Hospital of Zhejiang University School of Medicine
  • Alibaba Group Holding Ltd.
  • Zhejiang University
  • Hangzhou Normal University
  • Zhejiang Provincial People's Hospital
  • East China University of Science and Technology
  • Tsinghua University

Research output: Contribution to journal › Article › peer-review

Abstract

Protein function annotation has been one of the longstanding issues in biological sciences, and various computational methods have been developed. However, the existing methods suffer from a serious long-tail problem: a large number of Gene Ontology (GO) families contain only a few annotated proteins. Herein, an innovative strategy named AnnoPRO was constructed, which enables sequence-based multi-scale protein representation, dual-path protein encoding using pre-training, and function annotation by long short-term memory (LSTM)-based decoding. A variety of case studies based on different benchmarks were conducted, and they confirmed the superior performance of AnnoPRO relative to available methods. Source code and models are freely available at https://github.com/idrblab/AnnoPRO and https://zenodo.org/records/10012272
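To make the long-tail problem concrete, the following minimal sketch counts how many proteins carry each GO term in a small annotation table and reports the share of terms annotated to fewer than a chosen number of proteins. The helper names `go_term_counts` and `long_tail_fraction`, the protein IDs, the term assignments, and the threshold are all hypothetical illustrations. The GO identifiers serve only as labels, and none of this code is part of AnnoPRO.

```python
from collections import Counter


def go_term_counts(annotations):
    """Count how many proteins are annotated with each GO term.

    `annotations` maps a (hypothetical) protein ID to the GO terms assigned to it.
    """
    counts = Counter()
    for terms in annotations.values():
        # set() ensures each protein contributes at most once per term
        counts.update(set(terms))
    return counts


def long_tail_fraction(counts, threshold):
    """Return the fraction of GO terms annotated to fewer than `threshold` proteins."""
    if not counts:
        return 0.0
    rare = sum(1 for n in counts.values() if n < threshold)
    return rare / len(counts)


# Toy annotation table: invented assignments, for illustration only.
toy = {
    "P1": {"GO:0005524", "GO:0016301"},
    "P2": {"GO:0005524"},
    "P3": {"GO:0005524", "GO:0003677"},
    "P4": {"GO:0005524", "GO:0016301"},
    "P5": {"GO:0046872"},
}

counts = go_term_counts(toy)
print(long_tail_fraction(counts, threshold=2))  # prints 0.5
```

In this toy table, half of the terms are each supported by a single protein. On real annotation sets, this kind of count is one simple way to inspect how heavily the label distribution is skewed toward rare GO families.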

Original language: English
Article number: 41
Journal: Genome Biology
Volume: 25
Issue number: 1
DOIs
State: Published - Dec 2024
Externally published: Yes

Keywords

  • LSTM
  • Long-tail problem
  • Pre-training
  • Protein function annotation
  • Protein representation
