Amino Acid Encoding Methods for Protein Sequences: A Comprehensive Review and Assessment

Xiaoyang Jing, Qiwen Dong, Daocheng Hong, Ruqian Lu

Research output: Contribution to journal › Article › peer-review

75 Scopus citations

Abstract

As the first step of machine-learning-based protein structure and function prediction, amino acid encoding plays a fundamental role in the final success of those methods. Unlike protein sequence encoding, amino acid encoding can be used in both residue-level and sequence-level prediction of protein properties when combined with different algorithms. However, it has not attracted enough attention in the past decades, and there have been no comprehensive reviews and assessments of encoding methods so far. In this article, we present a systematic classification and a comprehensive review and assessment of various amino acid encoding methods. The methods are grouped into five categories according to their information sources and information extraction methodologies: binary encoding, physicochemical properties encoding, evolution-based encoding, structure-based encoding, and machine-learning encoding. Then, 16 representative methods from the five categories are selected and compared on protein secondary structure prediction and protein fold recognition tasks using large-scale benchmark datasets. The results show that the evolution-based, position-dependent encoding method PSSM achieved the best performance, and that the structure-based and machine-learning encoding methods also show some potential for further application. In particular, the neural-network-based distributed representation of amino acids may shed new light on this area. We hope that this review and assessment will be useful for future studies in amino acid encoding.
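To make the first of the five categories concrete, the following is a minimal sketch of binary (one-hot) encoding, in which each residue becomes a 20-dimensional vector with a single 1. It is an illustration only and not the authors' implementation: the residue ordering in `AMINO_ACIDS`, the function name `one_hot_encode`, and the all-zero handling of unknown symbols are assumptions made for this example.

```python
# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional binary vector per residue (hypothetical helper)."""
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        # Assumption: symbols outside the 20 standard residues (e.g. 'X')
        # are left as all-zero vectors; other conventions are possible.
        if residue in INDEX:
            vector[INDEX[residue]] = 1
        encoded.append(vector)
    return encoded


if __name__ == "__main__":
    for residue, vector in zip("ACDX", one_hot_encode("ACDX")):
        print(residue, vector)
```

The output has one vector per residue, so it could feed a residue-level predictor directly; a sequence-level task would need an additional pooling or sequence-model step, which is outside this sketch.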

Original language: English
Article number: 8692651
Pages (from-to): 1918-1931
Number of pages: 14
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 17
Issue number: 6
State: Published - 1 Nov 2020

Keywords

  • Amino acid encoding
  • feature extraction
  • protein fold recognition
  • protein secondary structure prediction
  • protein structure and function prediction
  • residue encoding
