GPT2-ICC: A data-driven approach for accurate ion channel identification using pre-trained large language models

  • Zihan Zhou
  • Yang Yu
  • Chengji Yang
  • Leyan Cao
  • Shaoying Zhang
  • Junnan Li
  • Yingnan Zhang
  • Huayun Han
  • Guoliang Shi
  • Qiansen Zhang
  • Juwen Shen*
  • Huaiyu Yang*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here, we developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels from a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC's generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.

Original language: English
Article number: 101302
Journal: Journal of Pharmaceutical Analysis
Volume: 15
Issue number: 8
State: Published - Aug 2025

Keywords

  • Artificial intelligence
  • GPT2
  • Ion channel
  • Protein language model
  • Representation learning
