Abstract
Current experimental and computational methods have limitations in accurately and efficiently classifying ion channels within vast protein spaces. Here, we developed a deep learning algorithm, GPT2 Ion Channel Classifier (GPT2-ICC), which effectively distinguishes ion channels in a test set containing approximately 239 times more non-ion-channel proteins. GPT2-ICC integrates representation learning with a large language model (LLM)-based classifier, enabling highly accurate identification of potential ion channels. Several potential ion channels were predicted from the unannotated human proteome, further demonstrating GPT2-ICC's generalization ability. This study marks a significant advancement in artificial-intelligence-driven ion channel research, highlighting the adaptability and effectiveness of combining representation learning with LLMs to address the challenges of imbalanced protein sequence data. Moreover, it provides a valuable computational tool for uncovering previously uncharacterized ion channels.
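To illustrate why the imbalance noted above matters, the following minimal Python sketch shows that plain accuracy is uninformative when non-ion-channel proteins outnumber ion channels by roughly 239 to 1. It is not the evaluation code used for GPT2-ICC. The reading of the ratio as 239 negatives per positive, the function names, and the confusion-matrix counts are all illustrative assumptions and are not results from the study.

```python
from typing import Tuple


def trivial_baseline_accuracy(neg_per_pos: float) -> float:
    """Accuracy of a classifier that labels every protein as a non-ion-channel."""
    return neg_per_pos / (neg_per_pos + 1.0)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the positive (ion channel) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Assumption: about 239 non-ion-channel proteins per ion channel.
RATIO = 239.0
baseline = trivial_baseline_accuracy(RATIO)
print(f"all-negative baseline accuracy: {baseline:.4f}")

# Hypothetical counts (NOT from the study): 100 ion channels and 23,900
# non-ion-channels, with a classifier that recovers 90 ion channels and
# raises 239 false positives (a 1% false-positive rate).
precision, recall, f1 = precision_recall_f1(tp=90, fp=239, fn=10)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under these assumptions, a model that never predicts an ion channel already exceeds 99% accuracy, while even a 1% false-positive rate pushes precision well below recall. Class-specific metrics such as precision, recall, and F1 are therefore the more informative way to judge a classifier in this setting.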
| Original language | English |
|---|---|
| Article number | 101302 |
| Journal | Journal of Pharmaceutical Analysis |
| Volume | 15 |
| Issue number | 8 |
| DOIs | |
| State | Published - Aug 2025 |
Keywords
- Artificial intelligence
- GPT2
- Ion channel
- Protein language model
- Representation learning