TY - JOUR
T1 - GCFA
T2 - Generative class feature fusion with agent attention for medical text classification
AU - Wang, Ye
AU - Wang, Qingyan
AU - Yu, Hong
AU - Xie, Jiang
AU - Hu, Feng
AU - Wang, Xiaoling
AU - Lei, Dajiang
N1 - Publisher Copyright:
© 2025 Elsevier B.V.
PY - 2026/2
Y1 - 2026/2
N2 - Recent advances in Generative Artificial Intelligence (GenAI), particularly large language models (LLMs), have introduced novel paradigms for long-tailed medical text classification. This task is challenging because tailed classes suffer from severe data scarcity while requiring a comprehensive understanding of domain-specific medical information. To this end, we propose a Generative Class feature Fusion with Agent attention (GCFA) model, which leverages LLM-driven data generation and information fusion to enhance feature representations and mitigate data imbalance. Specifically, a generative head-tailed fusion strategy is proposed, which generates tailed samples by strategically fusing semantically diverse features from both head and tailed distributions. This ensures that generated samples retain tail-class identity while enriching their semantic diversity. Then, we design a prompt-based medical terminology learning method, in which LLMs mine critical medical terms, especially low-frequency ones, from three public datasets to construct a medical vocabulary dictionary. This dictionary guides our Medical Agent Attention Mechanism, enabling targeted emphasis on important medical terms. Extensive experiments demonstrate that GCFA achieves state-of-the-art performance across all evaluated datasets. Our code is available at: https://github.com/WQYwqy123456/GCFA-123#.
AB - Recent advances in Generative Artificial Intelligence (GenAI), particularly large language models (LLMs), have introduced novel paradigms for long-tailed medical text classification. This task is challenging because tailed classes suffer from severe data scarcity while requiring a comprehensive understanding of domain-specific medical information. To this end, we propose a Generative Class feature Fusion with Agent attention (GCFA) model, which leverages LLM-driven data generation and information fusion to enhance feature representations and mitigate data imbalance. Specifically, a generative head-tailed fusion strategy is proposed, which generates tailed samples by strategically fusing semantically diverse features from both head and tailed distributions. This ensures that generated samples retain tail-class identity while enriching their semantic diversity. Then, we design a prompt-based medical terminology learning method, in which LLMs mine critical medical terms, especially low-frequency ones, from three public datasets to construct a medical vocabulary dictionary. This dictionary guides our Medical Agent Attention Mechanism, enabling targeted emphasis on important medical terms. Extensive experiments demonstrate that GCFA achieves state-of-the-art performance across all evaluated datasets. Our code is available at: https://github.com/WQYwqy123456/GCFA-123#.
KW - Generative artificial intelligence
KW - Medical agent attention
KW - Medical text classification
UR - https://www.scopus.com/pages/publications/105014730513
U2 - 10.1016/j.inffus.2025.103639
DO - 10.1016/j.inffus.2025.103639
M3 - Article
AN - SCOPUS:105014730513
SN - 1566-2535
VL - 126
JO - Information Fusion
JF - Information Fusion
M1 - 103639
ER -