TY - JOUR
T1 - Large Language Model-Based Natural Language Encoding Could Be All You Need for Drug Biomedical Association Prediction
AU - Zhang, Hanyu
AU - Zhou, Yuan
AU - Zhang, Zhichao
AU - Sun, Huaicheng
AU - Pan, Ziqi
AU - Mou, Minjie
AU - Zhang, Wei
AU - Ye, Qing
AU - Hou, Tingjun
AU - Li, Honglin
AU - Hsieh, Chang Yu
AU - Zhu, Feng
N1 - Publisher Copyright: © 2024 American Chemical Society.
PY - 2024
Y1 - 2024
AB - Analyzing drug-related interactions in the field of biomedicine has been a critical aspect of drug discovery and development. While various artificial intelligence (AI)-based tools have been proposed to analyze drug biomedical associations (DBAs), their feature encoding did not adequately account for crucial biomedical functions and semantic concepts, thereby still hindering their progress. Since the advent of ChatGPT by OpenAI in 2022, large language models (LLMs) have demonstrated rapid growth and significant success across various applications. Herein, LEDAP was introduced, which uniquely leveraged LLM-based biotext feature encoding for predicting drug-disease associations, drug-drug interactions, and drug-side effect associations. Benefiting from the large-scale knowledgebase pre-training, LLMs had great potential in drug development analysis owing to their holistic understanding of natural language and human topics. LEDAP illustrated its notable competitiveness in comparison with other popular DBA analysis tools. Specifically, even in simple conjunction with classical machine learning methods, LLM-based feature representations consistently enabled satisfactory performance across diverse DBA tasks like binary classification, multiclass classification, and regression. Our findings underpinned the considerable potential of LLMs in drug development research, indicating a catalyst for further progress in related fields.
UR - https://www.scopus.com/pages/publications/85198997774
U2 - 10.1021/acs.analchem.4c01793
DO - 10.1021/acs.analchem.4c01793
M3 - Article
AN - SCOPUS:85198997774
SN - 0003-2700
JO - Analytical Chemistry
JF - Analytical Chemistry
ER -