Abstract
With the advent of models such as ChatGPT, large language models (LLMs) have demonstrated unprecedented capabilities in understanding and generating natural language, presenting novel opportunities and challenges within the medical domain. While many studies have focused on applying LLMs in medicine, comprehensive reviews of the datasets used in this field remain scarce. This survey seeks to address this gap by providing a comprehensive overview of the medical datasets fueling LLMs, highlighting their unique characteristics and the critical roles they play at different stages of LLM development: pre-training, fine-tuning, and evaluation. Ultimately, this survey aims to underline the significance of datasets in realizing the full potential of LLMs to innovate and improve healthcare outcomes.
| Original language | English |
|---|---|
| Pages (from-to) | 457-478 |
| Number of pages | 22 |
| Journal | Intelligence and Robotics |
| Volume | 4 |
| Issue number | 4 |
| DOIs | |
| State | Published - Dec 2024 |
Keywords
- Large language models (LLMs)
- NLP
- Q&A system in medicine
- dataset in medicine
Fingerprint
Dive into the research topics of 'A survey of datasets in medicine for large language models'. Together they form a unique fingerprint.