Chinese Elderly Healthcare-Oriented Conversation: CareQA Dataset and Its Knowledge Distillation Based Generation Framework

  • He Xiao
  • Xingjiao Wu
  • Jialiang Tong
  • Bangyan Li
  • Yuling Sun*
  *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Global population aging is creating substantial demand for healthcare knowledge among the elderly. Conversational agents (CAs) based on large language models (LLMs) hold significant promise for answering the elderly's healthcare questions. Yet general-purpose LLMs often fall short of providing professional, practically usable healthcare conversations because of missing domain-specific knowledge, possible hallucinations, and biases in contextual comprehension. To address these challenges, we first propose a cost-effective, domain-specific question-answering (QA) generation framework based on knowledge distillation (KD). Using this framework, we then built CareQA, the first Chinese healthcare QA dataset designed specifically for the elderly, with 41,694 QA pairs covering multiple categories of geriatric diseases. A comprehensive benchmarking experiment, including both automated and human evaluation, was conducted to examine the usability of CareQA. The results demonstrate that LLMs fine-tuned on CareQA perform better in answering elderly healthcare-related questions.

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
Editors: Mario Cannataro, Huiru Zheng, Lin Gao, Jianlin Cheng, Joao Luis de Miranda, Ester Zumpano, Xiaohua Hu, Young-Rae Cho, Taesung Park
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3866-3871
Number of pages: 6
ISBN (Electronic): 9798350386226
State: Published - 2024
Externally published: Yes
Event: 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024 - Lisbon, Portugal
Duration: 3 Dec 2024 to 6 Dec 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024

Conference

Conference: 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
Country/Territory: Portugal
City: Lisbon
Period: 3/12/24 to 6/12/24

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. Good health and well being

Keywords

  • Elderly Healthcare
  • Knowledge Distillation
  • Large Language Model
  • QA Pairs Generation
