
MoCA-Dialog: A Benchmark for Fine-Grained Evaluation of Large Language Models in Clinical Cognitive Assessment Dialogues

  • Yunjia Zhang
  • Junyi Zhu
  • Rui Wang
  • Tianai Zhuang
  • Jinqiu Sang
  • Ha Kyung Kim*
  • *Corresponding author for this work
  • Shanghai University of International Business and Economics
  • East China University of Science and Technology
  • East China Normal University
  • Beijing Language and Culture University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Current benchmarks for evaluating large language models (LLMs) in medicine primarily focus on static question answering, neglecting the dynamic, interactive nature of many clinical workflows. This is particularly evident in cognitive assessments, where nuanced dialogue and multimodal interpretation are critical, yet no specialized benchmark exists to evaluate these capabilities. To address this gap, we introduce MoCA-Dialog, the first large-scale, multimodal benchmark featuring 5,000 high-fidelity simulated dialogues for the Montreal Cognitive Assessment (MoCA). The benchmark includes tasks of increasing complexity: accurate scoring, cognitive profile generation, and clinical error attribution, allowing for a fine-grained evaluation across seven cognitive domains. Our comprehensive evaluation of state-of-the-art multimodal LLMs reveals significant performance disparities: while models excel at simple recall tasks, they consistently fail in domains requiring executive function and abstract reasoning. A novel, clinically driven error analysis further indicates that these failures stem not from knowledge deficits, but from fundamental difficulties in interpreting nuanced cues and applying domain-specific reasoning. MoCA-Dialog provides a crucial tool for assessing the clinical readiness of LLMs and highlights that future progress depends on enhancing their core reasoning and interpretive abilities, not just expanding their knowledge base. We release our demo and prompt examples at https://mocadialogue.github.io.

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025
Editors: Juan Liu, Jingshan Huang, Xiaowo Wang, Fa Zhang, Xiufen Zou, Tian Tian, Xiaohua Hu, Bin Hu, Yi Xiong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6654-6659
Number of pages: 6
ISBN (Electronic): 9798331515577
DOI
Publication status: Published - 2025
Externally published
Event: 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025 - Wuhan, China
Duration: 15 Dec 2025 to 18 Dec 2025

Publication series

Name: Proceedings - 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025

Conference

Conference: 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025
Country/Territory: China
City: Wuhan
Period: 15/12/25 to 18/12/25
