
MoCA-Dialog: A Benchmark for Fine-Grained Evaluation of Large Language Models in Clinical Cognitive Assessment Dialogues

  • Yunjia Zhang
  • Junyi Zhu
  • Rui Wang
  • Tianai Zhuang
  • Jinqiu Sang
  • Ha Kyung Kim*
  • *Corresponding author for this work
  • Shanghai University of International Business and Economics
  • East China University of Science and Technology
  • East China Normal University
  • Beijing Language and Culture University

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Current benchmarks for evaluating large language models (LLMs) in medicine primarily focus on static question answering, neglecting the dynamic, interactive nature of many clinical workflows. This is particularly evident in cognitive assessments, where nuanced dialogue and multimodal interpretation are critical, yet no specialized benchmark exists to evaluate these capabilities. To address this gap, we introduce MoCA-Dialog, the first large-scale, multimodal benchmark featuring 5000 high-fidelity simulated dialogues for the Montreal Cognitive Assessment (MoCA). The benchmark includes tasks of increasing complexity: accurate scoring, cognitive profile generation, and clinical error attribution, allowing for a fine-grained evaluation across seven cognitive domains. Our comprehensive evaluation of state-of-the-art multimodal LLMs reveals significant performance disparities: while models excel at simple recall tasks, they consistently fail in domains requiring executive function and abstract reasoning. A novel, clinically driven error analysis further indicates that these failures stem not from knowledge deficits but from fundamental difficulties in interpreting nuanced cues and applying domain-specific reasoning. MoCA-Dialog provides a crucial tool for assessing the clinical readiness of LLMs and highlights that future progress depends on enhancing their core reasoning and interpretive abilities, not just expanding their knowledge base. We release our demo and prompt examples at https://mocadialogue.github.io.

Original language: English
Title of host publication: Proceedings - 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025
Editors: Juan Liu, Jingshan Huang, Xiaowo Wang, Fa Zhang, Xiufen Zou, Tian Tian, Xiaohua Hu, Bin Hu, Yi Xiong
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6654-6659
Number of pages: 6
ISBN (Electronic): 9798331515577
State: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025 - Wuhan, China
Duration: 15 Dec 2025 - 18 Dec 2025

Publication series

Name: Proceedings - 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025

Conference

Conference: 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025
Country/Territory: China
City: Wuhan
Period: 15/12/25 - 18/12/25

Keywords

  • cognitive assessment
  • large language model
  • medical benchmark
  • MoCA
