
ACE-M3: Automatic Capability Evaluator for Multimodal Medical Models

  • East China Normal University
  • University of Potsdam
  • Tongji University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

As multimodal large language models (MLLMs) gain prominence in the medical field, the need for precise evaluation methods to assess their effectiveness has become critical. While benchmarks provide a reliable means to evaluate the capabilities of MLLMs, traditional metrics like ROUGE and BLEU employed for open domain evaluation only focus on token overlap and may not align with human judgment. Although human evaluation is more reliable, it is labor-intensive, costly, and not scalable. LLM-based evaluation methods have proven promising, but to date, there is still an urgent need for open-source multimodal LLM-based evaluators in the medical field. To address this issue, we introduce ACE-M3, an open-sourced Automatic Capability Evaluator for Multimodal Medical Models specifically designed to assess the question answering abilities of medical MLLMs. It first utilizes a branch-merge architecture to provide both detailed analysis and a concise final score based on standard medical evaluation criteria. Subsequently, a reward token-based direct preference optimization (RTDPO) strategy is incorporated to save training time without compromising performance of our model. Extensive experiments have demonstrated the effectiveness of our ACE-M3 model in evaluating the capabilities of medical MLLMs.

Original language: English
Title of host publication: Main Conference
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Publisher: Association for Computational Linguistics (ACL)
Pages: 4030-4054
Number of pages: 25
ISBN (electronic): 9798891761964
Publication status: Published - 2025
Event: 31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates
Duration: 19 Jan 2025 to 24 Jan 2025

Publication series

Name: Proceedings - International Conference on Computational Linguistics, COLING
ISSN (print): 2951-2093

Conference

Conference: 31st International Conference on Computational Linguistics, COLING 2025
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 19/01/25 to 24/01/25
