
HalluScope: A Comprehensive Dataset for Evaluating Hallucination in Large Language Models Across Multiple Domains

  • Chen Zhao
  • Biao Jie Zeng
  • Kedi Chen
  • Xin Lin*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

This study presents HalluScope, a benchmark developed specifically to evaluate hallucinations in large language models (LLMs). HalluScope comprises 800 adversarially designed questions spanning multiple domains, systematically categorized into five hallucination types: selective, temporal, imitative, factual, and overconfidence. The dataset was constructed through automated question generation with mutual supervision between models, an approach that supports both question generation and evaluation. The evaluation adopts a multiple-choice format in which the options may contain several correct choices and models must select the correct answers, providing a more nuanced assessment of model confidence and judgment under uncertainty. Extensive experiments on 12 large language models, including ERNIE-Bot, ChatGLM, Qwen, and XVerse, show that nine of them have hallucination-free rates below 50%, underscoring the benchmark's difficulty. Furthermore, HalluScope offers insights into hallucination-prone domains and hallucination types, providing guidance for fine-tuning models to mitigate hallucinations effectively.
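The abstract does not state exactly how a response to a multi-answer question is scored. As a minimal sketch, assuming (this is not confirmed by the paper) that a response counts as hallucination-free only when the selected options exactly match the set of correct options, a per-model hallucination-free rate could be computed as follows. The function names, question IDs, and toy data are hypothetical.

```python
from collections.abc import Mapping


def is_hallucination_free(selected: set[str], correct: set[str]) -> bool:
    """Hypothetical strict rule: the chosen options must match the key exactly.

    Picking any incorrect option, or omitting any correct one, counts as a
    hallucination under this assumed scoring scheme.
    """
    return selected == correct


def hallucination_free_rate(
    responses: Mapping[str, set[str]],
    answer_key: Mapping[str, set[str]],
) -> float:
    """Fraction of questions (0.0 to 1.0) answered hallucination-free."""
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    # Assumption: an unanswered question is treated as an empty selection.
    passed = sum(
        is_hallucination_free(responses.get(qid, set()), correct)
        for qid, correct in answer_key.items()
    )
    return passed / len(answer_key)


if __name__ == "__main__":
    # Toy data, not taken from HalluScope: three multi-answer questions.
    key = {"q1": {"A", "C"}, "q2": {"B"}, "q3": {"A", "B", "D"}}
    model_answers = {"q1": {"A", "C"}, "q2": {"B", "D"}, "q3": {"A", "B", "D"}}
    print(hallucination_free_rate(model_answers, key))  # → 0.6666666666666666
```

Exact-set matching is the strictest choice: it penalizes both over-selection (a sign of overconfidence) and under-selection. A partial-credit scheme would be an equally plausible alternative if the paper defines scoring differently.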

Original language: English
Host publication title: Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
Editors: De-Shuang Huang, Haiming Chen, Bo Li, Qinhu Zhang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 466-477
Number of pages: 12
ISBN (Print): 9789819699933
DOI
Publication status: Published - 2025
Event: 21st International Conference on Intelligent Computing, ICIC 2025 - Ningbo, China
Duration: 26 Jul 2025 → 29 Jul 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2573 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 21st International Conference on Intelligent Computing, ICIC 2025
Country/Territory: China
City: Ningbo
Period: 26/07/25 → 29/07/25

