HalluScope: A Comprehensive Dataset for Evaluating Hallucination in Large Language Models Across Multiple Domains

  • Chen Zhao
  • Biao Jie Zeng
  • Kedi Chen
  • Xin Lin*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Peer-reviewed

Abstract

This study presents HalluScope, a benchmark developed specifically for evaluating hallucinations in large language models. HalluScope comprises 800 adversarially designed questions spanning multiple domains, systematically categorized into five hallucination types: selective, temporal, imitative, factual, and overconfidence. The dataset was constructed through automated question generation with mutual supervision between models, which supports both question generation and evaluation. The evaluation adopts a multiple-choice format in which a question may have several correct options and models must select the correct ones, providing a more nuanced assessment of model confidence and judgment under uncertainty. Extensive experiments on 12 large language models, including ERNIE-Bot, ChatGLM, Qwen, and XVerse, show that nine models have hallucination-free rates below 50%, underscoring the benchmark’s difficulty. Furthermore, HalluScope reveals which domains and hallucination types are most hallucination-prone, providing guidance for fine-tuning models to mitigate hallucinations effectively.
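The abstract does not state exactly how a multi-answer response is scored. As an illustration only, the Python sketch below assumes a strict exact-set-match rule: a response counts as hallucination-free only if the set of selected options equals the set of correct options, so both a missed correct option and an extra wrong option count as a failure. It then reports an overall hallucination-free rate and a per-type breakdown. The `Item` schema, the function names, and the exact-match rule are hypothetical and are not taken from HalluScope.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One multi-answer multiple-choice question (hypothetical schema)."""
    qid: str
    hallucination_type: str  # e.g. "temporal", "factual"
    gold: frozenset          # correct option labels, e.g. frozenset({"A", "C"})


def is_hallucination_free(selected, gold) -> bool:
    # Assumed strict criterion: the selection must match the gold set exactly.
    return set(selected) == set(gold)


def hallucination_free_rate(items, predictions):
    """Return (overall_rate, {hallucination_type: rate}).

    `predictions` maps qid -> set of option labels chosen by the model;
    a question with no prediction counts as a failure.
    """
    per_type = defaultdict(lambda: [0, 0])  # type -> [passed, total]
    for item in items:
        ok = is_hallucination_free(predictions.get(item.qid, set()), item.gold)
        per_type[item.hallucination_type][0] += int(ok)
        per_type[item.hallucination_type][1] += 1
    passed = sum(p for p, _ in per_type.values())
    total = sum(t for _, t in per_type.values())
    overall = passed / total if total else 0.0
    return overall, {k: p / t for k, (p, t) in per_type.items()}


# Toy usage with made-up questions and predictions.
items = [
    Item("q1", "temporal", frozenset({"A", "C"})),
    Item("q2", "factual", frozenset({"B"})),
    Item("q3", "factual", frozenset({"A", "D"})),
]
preds = {"q1": {"A", "C"}, "q2": {"B", "D"}, "q3": {"A", "D"}}
overall, by_type = hallucination_free_rate(items, preds)
print(round(overall, 3))  # 0.667
print(by_type)            # {'temporal': 1.0, 'factual': 0.5}
```

Here q2 fails because the extra option D was selected. A lenient scheme (for example, partial credit per correct option) would rank models differently, which is why the scoring rule should be checked against the paper itself.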

Original language: English
Title of host publication: Advanced Intelligent Computing Technology and Applications - 21st International Conference, ICIC 2025, Proceedings
Editors: De-Shuang Huang, Haiming Chen, Bo Li, Qinhu Zhang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 466-477
Number of pages: 12
ISBN (Print): 9789819699933
State: Published - 2025
Event: 21st International Conference on Intelligent Computing, ICIC 2025 - Ningbo, China
Duration: 26 Jul 2025 to 29 Jul 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2573 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 21st International Conference on Intelligent Computing, ICIC 2025
Country/Territory: China
City: Ningbo
Period: 26/07/25 to 29/07/25
