Skip to main navigation Skip to search Skip to main content

AutoEvolve: Automatically Evolving Queries for Applicable and Scalable Retrieval-Augmented Generation Benchmarking

  • Dingchu Zhang
  • , Xiaowen Zhang
  • , Yue Fei
  • , Renjun Hu
  • , Xiaowen Yang
  • , Zhi Zhou
  • , Baixuan Li
  • , Yufeng Li*
  • , Xing Shi
  • , Wei Lin
  • *Corresponding author for this work
  • Nanjing University
  • Alibaba Group Holding Ltd.

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Retrieval-augmented generation (RAG) enables large language models (LLMs) to address queries beyond their internal knowledge by integrating domain knowledge in specialized corpus, which necessitates the generation of benchmarks on specific corpus to evaluate RAG systems. However, existing automated generation methods exhibit Weak Applicability and Weak Scalability. Weak Applicability refers to the reliance on metadata from specific corpora for query generation, constraining applicability to other corpora. Weak Scalability is characterized by fixed query content after generation, unable to dynamically increase difficulty, limiting scalability of the query. To overcome these issues, we propose AutoEvolve, an applicable approach for dynamically evolving queries to construct scalable RAG benchmarks. Our approach is grounded in three key innovations: (i) a corpus-agnostic method for constructing the universal entity-document graph; (ii) a suite of evolution operations designed to dynamically update queries; and (iii) a difficulty-guided metric that directs query evolution process. Through experiments on three generated benchmarks, we demonstrate that AutoEvolve evolves queries that are significantly more challenging, paving the way for more applicable and scalable RAG evaluations.

Original languageEnglish
Title of host publicationEMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025
EditorsChristos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
PublisherAssociation for Computational Linguistics (ACL)
Pages7624-7639
Number of pages16
ISBN (Electronic)9798891763357
DOIs
StatePublished - 2025
Event30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025 - Suzhou, China
Duration: 4 Nov 20259 Nov 2025

Publication series

NameEMNLP 2025 - 2025 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2025

Conference

Conference30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Country/TerritoryChina
CitySuzhou
Period4/11/259/11/25

Fingerprint

Dive into the research topics of 'AutoEvolve: Automatically Evolving Queries for Applicable and Scalable Retrieval-Augmented Generation Benchmarking'. Together they form a unique fingerprint.

Cite this