Paper-Level Computerized Adaptive Testing for High-Stakes Examination via Multi-Objective Optimization

  • Mingjia Li
  • Junkai Tong
  • Yiyang Huang
  • Yifei Ding
  • Hong Qian*
  • Aimin Zhou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Computerized Adaptive Testing (CAT) is a testing technique that accurately infers students' proficiency levels using a relatively small number of questions. Most existing CAT systems operate on a question-level adaptive paradigm, which is suitable for practice scenarios. However, in computerized standardized high-stakes examinations such as the GRE and GMAT, this paradigm faces several challenges: (1) the lack of comparability in exam results, (2) high implementation costs due to the reliance on real-time interactions and the financial burden of maintaining a CAT testing system, and (3) the difficulty of balancing diagnosis quality, attribute coverage, and question exposure. To address these challenges, we propose Paper-level Computerized Adaptive Testing (PCAT) and a corresponding evaluation method. PCAT divides an exam into multiple testing stages, where examinees adaptively receive test papers of varying difficulty based on their performance in previous stages. The paper assembly problem in PCAT is solved using a population-based multi-objective optimization (MOO) approach. PCAT offers several advantages. First, the paper-level adaptive mechanism ensures that the questions faced by examinees depend solely on their performance in the earlier stages, maintaining adaptability while enhancing the comparability of results across different examinees. Second, PCAT replaces the selection strategy module in traditional CAT with an assembly module, allowing computationally intensive tasks such as cognitive diagnosis and paper assembly to be completed offline before the exam, eliminating the need for real-time interactions. Additionally, the population-based MOO approach generates a set of high-quality solutions in a single run, meeting the demands of frequently administered standardized high-stakes exams like the GRE and reducing the financial burden of maintaining a large-scale CAT system.
Finally, MOO naturally models multiple factors as separate objectives, enabling a balanced consideration of these factors and allowing exam administrators to customize the exam based on specific needs. Extensive experiments on four real-world datasets show that PCAT outperforms state-of-the-art (SOTA) CAT methods in terms of diagnosis quality, attribute coverage, and question exposure, while maintaining the same number of questions answered by examinees. These results highlight PCAT's potential in high-stakes examination settings.
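The population-based MOO paper assembly described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the question bank, the three objective proxies (total item information for diagnosis quality, number of distinct attributes for coverage, negated mean exposure), and the simple mutate-and-select loop are illustrative placeholders, not the paper's actual algorithm or data.

```python
import random

random.seed(0)

# Hypothetical question bank: each question has an information value (a proxy
# for diagnosis quality), a knowledge attribute, and a prior exposure count.
QUESTIONS = [
    {"info": random.random(),
     "attr": random.randrange(5),
     "exposure": random.randrange(100)}
    for _ in range(40)
]
PAPER_LEN = 10  # questions per assembled paper


def objectives(paper):
    """Return a 3-tuple to maximize: (total information, attribute coverage,
    negated mean exposure). Exposure is negated so all objectives share the
    same maximization direction."""
    qs = [QUESTIONS[i] for i in paper]
    info = sum(q["info"] for q in qs)
    coverage = len({q["attr"] for q in qs})
    exposure = -sum(q["exposure"] for q in qs) / len(qs)
    return (info, coverage, exposure)


def dominates(a, b):
    """Pareto dominance for maximization: a is at least as good everywhere
    and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def mutate(paper):
    """Swap one selected question for an unselected one."""
    child = set(paper)
    child.remove(random.choice(sorted(child)))
    child.add(random.choice([i for i in range(len(QUESTIONS)) if i not in child]))
    return frozenset(child)


def evolve(pop_size=30, generations=50):
    """Evolve a population of candidate papers; return an approximate
    Pareto set, so administrators can pick among trade-offs."""
    pop = [frozenset(random.sample(range(len(QUESTIONS)), PAPER_LEN))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop = pop + [mutate(p) for p in pop]          # offspring by mutation
        scored = [(objectives(p), p) for p in pop]
        # Keep the non-dominated front first, then fill by aggregate score.
        front = [p for s, p in scored
                 if not any(dominates(t, s) for t, _ in scored)]
        rest = sorted((p for _, p in scored if p not in front),
                      key=lambda p: sum(objectives(p)), reverse=True)
        pop = (front + rest)[:pop_size]
    return [p for s, p in [(objectives(p), p) for p in pop]
            if not any(dominates(objectives(q), s) for q in pop)]
```

A real system would use a proper MOO algorithm (e.g., NSGA-II-style crowding distance instead of the aggregate-score tiebreak here) and psychometrically grounded objectives; this sketch only shows how the three factors become separate objectives whose trade-offs survive as a set of candidate papers.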

Original language: English
Title of host publication: KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 1435-1446
Number of pages: 12
ISBN (Electronic): 9798400714542
DOIs
State: Published - 3 Aug 2025
Event: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025 - Toronto, Canada
Duration: 3 Aug 2025 – 7 Aug 2025

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 2
ISSN (Print): 2154-817X

Conference

Conference: 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Country/Territory: Canada
City: Toronto
Period: 3/08/25 – 7/08/25

Keywords

  • computerized adaptive testing
  • high-stakes examination
  • multi-objective optimization
  • paper assembly
