TY - GEN
T1 - Paper-Level Computerized Adaptive Testing for High-Stakes Examination via Multi-Objective Optimization
AU - Li, Mingjia
AU - Tong, Junkai
AU - Huang, Yiyang
AU - Ding, Yifei
AU - Qian, Hong
AU - Zhou, Aimin
N1 - Publisher Copyright:
© 2025 ACM.
PY - 2025/8/3
Y1 - 2025/8/3
N2 - Computerized Adaptive Testing (CAT) is a testing technique that accurately infers students' proficiency levels using a relatively small number of questions. Most existing CAT systems operate on a question-level adaptive paradigm, which is suitable for practice scenarios. However, in computerized standardized high-stakes examinations such as the GRE and GMAT, this paradigm faces several challenges: (1) the lack of comparability in exam results, (2) high implementation costs due to the reliance on real-time interactions and the financial burden of maintaining a CAT testing system, and (3) the difficulty of balancing multiple factors, namely diagnosis quality, attribute coverage, and question exposure. To address these challenges, we propose Paper-level Computerized Adaptive Testing (PCAT) and a corresponding evaluation method. PCAT divides an exam into multiple testing stages, where examinees adaptively receive test papers of varying difficulty based on their performance in previous stages. The paper assembly problem in PCAT is solved using a population-based multi-objective optimization (MOO) approach. PCAT offers several advantages. First, the paper-level adaptive mechanism ensures that the questions faced by examinees depend solely on their performance in the earlier stages, maintaining adaptability while enhancing the comparability of results across different examinees. Second, PCAT replaces the selection strategy module in traditional CAT with an assembly module, allowing computationally intensive tasks such as cognitive diagnosis and paper assembly to be completed offline before the exam, eliminating the need for real-time interactions. Additionally, the population-based MOO approach generates a set of high-quality solutions in one run, meeting the demands of frequent administration of standardized high-stakes exams like the GRE and reducing the financial burden of maintaining a large-scale CAT system. Finally, MOO naturally models multiple factors as separate objectives, enabling a balanced consideration of these factors and allowing exam administrators to customize the exam based on specific needs. Extensive experiments on four real-world datasets show that PCAT outperforms state-of-the-art (SOTA) CAT methods in terms of diagnosis quality, attribute coverage, and question exposure, while maintaining the same number of questions answered by examinees. These results highlight PCAT's potential in high-stakes examination settings.
AB - Computerized Adaptive Testing (CAT) is a testing technique that accurately infers students' proficiency levels using a relatively small number of questions. Most existing CAT systems operate on a question-level adaptive paradigm, which is suitable for practice scenarios. However, in computerized standardized high-stakes examinations such as the GRE and GMAT, this paradigm faces several challenges: (1) the lack of comparability in exam results, (2) high implementation costs due to the reliance on real-time interactions and the financial burden of maintaining a CAT testing system, and (3) the difficulty of balancing multiple factors, namely diagnosis quality, attribute coverage, and question exposure. To address these challenges, we propose Paper-level Computerized Adaptive Testing (PCAT) and a corresponding evaluation method. PCAT divides an exam into multiple testing stages, where examinees adaptively receive test papers of varying difficulty based on their performance in previous stages. The paper assembly problem in PCAT is solved using a population-based multi-objective optimization (MOO) approach. PCAT offers several advantages. First, the paper-level adaptive mechanism ensures that the questions faced by examinees depend solely on their performance in the earlier stages, maintaining adaptability while enhancing the comparability of results across different examinees. Second, PCAT replaces the selection strategy module in traditional CAT with an assembly module, allowing computationally intensive tasks such as cognitive diagnosis and paper assembly to be completed offline before the exam, eliminating the need for real-time interactions. Additionally, the population-based MOO approach generates a set of high-quality solutions in one run, meeting the demands of frequent administration of standardized high-stakes exams like the GRE and reducing the financial burden of maintaining a large-scale CAT system. Finally, MOO naturally models multiple factors as separate objectives, enabling a balanced consideration of these factors and allowing exam administrators to customize the exam based on specific needs. Extensive experiments on four real-world datasets show that PCAT outperforms state-of-the-art (SOTA) CAT methods in terms of diagnosis quality, attribute coverage, and question exposure, while maintaining the same number of questions answered by examinees. These results highlight PCAT's potential in high-stakes examination settings.
KW - computerized adaptive testing
KW - high-stakes examination
KW - multi-objective optimization
KW - paper assembly
UR - https://www.scopus.com/pages/publications/105014587628
U2 - 10.1145/3711896.3737073
DO - 10.1145/3711896.3737073
M3 - Conference contribution
AN - SCOPUS:105014587628
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1435
EP - 1446
BT - KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Y2 - 3 August 2025 through 7 August 2025
ER -