TY - GEN
T1 - A Multi-task Automated Assessment System for Essay Scoring
AU - Chen, Shigeng
AU - Lan, Yunshi
AU - Yuan, Zheng
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
AB - Most existing automated assessment (AA) systems focus on holistic scoring, falling short of providing learners with comprehensive feedback. In this paper, we propose a Multi-Task Automated Assessment (MTAA) system that can output detailed scores along multiple dimensions of essay quality to provide instructional feedback. This system is built on multi-task learning and incorporates Orthogonality Constraints (OC) to learn distinct information from different tasks. To achieve better training convergence, we develop a training strategy, Dynamic Learning Rate Decay (DLRD), to adapt the learning rates for tasks based on their loss descent rates. The results show that our proposed system achieves state-of-the-art performance on two benchmark datasets: ELLIPSE and ASAP++. Furthermore, we utilize ChatGPT to assess essays in both zero-shot and few-shot contexts using an ELLIPSE subset. The findings suggest that ChatGPT has not yet achieved a level of scoring consistency comparable to that of our MTAA system or of human raters.
KW - Automated Essay Scoring
KW - ChatGPT Automated Assessment
KW - Few-Shot Learning
KW - Multi-Task Learning
KW - Zero-Shot Learning
UR - https://www.scopus.com/pages/publications/85200217629
DO - 10.1007/978-3-031-64299-9_22
M3 - Conference contribution
AN - SCOPUS:85200217629
SN - 9783031642982
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 276
EP - 283
BT - Artificial Intelligence in Education - 25th International Conference, AIED 2024, Proceedings
A2 - Olney, Andrew M.
A2 - Chounta, Irene-Angelica
A2 - Liu, Zitao
A2 - Santos, Olga C.
A2 - Bittencourt, Ig Ibert
PB - Springer Science and Business Media Deutschland GmbH
T2 - 25th International Conference on Artificial Intelligence in Education, AIED 2024
Y2 - 8 July 2024 through 12 July 2024
ER -