TY - GEN
T1 - Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
AU - Li, Junsong
AU - Zhou, Jie
AU - Yang, Yutao
AU - Zhan, Bihao
AU - Pan, Qianjun
AU - Ding, Yuyang
AU - Chen, Qin
AU - Jiang, Bo
AU - Lin, Xin
AU - He, Liang
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Automatic math correction aims to check students' solutions to mathematical problems using artificial intelligence technologies. Most existing studies focus on judging the final answer at the problem level but ignore detailed feedback on each step of the problem-solving process, which requires semantic understanding and reasoning abilities. In this paper, we propose a reinforcement learning (RL)-based method, named StepAMC, to boost large language models (LLMs) for step-level automatic math correction. In particular, we recast step-level automatic math correction, conventionally treated as a text classification task, as an RL problem to enhance the reasoning capabilities of LLMs. We then design a space-constrained policy network to improve the stability of RL, and we introduce a fine-grained reward network to convert binary human feedback into continuous values. Extensive experiments on two benchmark datasets show that our model outperforms eleven strong baselines.
AB - Automatic math correction aims to check students' solutions to mathematical problems using artificial intelligence technologies. Most existing studies focus on judging the final answer at the problem level but ignore detailed feedback on each step of the problem-solving process, which requires semantic understanding and reasoning abilities. In this paper, we propose a reinforcement learning (RL)-based method, named StepAMC, to boost large language models (LLMs) for step-level automatic math correction. In particular, we recast step-level automatic math correction, conventionally treated as a text classification task, as an RL problem to enhance the reasoning capabilities of LLMs. We then design a space-constrained policy network to improve the stability of RL, and we introduce a fine-grained reward network to convert binary human feedback into continuous values. Extensive experiments on two benchmark datasets show that our model outperforms eleven strong baselines.
KW - Automatic math correction
KW - Large language model
KW - Reinforcement learning
UR - https://www.scopus.com/pages/publications/105022648174
U2 - 10.1109/ICME59968.2025.11209116
DO - 10.1109/ICME59968.2025.11209116
M3 - Conference contribution
AN - SCOPUS:105022648174
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2025 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
T2 - 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Y2 - 30 June 2025 through 4 July 2025
ER -