TY - CONF
T1 - VisCGEC
T2 - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025
AU - Wang, Xiaoman
AU - Yuan, Dan
AU - Liu, Xin
AU - Zhao, Yike
AU - Zhang, Xiaoxiao
AU - Chen, Xizhi
AU - Lan, Yunshi
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - Chinese Grammatical Error Correction (CGEC) plays a significant role in providing automatic feedback on students' writing, especially for Chinese as a Foreign Language (CFL) learners. Beginner CFL learners in particular write Chinese characters in ways that frequently involve phonological and visual confusion. However, existing CGEC studies ignore multi-modality and potential faked errors (i.e., non-existent characters created through writing errors), which distances these techniques from real-world scenarios. To address this gap, we develop a dataset, VisCGEC, to benchmark visual Chinese grammatical error correction for CFL learners. The dataset contains 2,451 images of handwritten sentences with grammatical errors, along with corresponding correction texts meticulously annotated by Chinese language experts. In addition, we propose baseline approaches on VisCGEC and conduct experiments with two CGEC frameworks (i.e., a two-stage pipeline and an end-to-end system), providing a strong baseline for future research. Extensive empirical results and analyses demonstrate that VisCGEC is high-quality but challenging: the best approach achieves an F0.5 score of only 28.9%. Our dataset and baseline methods are available at https://github.com/xiaoAugenstern/VisCGEC.
UR - https://www.scopus.com/pages/publications/105027459742
U2 - 10.18653/v1/2025.naacl-long.261
DO - 10.18653/v1/2025.naacl-long.261
M3 - Conference contribution
AN - SCOPUS:105027459742
T3 - Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025
SP - 5054
EP - 5068
BT - Long Papers
A2 - Chiruzzo, Luis
A2 - Ritter, Alan
A2 - Wang, Lu
PB - Association for Computational Linguistics (ACL)
Y2 - 29 April 2025 through 4 May 2025
ER -