VisCGEC: Benchmarking the Visual Chinese Grammatical Error Correction

  • Xiaoman Wang
  • Dan Yuan
  • Xin Liu
  • Yike Zhao
  • Xiaoxiao Zhang
  • Xizhi Chen
  • Yunshi Lan*

* Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Chinese Grammatical Error Correction (CGEC) plays a significant role in providing automatic feedback on students' writing, especially for learners of Chinese as a Foreign Language (CFL). Beginner CFL learners in particular write Chinese characters in ways that frequently involve phonological and visual confusion. However, existing CGEC studies ignore multi-modality and potential faked errors (i.e., non-existent characters created due to writing errors), which distances these techniques from real-world scenarios. To address this gap, we develop a dataset, VisCGEC, to benchmark visual Chinese grammatical error correction for CFL learners. The dataset contains 2,451 images of handwritten sentences with grammatical errors, together with corresponding correction texts meticulously annotated by Chinese language experts. In addition, we propose baseline approaches on VisCGEC and conduct experiments with two CGEC frameworks (i.e., a two-stage pipeline and an end-to-end system), providing a strong baseline for future research. Extensive empirical results and analyses demonstrate that VisCGEC is high-quality but challenging: the best approach achieves an F0.5 score of only 28.9%. Our dataset and baseline methods are available at https://github.com/xiaoAugenstern/VisCGEC.
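
For readers unfamiliar with the metric named in the abstract, F0.5 is the weighted harmonic mean of precision and recall with beta = 0.5, which weights precision more heavily than recall. The sketch below shows the standard formula only. The function name `f_beta` and the precision and recall values are illustrative assumptions, not figures or code from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: weighted harmonic mean of precision and recall."""
    # Both inputs at zero would make the denominator zero; define the score as 0.
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical precision/recall values, NOT results reported for VisCGEC.
example_score = f_beta(precision=0.5, recall=0.2)
```

With these hypothetical inputs the score is about 0.385, closer to the precision than an unweighted F1 would be, which is why beta = 0.5 is a common choice when false corrections are considered more costly than missed ones.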

Original language: English
Title of host publication: Long Papers
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Publisher: Association for Computational Linguistics (ACL)
Pages: 5054-5068
Number of pages: 15
ISBN (Electronic): 9798891761896
State: Published - 2025
Event: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025 - Hybrid, Albuquerque, United States
Duration: 29 Apr 2025 to 4 May 2025

Publication series

Name: Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025
Volume: 1

Conference

Conference: 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025
Country/Territory: United States
City: Hybrid, Albuquerque
Period: 29/04/25 to 4/05/25
