Large Language Models are Good Annotators for Type-aware Data Augmentation in Grammatical Error Correction

Xinyuan Li, Yunshi Lan

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Large Language Models (LLMs) have achieved outstanding performance across various NLP tasks. Grammatical Error Correction (GEC) is a task aiming at automatically correcting grammatical errors in text, but it encounters a severe shortage of annotated data. Researchers have tried to make full use of the generalization capabilities of LLMs and prompt them to correct erroneous sentences, which however results in unexpected over-correction issues. In this paper, we rethink the role of LLMs in GEC tasks and propose a method, namely TypeDA, considering LLMs as the annotators for type-aware data augmentation in GEC tasks. Different from the existing data augmentation methods, our method prevents in-distribution corruption and is able to generate sentences with multi-granularity error types. Our experiments verify that our method can generally improve the GEC performance of different backbone models with only a small amount of augmented data. Further analyses verify the high consistency and diversity of the pseudo data generated via our method. Our code can be accessed via the provided URL.

Original languageEnglish
Title of host publicationMain Conference
EditorsOwen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
PublisherAssociation for Computational Linguistics (ACL)
Pages199-213
Number of pages15
ISBN (Electronic)9798891761964
StatePublished - 2025
Event31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates
Duration: 19 Jan 202524 Jan 2025

Publication series

NameProceedings - International Conference on Computational Linguistics, COLING
ISSN (Print)2951-2093

Conference

Conference31st International Conference on Computational Linguistics, COLING 2025
Country/TerritoryUnited Arab Emirates
CityAbu Dhabi
Period19/01/2524/01/25

Fingerprint

Dive into the research topics of 'Large Language Models are Good Annotators for Type-aware Data Augmentation in Grammatical Error Correction'. Together they form a unique fingerprint.

Cite this