FlaCGEC: A Chinese Grammatical Error Correction Dataset with Fine-grained Linguistic Annotation

  • Hanyue Du
  • Yike Zhao
  • Qingyuan Tian
  • Jiani Wang
  • Lei Wang
  • Yunshi Lan*
  • Xuesong Lu
  • *Corresponding author for this work
  • East China Normal University
  • Singapore Management University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Chinese Grammatical Error Correction (CGEC) has recently attracted growing attention from researchers. Although multiple CGEC datasets have been developed to support this research, they do not provide a deep linguistic topology of grammar errors, which is critical for interpreting and diagnosing CGEC approaches. To address this limitation, we introduce FlaCGEC, a new CGEC dataset featuring fine-grained linguistic annotation. Specifically, we collect a raw corpus from the linguistic schema defined by Chinese language experts, edit sentences via rules, and manually refine the generated samples, which results in 10k sentences with 78 instantiated grammar points and 3 types of edits. We evaluate various cutting-edge CGEC methods on the proposed FlaCGEC dataset, and their unremarkable results indicate that the dataset is challenging because it covers a large range of grammatical errors. In addition, we treat FlaCGEC as a diagnostic dataset for testing generalization skills and conduct a thorough evaluation of existing CGEC models.

Original language: English
Title of host publication: CIKM 2023 - Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery
Pages: 5321-5325
Number of pages: 5
ISBN (electronic): 9798400701245
DOI
Publication status: Published - 21 Oct 2023
Event: 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023 - Birmingham, United Kingdom
Duration: 21 Oct 2023 to 25 Oct 2023

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 32nd ACM International Conference on Information and Knowledge Management, CIKM 2023
Country/Territory: United Kingdom
City: Birmingham
Period: 21/10/23 to 25/10/23
