Towards Robust Chinese Spelling Check Systems: Multi-round Error Correction with Ensemble Enhancement

Xiang Li, Hanyue Du, Yike Zhao, Yunshi Lan

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Scopus citation

Abstract

Chinese Spelling Check requires a system to automatically correct spelling errors in a sentence. Diverse methods have been proposed for this task. A few of them improve model robustness through data augmentation, but they have weaknesses: randomly inserted errors might distort the real distribution of the data. Moreover, different models may produce different results when predicting the same erroneous sentence. Based on these intuitions, we develop a multi-round error correction method with ensemble enhancement, which is robust in solving Chinese Spelling Check challenges. Specifically, multi-round error correction follows an iterative correction pipeline, where a single error is corrected at each round, and each subsequent correction is conducted based on the previous results. Furthermore, we propose two strategies of ensemble enhancement. For each predicted correction, the results of multiple models are mutually authenticated by weighted voting and dominate voting. Experiments demonstrate the effectiveness of our system. It achieves the best performance on the NLPCC 2023 CSC shared task. Further analyses verify that both multi-round error correction and ensemble enhancement contribute to its good results. Our code is publicly available on GitHub.
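The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the model interface (`Model`, `toy_model`), the confidence-weighted scoring rule, the `threshold` value, and the `max_rounds` cap are all assumptions made for illustration, and only weighted voting is shown (the "dominate voting" strategy is not specified in enough detail here to sketch). Each round, every model proposes character edits, the votes are pooled, the single best-supported edit is applied, and the next round starts from the corrected sentence.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model interface: a model maps a sentence to proposed edits,
# each given as (position, replacement_char, confidence in [0, 1]).
Edit = Tuple[int, str, float]
Model = Callable[[str], List[Edit]]


def toy_model(table: Dict[str, Tuple[str, float]]) -> Model:
    """Build a stand-in model from a lookup table {wrong_char: (right_char, confidence)}."""
    def predict(sentence: str) -> List[Edit]:
        return [(i, table[ch][0], table[ch][1])
                for i, ch in enumerate(sentence) if ch in table]
    return predict


def weighted_vote(sentence: str, models: List[Model], weights: List[float],
                  threshold: float = 0.5) -> Optional[Tuple[int, str]]:
    """Return the single best-supported edit, or None if none clears the threshold."""
    scores: Dict[Tuple[int, str], float] = defaultdict(float)
    for model, weight in zip(models, weights):
        for pos, char, conf in model(sentence):
            if sentence[pos] != char:  # ignore proposals that change nothing
                scores[(pos, char)] += weight * conf
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Normalise by the total weight so the threshold does not depend on ensemble size.
    return best if scores[best] / sum(weights) >= threshold else None


def multi_round_correct(sentence: str, models: List[Model], weights: List[float],
                        max_rounds: int = 10) -> str:
    """Apply one voted edit per round; later rounds see the earlier corrections."""
    for _ in range(max_rounds):
        edit = weighted_vote(sentence, models, weights)
        if edit is None:  # no edit has enough support: stop early
            break
        pos, char = edit
        sentence = sentence[:pos] + char + sentence[pos + 1:]
    return sentence


if __name__ == "__main__":
    models = [
        toy_model({"平": ("苹", 0.9), "再": ("在", 0.9)}),
        toy_model({"平": ("苹", 0.8), "再": ("在", 0.9)}),
        toy_model({"平": ("萍", 0.6)}),  # a dissenting model that is outvoted
    ]
    print(multi_round_correct("我再家吃平果", models, [1.0, 1.0, 1.0]))
```

In this toy run the two agreeing models outvote the third, and the two errors are fixed in separate rounds, with the second round scoring the sentence that already contains the first correction.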

Original language: English
Title of host publication: Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings
Editors: Fei Liu, Nan Duan, Qingting Xu, Yu Hong
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 325-336
Number of pages: 12
ISBN (Print): 9783031446986
State: Published - 2023
Event: 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023 - Foshan, China
Duration: 12 Oct 2023 to 15 Oct 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14304 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023
Country/Territory: China
City: Foshan
Period: 12/10/23 to 15/10/23

Keywords

  • Chinese Spelling Check
  • Ensemble
  • Multi-round Error Correction
