TY - GEN
T1 - Towards Robust Chinese Spelling Check Systems
T2 - 12th National CCF Conference on Natural Language Processing and Chinese Computing, NLPCC 2023
AU - Li, Xiang
AU - Du, Hanyue
AU - Zhao, Yike
AU - Lan, Yunshi
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Chinese Spelling Check requires a system to automatically correct spelling errors in a sentence. Diverse methods have been proposed for this task. A few improve model robustness through data augmentation, but they have weaknesses: randomly inserted errors might distort the real data distribution. Moreover, different models may produce different results when predicting the same erroneous sentence. Based on these intuitions, we develop a multi-round error correction method with ensemble enhancement, which is robust in solving Chinese Spelling Check challenges. Specifically, multi-round error correction follows an iterative correction pipeline, where a single error is corrected in each round and each subsequent correction is conducted based on the previous results. Furthermore, we propose two strategies of ensemble enhancement: for each predicted correction, the results of multiple models are mutually authenticated by weighted voting and dominate voting. Experiments demonstrate the effectiveness of our system, which achieves the best performance on the NLPCC 2023 CSC shared tasks. Further analyses verify that both multi-round error correction and ensemble enhancement contribute to its good results. Our code is publicly available on GitHub.
KW - Chinese Spelling Check
KW - Ensemble
KW - Multi-round Error Correction
UR - https://www.scopus.com/pages/publications/85174493797
U2 - 10.1007/978-3-031-44699-3_29
DO - 10.1007/978-3-031-44699-3_29
M3 - Conference contribution
AN - SCOPUS:85174493797
SN - 9783031446986
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 325
EP - 336
BT - Natural Language Processing and Chinese Computing - 12th National CCF Conference, NLPCC 2023, Proceedings
A2 - Liu, Fei
A2 - Duan, Nan
A2 - Xu, Qingting
A2 - Hong, Yu
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 12 October 2023 through 15 October 2023
ER -