TY - JOUR
T1 - Dongba Machine Translation with Transfer Learning
T2 - Leveraging Pre-trained Ancient Chinese Models
AU - Ma, Xinchen
AU - Lan, Man
AU - Hu, Wenbo
AU - Lu, Yue
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/4/23
Y1 - 2025/4/23
N2 - The Dongba script, a logographic writing system used by the Naxi people in religious activities, faces challenges in translation due to the advanced age of Dongba script experts and the time-consuming nature of manual deciphering. This study focuses on translating the resource-scarce Dongba script into Modern Chinese using a novel approach based on cross-lingual transfer learning from Ancient Chinese. By examining translation patterns from Ancient Chinese to Modern Chinese, we determine the feasibility of transferring knowledge from Ancient Chinese to Dongba script translation. We propose the Dongba Machine Translation Model (DMTM), a pre-trained, low-resource machine translation model that exploits the linguistic similarities between Ancient Chinese and the Dongba script to improve translation quality. The model is pre-trained on a large-scale Ancient Chinese corpus and fine-tuned on a small-scale Dongba script corpus, enabling effective knowledge transfer. To address the scarcity of Dongba script translation resources, we present DongBa Corpus 1.0, a fine-grained parallel dataset of Dongba script and Modern Chinese. Experimental results demonstrate that the proposed DMTM achieves a BLEU score of 50.01 on the test set. As no prior methods exist for Dongba script translation, we compared various architectures commonly used in low-resource translation tasks, and DMTM performed best, with a 5.39% improvement over the alternative architectures tested. The implementation code and dataset are available at https://github.com/Chloe-mxxxxc/DMTM.
KW - Dongba script
KW - Neural machine translation
KW - deep learning
KW - pre-trained model
KW - transfer learning
UR - https://www.scopus.com/pages/publications/105005601116
U2 - 10.1145/3721980
DO - 10.1145/3721980
M3 - Article
AN - SCOPUS:105005601116
SN - 2375-4699
VL - 24
JO - ACM Transactions on Asian and Low-Resource Language Information Processing
JF - ACM Transactions on Asian and Low-Resource Language Information Processing
IS - 5
M1 - 43
ER -