Dongba Machine Translation with Transfer Learning: Leveraging Pre-trained Ancient Chinese Models

  • Xinchen Ma
  • Man Lan
  • Wenbo Hu
  • Yue Lu*

  *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

The Dongba script, a logographic writing system used by the Naxi people in religious activities, is difficult to translate because the remaining Dongba script experts are of advanced age and manual deciphering is time-consuming. This study focuses on translating the resource-scarce Dongba script into Modern Chinese using a novel approach based on cross-lingual transfer learning from Ancient Chinese. By examining translation patterns from Ancient Chinese to Modern Chinese, we determine the feasibility of transferring knowledge from Ancient Chinese to Dongba script translation. We propose the Dongba Machine Translation Model (DMTM), a pre-trained, low-resource machine translation model that utilizes the linguistic similarities between Ancient Chinese and Dongba script to improve translation quality. The model is pre-trained on a large-scale Ancient Chinese corpus and then fine-tuned on a small-scale Dongba script corpus, enabling effective knowledge transfer. To address the scarcity of Dongba script translation resources, we present DongBa Corpus 1.0, a fine-grained parallel dataset of Dongba script and Modern Chinese. Experimental results demonstrate that our proposed DMTM achieves a BLEU score of 50.01% on the test set. As no prior methods exist for Dongba script translation, we compared various architectures commonly used in low-resource translation tasks, and DMTM exhibited the best performance, with a 5.39% improvement over the alternative architectures tested. The implementation code and dataset for our approach are available at https://github.com/Chloe-mxxxxc/DMTM.
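For readers unfamiliar with the metric reported above, the following is a minimal sketch of corpus-level BLEU in plain Python. It is illustrative only: the abstract does not state which BLEU variant, tokenization, or smoothing the authors used, so the function name `corpus_bleu`, the single-reference setup, the lack of smoothing, and the character-level tokenization in the usage lines are all assumptions rather than the paper's evaluation code.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis and no smoothing.

    Hypothetical helper for illustration. Returns a value in [0, 1];
    multiply by 100 for the percentage form.
    """
    matched = [0] * max_n  # clipped n-gram matches, per order
    total = [0] * max_n    # hypothesis n-grams, per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp, n)
            ref_counts = ngram_counts(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matched[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total[n - 1] += sum(hyp_counts.values())

    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    # The brevity penalty applies when the hypotheses are not longer than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Assumed character-level tokenization: each Chinese character is one token.
hypothesis = list("今天天气很好")
reference = list("今天天气很好")
score = corpus_bleu([hypothesis], [reference])  # identical strings score 1.0
```

Because the sketch applies no smoothing, a hypothesis set with no matching 4-gram scores exactly 0. Scores from different BLEU implementations are comparable only when tokenization and smoothing match.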

Original language: English
Article number: 43
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Volume: 24
Issue number: 5
State: Published - 23 Apr 2025

Keywords

  • Dongba script
  • Neural machine translation
  • deep learning
  • pre-trained model
  • transfer learning
