Improving offline handwritten Chinese text recognition with glyph-semanteme fusion embedding

  • Hongjian Zhan
  • Shujing Lyu*
  • Yue Lu

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

In this paper, we propose the Glyph-Semanteme fusion Embedding (GSE) for Chinese characters and apply it to Offline Handwritten Chinese Text Recognition (offline-HCTR). It is well known that the number of Chinese characters is very large and that their glyphs are complex, but few researchers have noted the underlying reason: Chinese is an ideographic script, which implies that the glyph and the semanteme of a character are correlated. To exploit this property and build better representations for Chinese characters, we first extract a glyph embedding and a semanteme embedding for each character; we then propose a parameterized gated fusion strategy that automatically computes the GSE of each character by fusing these two embeddings. We apply the proposed GSE to an attention-based encoder-decoder network for the offline-HCTR task. Furthermore, two kinds of GSE, Character-level GSE (CGSE) and Text-level GSE (TGSE), are applied in the decoding phase to yield the predictions. On the standard ICDAR-2013 HCTR competition benchmark, the proposed method achieves 96.65% character-level recognition accuracy, which demonstrates the effectiveness of the proposed glyph-semanteme fusion embedding.
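
The gated fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so the form used here is an assumption: an element-wise gate g = sigmoid(W [e_glyph; e_sem] + b), followed by the mix g * e_glyph + (1 - g) * e_sem. The names gated_fusion and init_params, the weight shapes, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(glyph, semanteme, weights, bias):
    """Fuse two equal-length embeddings with an element-wise gate.

    Assumed form (not taken from the paper):
        g     = sigmoid(W @ [glyph; semanteme] + b)
        fused = g * glyph + (1 - g) * semanteme
    `weights` has shape (dim, 2 * dim) and `bias` has length dim.
    """
    if len(glyph) != len(semanteme):
        raise ValueError("embeddings must have the same dimension")
    concat = list(glyph) + list(semanteme)
    fused = []
    for i, (e_g, e_s) in enumerate(zip(glyph, semanteme)):
        # Gate pre-activation for dimension i, computed from both embeddings.
        pre = sum(w * x for w, x in zip(weights[i], concat)) + bias[i]
        g = sigmoid(pre)
        # Convex combination: g near 1 favors the glyph, g near 0 the semanteme.
        fused.append(g * e_g + (1.0 - g) * e_s)
    return fused


def init_params(dim, seed=0):
    """Small random gate weights and zero bias (hypothetical initialization)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)]
               for _ in range(dim)]
    bias = [0.0] * dim
    return weights, bias


if __name__ == "__main__":
    # Toy 4-dimensional embeddings for a single character.
    glyph_vec = [0.2, -0.5, 0.9, 0.1]
    sem_vec = [-0.3, 0.4, 0.0, 0.7]
    W, b = init_params(4)
    print(gated_fusion(glyph_vec, sem_vec, W, b))
```

Because the gate lies in (0, 1), each fused component falls between the corresponding glyph and semanteme components. In a trained model the gate parameters would be learned jointly with the recognizer rather than initialized at random as in this sketch.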

Original language: English
Pages (from-to): 485-496
Number of pages: 12
Journal: International Journal of Machine Learning and Cybernetics
Volume: 13
Issue number: 2
State: Published - Feb 2022

Keywords

  • Embedding fusion
  • Encoder-decoder
  • Glyph embedding
  • Offline handwritten Chinese text recognition
  • Semanteme embedding
