TY - JOUR
T1 - Improving offline handwritten Chinese text recognition with glyph-semanteme fusion embedding
AU - Zhan, Hongjian
AU - Lyu, Shujing
AU - Lu, Yue
N1 - Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2022/2
Y1 - 2022/2
N2 - In this paper, we propose the Glyph-Semanteme fusion Embedding (GSE) for Chinese character and apply it to Offline Handwritten Chinese Text Recognition (offline-HCTR). It is well known that the number of Chinese characters is very large and the glyphs of these characters are complex, but few researchers realize that the underlying reason for this phenomenon is that Chinese is a form of ideogram, which indicates that there are correlations between the glyph and semanteme of a character. In order to utilize this feature and create better representations for Chinese characters, firstly, we extract the glyph embedding and semanteme embedding for each Chinese character; then we propose a parameterized gated fusion strategy to automatically calculate the Glyph-Semanteme fusion Embedding for each character by fusing its glyph embedding and semanteme embedding. We apply the proposed GSE to an attention-based Encoder-decoder network for the offline-HCTR task. Furthermore, two kinds of GSE, Character-level GSE (CGSE) and Text-level GSE (TGSE), are applied to the decoder phase to yield the predictions. On the standard benchmark ICDAR-2013 HCTR competition dataset, the proposed method achieves 96.65% character-level recognition accuracy, which demonstrates the effectiveness of the proposed glyph-semanteme fusion embedding.
AB - In this paper, we propose the Glyph-Semanteme fusion Embedding (GSE) for Chinese character and apply it to Offline Handwritten Chinese Text Recognition (offline-HCTR). It is well known that the number of Chinese characters is very large and the glyphs of these characters are complex, but few researchers realize that the underlying reason for this phenomenon is that Chinese is a form of ideogram, which indicates that there are correlations between the glyph and semanteme of a character. In order to utilize this feature and create better representations for Chinese characters, firstly, we extract the glyph embedding and semanteme embedding for each Chinese character; then we propose a parameterized gated fusion strategy to automatically calculate the Glyph-Semanteme fusion Embedding for each character by fusing its glyph embedding and semanteme embedding. We apply the proposed GSE to an attention-based Encoder-decoder network for the offline-HCTR task. Furthermore, two kinds of GSE, Character-level GSE (CGSE) and Text-level GSE (TGSE), are applied to the decoder phase to yield the predictions. On the standard benchmark ICDAR-2013 HCTR competition dataset, the proposed method achieves 96.65% character-level recognition accuracy, which demonstrates the effectiveness of the proposed glyph-semanteme fusion embedding.
KW - Embedding fusion
KW - Encoder-decoder
KW - Glyph embedding
KW - Offline Handwritten Chinese text recognition
KW - Semanteme embedding
UR - https://www.scopus.com/pages/publications/85114922518
U2 - 10.1007/s13042-021-01420-7
DO - 10.1007/s13042-021-01420-7
M3 - 文章
AN - SCOPUS:85114922518
SN - 1868-8071
VL - 13
SP - 485
EP - 496
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
IS - 2
ER -