TY - GEN
T1 - A handwritten Chinese text recognizer applying multi-level multimodal fusion network
AU - Xiu, Yuhuan
AU - Wang, Qingqing
AU - Zhan, Hongjian
AU - Lan, Man
AU - Lu, Yue
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
AB - Handwritten Chinese text recognition (HCTR) has received extensive attention from the pattern recognition community over the past decades. Most existing deep learning methods consist of two stages, i.e., training a text recognition network on the basis of visual information, followed by incorporating language constraints with various language models. As a result, the inherent linguistic semantic information is often neglected when designing the recognition network. To tackle this problem, we propose a novel multi-level multimodal fusion network and embed it into an attention-based LSTM so that both the visual information and the linguistic semantic information can be fully leveraged when predicting sequential outputs from the feature vectors. Experimental results on the ICDAR-2013 competition dataset are comparable to those of state-of-the-art approaches.
KW - Attention based LSTM
KW - Handwritten Chinese text recognition
KW - Language model
KW - Linguistic semantic information
KW - Multi-level multimodal fusion
UR - https://www.scopus.com/pages/publications/85079862375
U2 - 10.1109/ICDAR.2019.00235
DO - 10.1109/ICDAR.2019.00235
M3 - Conference contribution
AN - SCOPUS:85079862375
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 1464
EP - 1469
BT - Proceedings - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
PB - IEEE Computer Society
T2 - 15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019
Y2 - 20 September 2019 through 25 September 2019
ER -