TY - GEN
T1 - Handwritten digit string recognition by combination of residual network and RNN-CTC
AU - Zhan, Hongjian
AU - Wang, Qingqing
AU - Lu, Yue
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
AB - Recurrent neural networks (RNNs) and connectionist temporal classification (CTC) have shown success in many sequence labeling tasks, owing to their strong ability to handle problems where the alignment between the inputs and the target labels is unknown. The residual network is a recent convolutional neural network architecture that works well in various computer vision tasks. In this paper, we combine these architectures to create a new network for handwritten digit string recognition. First, we design a residual network to extract features from input images; then we employ an RNN to model the contextual information within the feature sequences and predict recognition results. At the top of this network, standard CTC is applied to calculate the loss and yield the final results. These three parts compose an end-to-end trainable network. The proposed architecture achieves the highest performance on ORAND-CAR-A and ORAND-CAR-B, with recognition rates of 89.75% and 91.14%, respectively. In addition, experiments on a generated captcha dataset with much longer strings show the potential of the proposed network to handle long strings.
KW - Connectionist temporal classification
KW - Convolutional neural network
KW - Digit string recognition
KW - End to end
KW - Recurrent neural network
UR - https://www.scopus.com/pages/publications/85035147590
U2 - 10.1007/978-3-319-70136-3_62
DO - 10.1007/978-3-319-70136-3_62
M3 - Conference contribution
AN - SCOPUS:85035147590
SN - 9783319701356
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 583
EP - 591
BT - Neural Information Processing - 24th International Conference, ICONIP 2017, Proceedings
A2 - Liu, Derong
A2 - Xie, Shengli
A2 - Zhao, Dongbin
A2 - Li, Yuanqing
A2 - El-Alfy, El-Sayed M.
PB - Springer Verlag
T2 - 24th International Conference on Neural Information Processing, ICONIP 2017
Y2 - 14 November 2017 through 18 November 2017
ER -