TY - GEN
T1 - Symbol Location-Aware Network for Improving Handwritten Mathematical Expression Recognition
AU - Fu, Yingnan
AU - Cai, Wenyuan
AU - Gao, Ming
AU - Zhou, Aoying
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/6/12
Y1 - 2023/6/12
N2 - Most recent handwritten mathematical expression recognition (HMER) methods adopt the attention-based encoder-decoder framework, which generates LaTeX sequences from given images. However, the accuracy of the attention mechanism limits the performance of HMER models. The lack of global context information in the decoding process is also a challenge for HMER. Some methods adopt symbol-level counting to localize symbols and improve model performance, but these methods do not work well. In this paper, we propose a method named SLAN, short for Symbol Location-Aware Network, to solve the HMER problem. Specifically, we propose an advanced relation-level counting method to detect symbols in the image. We address the lack of global context with a new global context-aware decoder. To improve the accuracy of attention, we design a novel attention alignment loss function based on a dynamic programming algorithm, which can learn attention alignment directly without pixel-level labels. We conducted extensive experiments on the CROHME dataset to demonstrate the effectiveness of each part of SLAN and achieved state-of-the-art performance.
AB - Most recent handwritten mathematical expression recognition (HMER) methods adopt the attention-based encoder-decoder framework, which generates LaTeX sequences from given images. However, the accuracy of the attention mechanism limits the performance of HMER models. The lack of global context information in the decoding process is also a challenge for HMER. Some methods adopt symbol-level counting to localize symbols and improve model performance, but these methods do not work well. In this paper, we propose a method named SLAN, short for Symbol Location-Aware Network, to solve the HMER problem. Specifically, we propose an advanced relation-level counting method to detect symbols in the image. We address the lack of global context with a new global context-aware decoder. To improve the accuracy of attention, we design a novel attention alignment loss function based on a dynamic programming algorithm, which can learn attention alignment directly without pixel-level labels. We conducted extensive experiments on the CROHME dataset to demonstrate the effectiveness of each part of SLAN and achieved state-of-the-art performance.
KW - dynamic programming
KW - global context
KW - handwritten mathematical expression recognition
KW - symbol counting
UR - https://www.scopus.com/pages/publications/85163689054
U2 - 10.1145/3591106.3592259
DO - 10.1145/3591106.3592259
M3 - Conference contribution
AN - SCOPUS:85163689054
T3 - ICMR 2023 - Proceedings of the 2023 ACM International Conference on Multimedia Retrieval
SP - 516
EP - 524
BT - ICMR 2023 - Proceedings of the 2023 ACM International Conference on Multimedia Retrieval
PB - Association for Computing Machinery, Inc
T2 - 2023 ACM International Conference on Multimedia Retrieval, ICMR 2023
Y2 - 12 June 2023 through 15 June 2023
ER -