TY - GEN
T1 - Dynamic Feature Selection for Structural Image Content Recognition
AU - Fu, Yingnan
AU - Zheng, Shu
AU - Cai, Wenyuan
AU - Gao, Ming
AU - Jin, Cheqing
AU - Zhou, Aoying
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Structural image content recognition (SICR) aims to transcribe a two-dimensional structural image (e.g., mathematical expression, chemical formula, or music score) into a token sequence. Existing methods are mainly encoder-decoder based and overlook the importance of feature selection and spatial relation extraction in the feature map. In this paper, we propose DEAL (short for Dynamic fEAture seLection) for SICR, which contains a dynamic feature selector and a spatial relation extractor as two cornerstone modules. Specifically, we propose a novel loss function and a random exploration strategy to dynamically select useful image cells for target sequence generation. Further, we consider the positional and surrounding information of cells in the feature map to extract spatial relations. We conduct extensive experiments to evaluate the performance of DEAL. Experimental results show that DEAL significantly outperforms other state-of-the-art methods.
AB - Structural image content recognition (SICR) aims to transcribe a two-dimensional structural image (e.g., mathematical expression, chemical formula, or music score) into a token sequence. Existing methods are mainly encoder-decoder based and overlook the importance of feature selection and spatial relation extraction in the feature map. In this paper, we propose DEAL (short for Dynamic fEAture seLection) for SICR, which contains a dynamic feature selector and a spatial relation extractor as two cornerstone modules. Specifically, we propose a novel loss function and a random exploration strategy to dynamically select useful image cells for target sequence generation. Further, we consider the positional and surrounding information of cells in the feature map to extract spatial relations. We conduct extensive experiments to evaluate the performance of DEAL. Experimental results show that DEAL significantly outperforms other state-of-the-art methods.
KW - encoder-decoder network
KW - feature selection
KW - mathematical expression recognition
KW - structural image content recognition
UR - https://www.scopus.com/pages/publications/85152577450
U2 - 10.1007/978-3-031-27818-1_28
DO - 10.1007/978-3-031-27818-1_28
M3 - Conference contribution
AN - SCOPUS:85152577450
SN - 9783031278174
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 337
EP - 349
BT - MultiMedia Modeling - 29th International Conference, MMM 2023, Proceedings
A2 - Dang-Nguyen, Duc-Tien
A2 - Gurrin, Cathal
A2 - Smeaton, Alan F.
A2 - Larson, Martha
A2 - Rudinac, Stevan
A2 - Dao, Minh-Son
A2 - Trattner, Christoph
A2 - Chen, Phoebe
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on MultiMedia Modeling, MMM 2023
Y2 - 9 January 2023 through 12 January 2023
ER -