TY - JOUR
T1 - SANet-SI: A new Self-Attention-Network for Script Identification in scene images
T2 - Pattern Recognition Letters
AU - Li, Xiaomeng
AU - Zhan, Hongjian
AU - Shivakumara, Palaiahnakote
AU - Pal, Umapada
AU - Lu, Yue
N1 - Publisher Copyright:
© 2023
PY - 2023/7
Y1 - 2023/7
N2 - Developing an automatic method for identifying scripts in natural scene text images is of great importance for improving the performance of multilingual OCR. This paper presents a new Self-Attention Network (SANet-SI) for script identification in natural scene text images. The rationale behind SANet-SI is that each script exhibits its own pattern owing to its distinct characteristics. To capture these patterns, we explore a self-attention-based CNN with a multi-scale feature extraction approach, which extracts local and global features and fuses them. Furthermore, to select the dominant features that contribute most to script identification, we explore the Style-based Recalibration Module (SRM) in a new way. In addition, to improve identification performance and reduce model size, the proposed model uses a Global Average Pooling (GAP) layer instead of Fully Connected (FC) layers. The proposed model is evaluated on the standard datasets RRC-MLT2017, SIW-13, and CVSI2015, showing effectiveness over state-of-the-art methods in terms of confusion matrix and classification rate. We also conduct cross-dataset validation experiments to show that the proposed model is independent of the number of scripts and of the particular dataset.
AB - Developing an automatic method for identifying scripts in natural scene text images is of great importance for improving the performance of multilingual OCR. This paper presents a new Self-Attention Network (SANet-SI) for script identification in natural scene text images. The rationale behind SANet-SI is that each script exhibits its own pattern owing to its distinct characteristics. To capture these patterns, we explore a self-attention-based CNN with a multi-scale feature extraction approach, which extracts local and global features and fuses them. Furthermore, to select the dominant features that contribute most to script identification, we explore the Style-based Recalibration Module (SRM) in a new way. In addition, to improve identification performance and reduce model size, the proposed model uses a Global Average Pooling (GAP) layer instead of Fully Connected (FC) layers. The proposed model is evaluated on the standard datasets RRC-MLT2017, SIW-13, and CVSI2015, showing effectiveness over state-of-the-art methods in terms of confusion matrix and classification rate. We also conduct cross-dataset validation experiments to show that the proposed model is independent of the number of scripts and of the particular dataset.
KW - Feature fusion
KW - Language identification
KW - Script identification
UR - https://www.scopus.com/pages/publications/85159388069
U2 - 10.1016/j.patrec.2023.04.015
DO - 10.1016/j.patrec.2023.04.015
M3 - Article
AN - SCOPUS:85159388069
SN - 0167-8655
VL - 171
SP - 45
EP - 52
JO - Pattern Recognition Letters
JF - Pattern Recognition Letters
ER -