TY - JOUR
T1 - Length-aware center loss for sequence to sequence Thai scene text recognition
AU - Zhan, Hongjian
AU - Li, Chun
AU - Yin, Bing
AU - Lu, Yue
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/12/9
Y1 - 2025/12/9
AB - Thai scene text recognition is a challenging task because Thai can be written in both horizontal and vertical directions, allowing characters to be stacked vertically. To address this issue, our previous work combined vertically stacked characters to create new characters. However, this strategy introduced many similar characters. In this paper, we further investigate this problem and propose the Length-aware Center Loss (LC) for Thai scene text recognition. The original center loss was designed for single object recognition tasks. When applied to multi-label tasks like text recognition, center loss is only effective when the lengths of the labels and prediction results are consistent. This can lead to an extreme case where all images receive incorrect predicted text lengths to minimize loss, severely interfering with the recognition process. Therefore, we propose the Length-aware Center Loss for text recognition. We also design the Length Supervision Module (LSM) and the Feature Clustering Module (FCM) to work alongside the LC loss. LSM predicts text length to provide additional supervision signals, while FCM aims to improve recognition performance by minimizing the distance between the features of corresponding class centers. Since there is no publicly available Thai scene text dataset, we have collected a new dataset containing more than 170,000 samples. Extensive experiments conducted on this dataset show that our method achieves superior performance in both string-level and character-level accuracy compared to other methods.
KW - Feature clustering
KW - Length-aware center loss
KW - Sequence to sequence
KW - Thai scene text recognition
UR - https://www.scopus.com/pages/publications/105015600228
U2 - 10.1016/j.engappai.2025.112182
DO - 10.1016/j.engappai.2025.112182
M3 - Article
AN - SCOPUS:105015600228
SN - 0952-1976
VL - 161
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 112182
ER -