TY - GEN
T1 - A New Fourier-Attention Guided Approach for Domain-Agnostic Text Localization
AU - Halder, Arnab
AU - Palaiahnakote, Shivakumara
AU - Pal, Umapada
AU - Blumenstein, Michael
AU - Lu, Yue
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - Text detection in images of adverse situations like underwater images and open day and night environments, where one can expect the effect of shaky and non-shaky cameras, is challenging. This work aims to develop a new model that can cope with the challenges of different domains, namely, underwater images, shaky and non-shaky images, and normal scene images for text detection. The approach leverages the Fourier attention and kernels to enhance feature extraction, focusing on high-frequency components associated with text edges. These features are fed to dual-stream corner detection by employing vertical and horizontal pooling for robust text detection. Additionally, we introduce a cross-star deformable convolution layer, guided by Fourier-derived information, which dynamically adapts its receptive field to achieve precise bounding box localization. Bounding box predictions are iteratively refined using heatmaps and offset adjustments. Overall, by integrating frequency-domain analysis with spatially adaptive convolutional operations, our method excels across diverse text detection scenarios without requiring domain-specific adaptations. The performance of the proposed method is demonstrated by testing on three different datasets: underwater, shaky and non-shaky images, and normal natural scene images. The results show that the proposed method achieves state-of-the-art performance compared to the existing methods.
AB - Text detection in images of adverse situations like underwater images and open day and night environments, where one can expect the effect of shaky and non-shaky cameras, is challenging. This work aims to develop a new model that can cope with the challenges of different domains, namely, underwater images, shaky and non-shaky images, and normal scene images for text detection. The approach leverages the Fourier attention and kernels to enhance feature extraction, focusing on high-frequency components associated with text edges. These features are fed to dual-stream corner detection by employing vertical and horizontal pooling for robust text detection. Additionally, we introduce a cross-star deformable convolution layer, guided by Fourier-derived information, which dynamically adapts its receptive field to achieve precise bounding box localization. Bounding box predictions are iteratively refined using heatmaps and offset adjustments. Overall, by integrating frequency-domain analysis with spatially adaptive convolutional operations, our method excels across diverse text detection scenarios without requiring domain-specific adaptations. The performance of the proposed method is demonstrated by testing on three different datasets: underwater, shaky and non-shaky images, and normal natural scene images. The results show that the proposed method achieves state-of-the-art performance compared to the existing methods.
KW - Fourier Attention
KW - Shaky-Non-Shaky Text
KW - Text Detection
KW - Underwater Text
UR - https://www.scopus.com/pages/publications/105017376504
U2 - 10.1007/978-3-032-04624-6_11
DO - 10.1007/978-3-032-04624-6_11
M3 - Conference contribution
AN - SCOPUS:105017376504
SN - 9783032046239
T3 - Lecture Notes in Computer Science
SP - 180
EP - 199
BT - Document Analysis and Recognition – ICDAR 2025 - 19th International Conference, Proceedings
A2 - Yin, Xu-Cheng
A2 - Karatzas, Dimosthenis
A2 - Lopresti, Daniel
PB - Springer Science and Business Media Deutschland GmbH
T2 - 19th International Conference on Document Analysis and Recognition, ICDAR 2025
Y2 - 16 September 2025 through 21 September 2025
ER -