TY - JOUR
T1 - TextRSR
T2 - Enhanced Arbitrary-Shaped Scene Text Representation Via Robust Subspace Recovery
AU - Shao, Zhiwen
AU - Jiang, Shengtian
AU - Zhu, Hancheng
AU - Shi, Xuehuai
AU - Li, Canlin
AU - Ma, Lizhuang
AU - Yeung, Dit-Yan
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2026
Y1 - 2026
AB - In recent years, scene text detection research has increasingly focused on arbitrary-shaped texts, where text representation is a fundamental problem. However, most existing methods still struggle to separate adjacent or overlapping texts due to ambiguous spatial positions of points or segmentation masks. In addition, the time efficiency of the entire pipeline is often neglected, resulting in sub-optimal inference speed. To tackle these problems, we first propose a novel text representation method based on robust subspace recovery, which robustly represents complex text shapes by combining orthogonal basis vectors learned from labeled text contours. These basis vectors capture basic contour patterns with distinct information, enabling clearer boundaries even in densely populated text scenarios. Moreover, we propose a dynamic sparse assignment scheme for positive samples that adaptively adjusts their weights during training, which not only accelerates inference speed by eliminating redundant predictions but also enhances feature learning by providing sufficient supervision signals. Building on these innovations, we present TextRSR, an accurate and efficient scene text detection network. Extensive experiments on challenging benchmarks demonstrate the superior accuracy and efficiency of TextRSR compared to state-of-the-art methods. In particular, TextRSR achieves an F-measure of 88.5% at 37.8 frames per second (FPS) on the CTW1500 dataset and an F-measure of 89.1% at 23.1 FPS on the Total-Text dataset.
KW - arbitrary-shaped text representation
KW - dynamic sparse assignment
KW - robust subspace recovery
KW - Scene text detection
UR - https://www.scopus.com/pages/publications/105027330019
U2 - 10.1109/TMM.2026.3651034
DO - 10.1109/TMM.2026.3651034
M3 - Article
AN - SCOPUS:105027330019
SN - 1520-9210
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -