TY - JOUR
T1 - Learning bi-grained cross-correlation siamese networks for visual tracking
AU - Zhao, Defang
AU - Ma, Chao
AU - Zhu, Dandan
AU - Shuai, Jia
AU - Lu, Jianwei
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2022/9
Y1 - 2022/9
N2 - Siamese network based trackers measure the similarity between a target template and a search region by computing their cross-correlation. Specifically, Siamese trackers regard the target template as a spatial filter to convolve the search region, putting emphasis on the coarse-grained semantic abstraction of the target in the spatial domain. Along with the demonstrated success of Siamese trackers, little attention has been paid to fine-grained spatial details in cross-correlation computation, which is crucial to precise target localization. In this paper, we propose to learn point-wise cross-correlation Siamese networks for visual tracking. By sketching the contour of the target, the proposed point-wise cross-correlation module helps Siamese networks to be aware of the distinctive boundary between the target and background. In conjunction with traditional depth-wise cross-correlation, the proposed Siamese network takes both advantages of coarse-grained semantic abstraction and fine-grained details to precisely locate the target. Extensive experiments demonstrate the effectiveness and efficiency of the proposed tracker, which achieves new state-of-the-art results on five visual tracking benchmarks including VOT2016, VOT2018, VOT2019, OTB100, and LaSOT with the speed of 38 FPS. As an extra benefit, our tracker can output the segmentation mask for the target. We demonstrate the favorable performance of our tracker on the video object segmentation datasets in comparison with the state-of-the-art.
AB - Siamese network based trackers measure the similarity between a target template and a search region by computing their cross-correlation. Specifically, Siamese trackers regard the target template as a spatial filter to convolve the search region, putting emphasis on the coarse-grained semantic abstraction of the target in the spatial domain. Along with the demonstrated success of Siamese trackers, little attention has been paid to fine-grained spatial details in cross-correlation computation, which is crucial to precise target localization. In this paper, we propose to learn point-wise cross-correlation Siamese networks for visual tracking. By sketching the contour of the target, the proposed point-wise cross-correlation module helps Siamese networks to be aware of the distinctive boundary between the target and background. In conjunction with traditional depth-wise cross-correlation, the proposed Siamese network takes both advantages of coarse-grained semantic abstraction and fine-grained details to precisely locate the target. Extensive experiments demonstrate the effectiveness and efficiency of the proposed tracker, which achieves new state-of-the-art results on five visual tracking benchmarks including VOT2016, VOT2018, VOT2019, OTB100, and LaSOT with the speed of 38 FPS. As an extra benefit, our tracker can output the segmentation mask for the target. We demonstrate the favorable performance of our tracker on the video object segmentation datasets in comparison with the state-of-the-art.
KW - Bi-grained
KW - Contour
KW - Depth-wise
KW - Point-wise
KW - Siamese network
UR - https://www.scopus.com/pages/publications/85124085969
U2 - 10.1007/s10489-021-03015-9
DO - 10.1007/s10489-021-03015-9
M3 - 文章
AN - SCOPUS:85124085969
SN - 0924-669X
VL - 52
SP - 12175
EP - 12190
JO - Applied Intelligence
JF - Applied Intelligence
IS - 11
ER -