TY - GEN
T1 - IoU-Enhanced Attention for End-to-End Task Specific Object Detection
AU - Zhao, Jing
AU - Wu, Shengjian
AU - Sun, Li
AU - Li, Qingli
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Without densely tiled anchor boxes or grid points in the image, Sparse R-CNN achieves promising results through a set of object queries and proposal boxes updated in a cascaded training manner. However, due to its sparse nature and the one-to-one relation between each query and its attending region, it depends heavily on self-attention, which is usually inaccurate in the early training stage. Moreover, in a scene with dense objects, an object query interacts with many irrelevant ones, reducing its uniqueness and harming performance. This paper proposes to use the IoU between different boxes as a prior for value routing in self-attention. The original attention matrix is multiplied by a matrix of the same size computed from the IoU of the proposal boxes, and together they determine the routing scheme so that irrelevant features can be suppressed. Furthermore, to accurately extract features for both classification and regression, we add two lightweight projection heads that provide dynamic channel masks based on the object query; these masks are multiplied with the output of the dynamic convs, making the results suitable for the two different tasks. We validate the proposed scheme on different datasets, including MS-COCO and CrowdHuman, showing that it significantly improves performance and increases model convergence speed. Code is available at https://github.com/bravezzzzzz/IoU-Enhanced-Attention.
UR - https://www.scopus.com/pages/publications/85151048376
U2 - 10.1007/978-3-031-26348-4_8
DO - 10.1007/978-3-031-26348-4_8
M3 - Conference contribution
AN - SCOPUS:85151048376
SN - 9783031263477
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 124
EP - 141
BT - Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings
A2 - Wang, Lei
A2 - Gall, Juergen
A2 - Chin, Tat-Jun
A2 - Sato, Imari
A2 - Chellappa, Rama
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th Asian Conference on Computer Vision, ACCV 2022
Y2 - 4 December 2022 through 8 December 2022
ER -