TY - GEN
T1 - ProEqBEV
T2 - 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
AU - Liu, Hongwei
AU - Yang, Jian
AU - Li, Zhengyu
AU - Li, Ke
AU - Zheng, Jianzhang
AU - Wang, Xihao
AU - Tang, Xuan
AU - Chen, Mingsong
AU - You, Xiong
AU - Wei, Xian
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - With the rapid development of autonomous driving systems, 3D object detection based on Bird's Eye View (BEV) in road scenes has witnessed great progress over the past few years. A road scene exhibits a part-whole hierarchy between the objects within it and the scene itself: simple parts (e.g., roads, lane lines, vehicles and pedestrians) can be assembled into progressively more complex shapes to form a BEV representation of the whole road scene. Therefore, a BEV often has multiple levels of freedom of motion, i.e., the rotation and translation of the whole BEV, and the random movements of objects (e.g., pedestrians and vehicles) inside the BEV. However, most current single-sensor or multi-sensor fusion-based BEV object detection methods have not yet accounted for such multi-level motion in a BEV. To address this problem, we propose a product group equivariant object detection network framework, based on multi-sensor fusion, that is equivariant with respect to multiple levels of symmetry groups. The proposed framework extracts local equivariant features of objects in point clouds, while global equivariant features are extracted in both point clouds and images. Furthermore, the network learns diverse rotation-equivariant features and mitigates a significant amount of detection errors caused by rotations of the BEV and of objects inside it, thereby further enhancing object detection performance. Experimental results show that the network architecture significantly improves object detection in terms of mAP and NDS. In addition, to demonstrate the effectiveness of the proposed local-multi-global equivariant components, we conduct extensive ablation experiments. The results show that the individual components are indispensable to the object detection performance improvement of the overall network architecture.
AB - With the rapid development of autonomous driving systems, 3D object detection based on Bird's Eye View (BEV) in road scenes has witnessed great progress over the past few years. A road scene exhibits a part-whole hierarchy between the objects within it and the scene itself: simple parts (e.g., roads, lane lines, vehicles and pedestrians) can be assembled into progressively more complex shapes to form a BEV representation of the whole road scene. Therefore, a BEV often has multiple levels of freedom of motion, i.e., the rotation and translation of the whole BEV, and the random movements of objects (e.g., pedestrians and vehicles) inside the BEV. However, most current single-sensor or multi-sensor fusion-based BEV object detection methods have not yet accounted for such multi-level motion in a BEV. To address this problem, we propose a product group equivariant object detection network framework, based on multi-sensor fusion, that is equivariant with respect to multiple levels of symmetry groups. The proposed framework extracts local equivariant features of objects in point clouds, while global equivariant features are extracted in both point clouds and images. Furthermore, the network learns diverse rotation-equivariant features and mitigates a significant amount of detection errors caused by rotations of the BEV and of objects inside it, thereby further enhancing object detection performance. Experimental results show that the network architecture significantly improves object detection in terms of mAP and NDS. In addition, to demonstrate the effectiveness of the proposed local-multi-global equivariant components, we conduct extensive ablation experiments. The results show that the individual components are indispensable to the object detection performance improvement of the overall network architecture.
UR - https://www.scopus.com/pages/publications/85202437426
U2 - 10.1109/ICRA57147.2024.10610492
DO - 10.1109/ICRA57147.2024.10610492
M3 - Conference contribution
AN - SCOPUS:85202437426
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 16178
EP - 16184
BT - 2024 IEEE International Conference on Robotics and Automation, ICRA 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 May 2024 through 17 May 2024
ER -