TY - JOUR
T1 - BEVSOC
T2 - Self-Supervised Contrastive Learning for Calibration-Free BEV 3-D Object Detection
AU - Chen, Yongqing
AU - Li, Nanyu
AU - Zhu, Dandan
AU - Zhou, Charles C.
AU - Hu, Zhuhua
AU - Bai, Yong
AU - Yan, Jun
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2024/6/15
Y1 - 2024/6/15
N2 - 3-D object detection based on multiview cameras and bird's-eye view (BEV) representation is a key task for autonomous driving, as it enables the perception systems to understand the surrounding scenes. However, most existing BEV representation methods rely on the projection matrix of camera intrinsic and extrinsic parameters, which requires a complex and time-consuming calibration process that may introduce errors and degrade the detection performance. Moreover, the calibration results may vary due to environmental changes and affect the stability of the detection system. To address this problem, we propose a calibration-free 3-D object detection method that leverages a group-equivariant convolutional network to extract features from multiview images and a projection network module to learn the implicit 3D-to-2D projection relationship for obtaining BEV representation. Furthermore, we employ contrastive learning (CL) to pretrain the projection network module without using manually annotated data. By exploiting the multiview camera data through CL, our proposed method eliminates the need for tedious calibration, avoids calibration errors, and reduces the dependence on a large amount of annotated data for calibration-free 3-D object detection. We evaluate our method on the nuScenes data set and demonstrate its competitive performance. Our method improves the stability and reliability of 3-D object detection in long-term autonomous driving.
KW - 3-D object detection
KW - calibration free
KW - contrastive learning (CL)
KW - group equivariant convolution
KW - self-supervised
UR - https://www.scopus.com/pages/publications/85188471878
U2 - 10.1109/JIOT.2024.3379471
DO - 10.1109/JIOT.2024.3379471
M3 - Article
AN - SCOPUS:85188471878
SN - 2327-4662
VL - 11
SP - 22167
EP - 22182
JO - IEEE Internet of Things Journal
JF - IEEE Internet of Things Journal
IS - 12
ER -