
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

  • Xin Li
  • Tao Ma
  • Yuenan Hou
  • Botian Shi
  • Yuchen Yang
  • Youquan Liu
  • Xingjiao Wu
  • Qin Chen
  • Yikang Li*
  • Yu Qiao
  • Liang He*
  • *Corresponding author for this work
  • East China Normal University
  • Chinese University of Hong Kong
  • Shanghai AI Laboratory
  • Fudan University
  • Bremerhaven University of Applied Sciences
  • Shanghai Key Laboratory of Multidimensional Information Processing

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

LiDAR-camera fusion methods have shown impressive performance in 3D object detection. Recent advanced multi-modal methods mainly perform global fusion, where image features and point cloud features are fused across the whole scene. Such practice lacks fine-grained region-level information, yielding suboptimal fusion performance. In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both local and global levels. Concretely, the Global Fusion (GoF) of LoGoNet is built upon previous literature, while we exclusively use point centroids to more precisely represent the position of voxel features, thus achieving better cross-modal alignment. As to the Local Fusion (LoF), we first divide each proposal into uniform grids and then project these grid centers to the images. The image features around the projected grid points are sampled to be fused with position-decorated point cloud features, maximally utilizing the rich contextual information around the proposals. The Feature Dynamic Aggregation (FDA) module is further proposed to achieve information interaction between these locally and globally fused features, thus producing more informative multi-modal features. Extensive experiments on both the Waymo Open Dataset (WOD) and KITTI datasets show that LoGoNet outperforms all state-of-the-art 3D detection methods. Notably, LoGoNet ranks 1st on the Waymo 3D object detection leaderboard and obtains 81.02 mAPH (L2) detection performance. It is noteworthy that, for the first time, the detection performance on three classes surpasses 80 APH (L2) simultaneously. Code will be available at https://github.com/sankin97/LoGoNet.
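
The first two steps of the Local Fusion described in the abstract (dividing a proposal into a uniform grid and projecting the grid centers onto the image) can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the 3x3x3 grid size, the box parameterization, the axis-only LiDAR-to-camera conversion, and the pinhole intrinsics are all assumptions chosen for illustration. A real pipeline would use the dataset's calibrated extrinsic and intrinsic matrices and would then sample image features around the projected pixels.

```python
import math


def grid_centers(box, grid_size=3):
    """Centers of a uniform grid_size**3 grid inside a yaw-rotated 3D box.

    box = (cx, cy, cz, length, width, height, yaw) in a LiDAR-style frame
    (x forward, y left, z up); yaw is a rotation about the z axis.
    """
    cx, cy, cz, length, width, height, yaw = box
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    # Normalized cell-center offsets in (-0.5, 0.5) along each box axis.
    offsets = [(i + 0.5) / grid_size - 0.5 for i in range(grid_size)]
    points = []
    for u in offsets:
        for v in offsets:
            for w in offsets:
                lx, ly, lz = u * length, v * width, w * height
                # Rotate the local offset about z, then shift to the box center.
                points.append((cx + lx * cos_y - ly * sin_y,
                               cy + lx * sin_y + ly * cos_y,
                               cz + lz))
    return points


def lidar_to_camera(point):
    """Assumed axis-only conversion (no translation, no calibration):
    camera x = right, y = down, z = forward."""
    x, y, z = point
    return (-y, -z, x)


def project_to_image(points, fx, fy, u0, v0):
    """Pinhole projection of LiDAR-frame points to pixel coordinates.

    Points at or behind the camera plane are returned as None.
    """
    pixels = []
    for p in points:
        xc, yc, zc = lidar_to_camera(p)
        if zc <= 0:
            pixels.append(None)
        else:
            pixels.append((fx * xc / zc + u0, fy * yc / zc + v0))
    return pixels


if __name__ == "__main__":
    # Hypothetical proposal 10 m ahead of the sensor, and made-up intrinsics.
    proposal = (10.0, 0.0, 0.0, 4.0, 2.0, 1.5, 0.0)
    centers = grid_centers(proposal, grid_size=3)
    pixels = project_to_image(centers, fx=1000.0, fy=1000.0, u0=640.0, v0=360.0)
    print(len(centers), pixels[len(pixels) // 2])
```

With an odd grid size, the middle grid cell coincides with the box center, so a proposal centered on the optical axis projects its middle cell to the principal point `(u0, v0)`.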

Original language: English
Title of host publication: Proceedings - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Publisher: IEEE Computer Society
Pages: 17524-17534
Number of pages: 11
ISBN (Electronic): 9798350301298
DOI
Publication status: Published - 2023
Event: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023 - Vancouver, Canada
Duration: 18 Jun 2023 to 22 Jun 2023

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2023-June
ISSN (Print): 1063-6919

Conference

Conference: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Country/Territory: Canada
City: Vancouver
Period: 18/06/23 to 22/06/23
