跳到主要导航 跳到搜索 跳到主要内容

Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception

  • East China Normal University
  • Xiamen University
  • Tencent

科研成果: 书/报告/会议事项章节会议稿件同行评审

摘要

An effective pre-training framework with universal 3D representations is extremely desired in perceiving large- scale dynamic scenes. However, establishing such an ideal framework that is both task-generic and label-efficient poses a challenge in unifying the representation of the same primitive across diverse scenes. The current contrastive 3D pre-training methods typically follow a frame-level consistency, which focuses on the 2D-3D relationships in each detached image. Such inconsiderate consistency greatly hampers the promising path of reaching an universal pre-training framework: (1) The cross-scene semantic self-conflict, i.e., the intense collision between primitive segments of the same semantics from different scenes; (2) Lacking a globally unified bond that pushes the cross-scene semantic consistency into 3D representation learning. To address above challenges, we propose a CSC framework that puts a scene-level semantic consistency in the heart, bridging the connection of the similar semantic segments across various scenes. To achieve this goal, we combine the coherent semantic cues provided by the vision foundation model and the knowledge-rich cross-scene prototypes derived from the complementary multi-modality information. These allow us to train a universal 3D pre-training model that facilitates various downstream tasks with less fine-tuning efforts. Empirically, we achieve consistent improvements over SOTA pre-training approaches in semantic segmentation (+1.4% mIoU), object detection (+ 1.0% mAP), and panoptic segmentation (+3.0% PQ) using their task-specific 3D network on nuScenes. Code is released at https://github.com/chenhaomingbob/CSC, hoping to inspire future research.

源语言英语
主期刊名Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
出版商IEEE Computer Society
19925-19935
页数11
ISBN(电子版)9798350353006
DOI
出版状态已出版 - 2024
活动2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, 美国
期限: 16 6月 202422 6月 2024

出版系列

姓名Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN(印刷版)1063-6919

会议

会议2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
国家/地区美国
Seattle
时期16/06/2422/06/24

指纹

探究 'Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception' 的科研主题。它们共同构成独一无二的指纹。

引用此