UniDSeg: Unified Cross-Domain 3D Semantic Segmentation via Visual Foundation Models Prior

  • Yao Wu
  • , Mingwei Xing
  • , Yachao Zhang*
  • , Xiaotong Luo
  • , Yuan Xie
  • , Yanyun Qu*
  • *Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

3 Scopus citations

Abstract

3D semantic segmentation using an adapting model trained from a source domain with or without accessing unlabeled target-domain data is the fundamental task in computer vision, containing domain adaptation and domain generalization. The essence of simultaneously solving cross-domain tasks is to enhance the generalizability of the encoder. In light of this, we propose a groundbreaking universal method with the help of off-the-shelf Visual Foundation Models (VFMs) to boost the adaptability and generalizability of cross-domain 3D semantic segmentation, dubbed UniDSeg. Our method explores the VFMs prior and how to harness them, aiming to inherit the recognition ability of VFMs. Specifically, this method introduces layer-wise learnable blocks to the VFMs, which hinges on alternately learning two representations during training: (i) Learning visual prompt. The 3D-to-2D transitional prior and task-shared knowledge is captured from the prompt space, and then (ii) Learning deep query. Spatial Tunability is constructed to the representation of distinct instances driven by prompts in the query space. Integrating these representations into a cross-modal learning framework, UniDSeg efficiently mitigates the domain gap between 2D and 3D modalities, achieving unified cross-domain 3D semantic segmentation. Extensive experiments demonstrate the effectiveness of our method across widely recognized tasks and datasets, all achieving superior performance over state-of-the-art methods. Remarkably, UniDSeg achieves 57.5%/54.4% mIoU on “A2D2/sKITTI” for domain adaptive/generalized tasks. Code is available at https://github.com/Barcaaaa/UniDSeg.

Original languageEnglish
JournalAdvances in Neural Information Processing Systems
Volume37
StatePublished - 2024
Event38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 202415 Dec 2024

Fingerprint

Dive into the research topics of 'UniDSeg: Unified Cross-Domain 3D Semantic Segmentation via Visual Foundation Models Prior'. Together they form a unique fingerprint.

Cite this