TY - JOUR
T1 - UniDSeg
T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024
AU - Wu, Yao
AU - Xing, Mingwei
AU - Zhang, Yachao
AU - Luo, Xiaotong
AU - Xie, Yuan
AU - Qu, Yanyun
N1 - Publisher Copyright:
© 2024 Neural information processing systems foundation. All rights reserved.
PY - 2024
Y1 - 2024
N2 - 3D semantic segmentation using an adapting model trained from a source domain with or without accessing unlabeled target-domain data is the fundamental task in computer vision, containing domain adaptation and domain generalization. The essence of simultaneously solving cross-domain tasks is to enhance the generalizability of the encoder. In light of this, we propose a groundbreaking universal method with the help of off-the-shelf Visual Foundation Models (VFMs) to boost the adaptability and generalizability of cross-domain 3D semantic segmentation, dubbed UniDSeg. Our method explores the VFMs prior and how to harness them, aiming to inherit the recognition ability of VFMs. Specifically, this method introduces layer-wise learnable blocks to the VFMs, which hinges on alternately learning two representations during training: (i) Learning visual prompt. The 3D-to-2D transitional prior and task-shared knowledge is captured from the prompt space, and then (ii) Learning deep query. Spatial Tunability is constructed to the representation of distinct instances driven by prompts in the query space. Integrating these representations into a cross-modal learning framework, UniDSeg efficiently mitigates the domain gap between 2D and 3D modalities, achieving unified cross-domain 3D semantic segmentation. Extensive experiments demonstrate the effectiveness of our method across widely recognized tasks and datasets, all achieving superior performance over state-of-the-art methods. Remarkably, UniDSeg achieves 57.5%/54.4% mIoU on “A2D2/sKITTI” for domain adaptive/generalized tasks. Code is available at https://github.com/Barcaaaa/UniDSeg.
AB - 3D semantic segmentation using an adapting model trained from a source domain with or without accessing unlabeled target-domain data is the fundamental task in computer vision, containing domain adaptation and domain generalization. The essence of simultaneously solving cross-domain tasks is to enhance the generalizability of the encoder. In light of this, we propose a groundbreaking universal method with the help of off-the-shelf Visual Foundation Models (VFMs) to boost the adaptability and generalizability of cross-domain 3D semantic segmentation, dubbed UniDSeg. Our method explores the VFMs prior and how to harness them, aiming to inherit the recognition ability of VFMs. Specifically, this method introduces layer-wise learnable blocks to the VFMs, which hinges on alternately learning two representations during training: (i) Learning visual prompt. The 3D-to-2D transitional prior and task-shared knowledge is captured from the prompt space, and then (ii) Learning deep query. Spatial Tunability is constructed to the representation of distinct instances driven by prompts in the query space. Integrating these representations into a cross-modal learning framework, UniDSeg efficiently mitigates the domain gap between 2D and 3D modalities, achieving unified cross-domain 3D semantic segmentation. Extensive experiments demonstrate the effectiveness of our method across widely recognized tasks and datasets, all achieving superior performance over state-of-the-art methods. Remarkably, UniDSeg achieves 57.5%/54.4% mIoU on “A2D2/sKITTI” for domain adaptive/generalized tasks. Code is available at https://github.com/Barcaaaa/UniDSeg.
UR - https://www.scopus.com/pages/publications/105000469600
M3 - 会议文章
AN - SCOPUS:105000469600
SN - 1049-5258
VL - 37
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 9 December 2024 through 15 December 2024
ER -