Fusion-Then-Distillation: Toward Cross-Modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation

Yao Wu, Mingwei Xing, Yachao Zhang, Yuan Xie, Yanyun Qu

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

In cross-modal unsupervised domain adaptation, a model trained on source-domain data (e.g., synthetic) is adapted to target-domain data (e.g., real-world) without access to target annotations. Previous methods seek to mutually mimic cross-modal outputs in each domain, which enforces class probability distributions that agree across domains. However, they overlook the complementarity brought by heterogeneous fusion in cross-modal learning. In light of this, we propose a novel fusion-then-distillation (FtD++) method that explores cross-modal positive distillation between the source and target domains for 3D semantic segmentation. FtD++ realizes distribution consistency between outputs not only across 2D images and 3D point clouds but also across the source domain and the augmented domain. Specifically, our method contains three key ingredients. First, we present a model-agnostic feature fusion module that generates a cross-modal fusion representation for establishing a latent space in which the two modalities are enforced to be maximally correlated and complementary. Second, the proposed cross-modal positive distillation preserves the complete information of the multi-modal input and combines the semantic content of the source domain with the style of the target domain, thereby achieving domain-modality alignment. Finally, cross-modal debiased pseudo-labeling is devised to model the uncertainty of pseudo-labels in a self-training manner. Extensive experiments report state-of-the-art results on several domain-adaptive scenarios under both unsupervised and semi-supervised settings. Code is available at https://github.com/Barcaaaa/FtD-PlusPlus.
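To make the fusion-then-distillation idea concrete, the sketch below shows a minimal cross-modal fusion head and a KL-divergence distillation loss in which the fused prediction acts as the teacher for the 2D and 3D branches. All module names, feature dimensions, and the loss form are illustrative assumptions for exposition, not the authors' implementation; the official code is at the GitHub link above.

```python
# Illustrative sketch only: a minimal cross-modal fusion head and a
# KL-based positive-distillation loss in the spirit of the abstract.
# Names, dimensions, and weighting are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionHead(nn.Module):
    """Fuse per-point 2D and 3D features and predict class logits."""

    def __init__(self, dim_2d: int, dim_3d: int, num_classes: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim_2d + dim_3d, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, feat_2d: torch.Tensor, feat_3d: torch.Tensor) -> torch.Tensor:
        # feat_2d: (N, dim_2d) image features sampled at projected point locations
        # feat_3d: (N, dim_3d) point-cloud features for the same N points
        return self.mlp(torch.cat([feat_2d, feat_3d], dim=-1))


def cross_modal_distillation(logits_2d, logits_3d, logits_fused, T: float = 1.0):
    """Pull each single-modality prediction toward the fused prediction.

    The fused branch serves as a detached teacher; the 2D and 3D branches
    are students trained with KL-divergence terms.
    """
    teacher = F.softmax(logits_fused.detach() / T, dim=-1)
    loss_2d = F.kl_div(F.log_softmax(logits_2d / T, dim=-1), teacher, reduction="batchmean")
    loss_3d = F.kl_div(F.log_softmax(logits_3d / T, dim=-1), teacher, reduction="batchmean")
    return loss_2d + loss_3d


if __name__ == "__main__":
    n_points, num_classes = 1024, 10
    feat_2d, feat_3d = torch.randn(n_points, 64), torch.randn(n_points, 96)
    logits_2d, logits_3d = torch.randn(n_points, num_classes), torch.randn(n_points, num_classes)
    head = FusionHead(dim_2d=64, dim_3d=96, num_classes=num_classes)
    logits_fused = head(feat_2d, feat_3d)
    print(cross_modal_distillation(logits_2d, logits_3d, logits_fused))
```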

Original language: English
Pages (from-to): 9030-9045
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 35
Issue number: 9
DOIs
State: Published - 2025

Keywords

  • 3D semantic segmentation
  • cross-modal learning
  • multi-modal fusion representation
  • unsupervised domain adaptation
