TY - JOUR
T1 - Knowledge Transfer Across Modalities for Weakly Supervised Point Cloud Semantic Segmentation
AU - Wang, Zihan
AU - Shen, Yunhang
AU - Li, Mengtian
AU - Li, Ke
AU - Sun, Xing
AU - Lin, Shaohui
AU - Ma, Lizhuang
N1 - Publisher Copyright:
© 2025 Institute of Electrical and Electronics Engineers Inc. All rights reserved.
PY - 2025
Y1 - 2025
N2 - Current weakly supervised point cloud semantic segmentation struggles to fully utilize limited annotations in unimodal representation learning due to the sparse and textureless nature of point clouds. In this work, we leverage cross-modality information by transferring knowledge from image and text sources to the point cloud network. The intuition is that images contribute rich texture, color, and discriminative information, complementing point clouds to boost semantic segmentation performance. To reduce the extensive computational resources required for cross-modality fusion, we introduce Multi-Scale Deformable Knowledge Transfer, an innovative training scheme that optimizes and extends the one-to-one mapping to flexible one-to-many relations between multi-modal data. Furthermore, we employ pre-trained image-text models to generate pseudo labels for point clouds and construct positive and negative samples for semantic contrastive regularization, facilitating full exploitation of unlabeled data. Experimental results on SemanticKITTI and nuScenes demonstrate substantial improvements, achieving an average gain of 3.8% over previous weakly supervised methods and performance comparable to fully supervised approaches.
AB - Current weakly supervised point cloud semantic segmentation struggles to fully utilize limited annotations in unimodal representation learning due to the sparse and textureless nature of point clouds. In this work, we leverage cross-modality information by transferring knowledge from image and text sources to the point cloud network. The intuition is that images contribute rich texture, color, and discriminative information, complementing point clouds to boost semantic segmentation performance. To reduce the extensive computational resources required for cross-modality fusion, we introduce Multi-Scale Deformable Knowledge Transfer, an innovative training scheme that optimizes and extends the one-to-one mapping to flexible one-to-many relations between multi-modal data. Furthermore, we employ pre-trained image-text models to generate pseudo labels for point clouds and construct positive and negative samples for semantic contrastive regularization, facilitating full exploitation of unlabeled data. Experimental results on SemanticKITTI and nuScenes demonstrate substantial improvements, achieving an average gain of 3.8% over previous weakly supervised methods and performance comparable to fully supervised approaches.
KW - Knowledge Transfer
KW - Multi-Modal
KW - Semantic Segmentation
KW - Weakly Supervised
UR - https://www.scopus.com/pages/publications/105009793807
U2 - 10.1109/ICASSP49660.2025.10890346
DO - 10.1109/ICASSP49660.2025.10890346
M3 - Conference article
AN - SCOPUS:105009793807
SN - 0736-7791
JO - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
JF - Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
T2 - 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Y2 - 6 April 2025 through 11 April 2025
ER -