TY - GEN
T1 - 3D Dense Captioning via Prototypical Momentum Distillation
AU - Mi, Jinpeng
AU - Wang, Ying
AU - Jin, Shaofei
AU - Zhang, Shiming
AU - Wei, Xian
AU - Zhang, Jianwei
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
AB - 3D dense captioning aims to describe the crucial regions of 3D visual scenes in natural language. Recent prevailing approaches achieve promising results by leveraging complicated architectures built on large-scale models, which require abundant parameters and hinder practical deployment. Moreover, with limited training data, 3D dense captioners are prone to overfitting, which directly degrades caption generation performance. Drawing inspiration from recent advances in knowledge distillation, we propose a novel approach termed Prototypical Momentum Distillation (PMD) to prompt the model to generate more detailed captions. PMD combines Momentum Distillation (MD) with an Uncertainty-aware Prototype-anchored Clustering (UPC) strategy to transfer knowledge while accounting for the uncertainty of the teacher's knowledge. Specifically, we employ the original captioner as the student model and maintain an Exponential Moving Average (EMA) copy of the captioner as the teacher model, whose knowledge serves as auxiliary supervision for the student. To mitigate the misleading supervision caused by uncertain knowledge, the UPC strategy clusters the distilled knowledge according to its confidence; the rearranged knowledge from the teacher then guides the training of the student. We conduct extensive experiments and ablation studies on two widely used benchmark datasets, ScanRefer and Nr3D. Experimental results demonstrate that PMD outperforms state-of-the-art approaches on both benchmarks under MLE training, highlighting its effectiveness.
UR - https://www.scopus.com/pages/publications/105016662298
U2 - 10.1109/ICRA55743.2025.11128104
DO - 10.1109/ICRA55743.2025.11128104
M3 - Conference contribution
AN - SCOPUS:105016662298
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 1444
EP - 1450
BT - 2025 IEEE International Conference on Robotics and Automation, ICRA 2025
A2 - Ott, Christian
A2 - Admoni, Henny
A2 - Behnke, Sven
A2 - Bogdan, Stjepan
A2 - Bolopion, Aude
A2 - Choi, Youngjin
A2 - Ficuciello, Fanny
A2 - Gans, Nicholas
A2 - Gosselin, Clement
A2 - Harada, Kensuke
A2 - Kayacan, Erdal
A2 - Kim, H. Jin
A2 - Leutenegger, Stefan
A2 - Liu, Zhe
A2 - Maiolino, Perla
A2 - Marques, Lino
A2 - Matsubara, Takamitsu
A2 - Mavrommati, Anastasia
A2 - Minor, Mark
A2 - O'Kane, Jason
A2 - Park, Hae Won
A2 - Park, Hae-Won
A2 - Rekleitis, Ioannis
A2 - Renda, Federico
A2 - Ricci, Elisa
A2 - Riek, Laurel D.
A2 - Sabattini, Lorenzo
A2 - Shen, Shaojie
A2 - Sun, Yu
A2 - Wieber, Pierre-Brice
A2 - Yamane, Katsu
A2 - Yu, Jingjin
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 IEEE International Conference on Robotics and Automation, ICRA 2025
Y2 - 19 May 2025 through 23 May 2025
ER -