TY - JOUR
T1 - MPGB
T2 - Learning discriminative embeddings with multi-prototype and gradient balancing strategy for multi-modal 3D open world object detection
AU - Zhang, Haozhe
AU - Ma, Liyan
AU - Li, Zhi
AU - Zeng, Tieyong
N1 - Publisher Copyright: © 2025
PY - 2025/2/28
Y1 - 2025/2/28
N2 - In recent years, extensive research has been conducted on closed-world 3D object detection. However, the closed-set scenario is impractical for complex and dynamic real-world environments, especially for autonomous driving systems, which must perceive and respond to various road traffic emergencies. This paper thoroughly investigates multi-modal 3D open-world object detection. The primary challenges are the unstructured nature of the data (e.g., irregularity and sparsity) and data imbalance. To better capture intra-class diversity and inter-class differences, we introduce multi-prototype contrastive learning and a weighted cross-entropy loss. To handle the long-tail data distribution problem, we use a multi-head structure for the region proposal network (RPN) with a rate and magnitude gradient balancing strategy. In addition, we incorporate prototypes as feature replay during incremental tasks to alleviate catastrophic forgetting. Extensive experiments on the KITTI and Waymo datasets show that the proposed MPGB outperforms the baselines in recognizing both novel and known categories. The code is available at https://github.com/zhanghaozhe23/MPGB.
AB - In recent years, extensive research has been conducted on closed-world 3D object detection. However, the closed-set scenario is impractical for complex and dynamic real-world environments, especially for autonomous driving systems, which must perceive and respond to various road traffic emergencies. This paper thoroughly investigates multi-modal 3D open-world object detection. The primary challenges are the unstructured nature of the data (e.g., irregularity and sparsity) and data imbalance. To better capture intra-class diversity and inter-class differences, we introduce multi-prototype contrastive learning and a weighted cross-entropy loss. To handle the long-tail data distribution problem, we use a multi-head structure for the region proposal network (RPN) with a rate and magnitude gradient balancing strategy. In addition, we incorporate prototypes as feature replay during incremental tasks to alleviate catastrophic forgetting. Extensive experiments on the KITTI and Waymo datasets show that the proposed MPGB outperforms the baselines in recognizing both novel and known categories. The code is available at https://github.com/zhanghaozhe23/MPGB.
KW - Gradient balancing
KW - Multi-modal 3D object detection
KW - Multi-prototype contrastive learning
KW - Open world problem
UR - https://www.scopus.com/pages/publications/85216920552
U2 - 10.1016/j.knosys.2025.113069
DO - 10.1016/j.knosys.2025.113069
M3 - Article
AN - SCOPUS:85216920552
SN - 0950-7051
VL - 311
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 113069
ER -