TY - GEN
T1 - C2BA
T2 - 2025 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2025
AU - Huang, Weiyi
AU - Xi, Xidong
AU - Wang, Hailing
AU - Cao, Guitao
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - In Cross-Modal Domain-Incremental Learning, the primary challenge lies in learning from varying data distributions while maintaining performance on prior domains. However, existing methods often overlook the importance of shared knowledge across domains, and the interaction between modalities remains insufficient. To address these issues, we propose Cross-Domain Consistency and Bidirectional Alignment (C2BA), a novel framework that enhances the model's generalization ability and improves cross-modal integration in vision-language models (VLMs) through two key components. We design a Cross-domain Global Consistency Constraint (CGCC) to stabilize domain-invariant representations during incremental training, preventing excessive shifts of shared distributions toward new domains. In addition, we design a Bidirectional Cross-Modal Attention (BCMA) module, which enables effective interaction between visual and textual features through a bidirectional attention mechanism, thereby reducing cross-modal discrepancies. Experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art exemplar-free and even exemplar-based approaches, achieving superior generalization and cross-modal interaction.
AB - In Cross-Modal Domain-Incremental Learning, the primary challenge lies in learning from varying data distributions while maintaining performance on prior domains. However, existing methods often overlook the importance of shared knowledge across domains, and the interaction between modalities remains insufficient. To address these issues, we propose Cross-Domain Consistency and Bidirectional Alignment (C2BA), a novel framework that enhances the model's generalization ability and improves cross-modal integration in vision-language models (VLMs) through two key components. We design a Cross-domain Global Consistency Constraint (CGCC) to stabilize domain-invariant representations during incremental training, preventing excessive shifts of shared distributions toward new domains. In addition, we design a Bidirectional Cross-Modal Attention (BCMA) module, which enables effective interaction between visual and textual features through a bidirectional attention mechanism, thereby reducing cross-modal discrepancies. Experiments on three benchmark datasets demonstrate that our method outperforms state-of-the-art exemplar-free and even exemplar-based approaches, achieving superior generalization and cross-modal interaction.
KW - Cross-Modal Attention
KW - Domain-Incremental Learning
KW - Global Knowledge
KW - Vision-Language Model
UR - https://www.scopus.com/pages/publications/105033155452
U2 - 10.1109/SMC58881.2025.11342523
DO - 10.1109/SMC58881.2025.11342523
M3 - Conference contribution
AN - SCOPUS:105033155452
T3 - Conference Proceedings - IEEE International Conference on Systems, Man and Cybernetics
SP - 1933
EP - 1939
BT - 2025 IEEE International Conference on Systems, Man, and Cybernetics
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 October 2025 through 8 October 2025
ER -