TY - JOUR
T1 - Fusion-Mamba for Cross-Modality Object Detection
AU - Dong, Wenhao
AU - Zhu, Haodong
AU - Lin, Shaohui
AU - Luo, Xiaoyan
AU - Shen, Yunhang
AU - Guo, Guodong
AU - Zhang, Baochang
N1 - Publisher Copyright:
© 1999-2012 IEEE.
PY - 2025
Y1 - 2025
N2 - Cross-modality object detection aims to fuse complementary information from different modalities to improve model performance, which achieves a wider range of applications. However, traditional cross-modality fusion methods, based on CNN or Transformer, inadequately address the issue of pseudo-target information, which causes model attention dispersion to degrade object detection performance. In this paper, we investigate a novel cross-modality fusion approach by associating cross-modal features in a hidden state space based on an improved Mamba with a gating attention mechanism. We propose the Fusion-Mamba Block(FMB), designed to map cross-modal features into a hidden state space for interaction, thereby refining the model’s attention on true target areas and enhancing overall performance. The FMB comprises two key modules: State Space Channel Swapping (SSCS) module, which facilitates the fusion of shallow features, and Dual State Space Fusion (DSSF) module, which enables deep fusion and effectively suppresses pseudo-target information within the hidden state space. Our proposed method outperforms state-of-the-art approaches, achieving improvements of 5.9%, 3.5% and 2.1% mAP on M3 FD, DroneVehicle and FLIR-Aligned, respectively. To the best of our knowledge, this work establishes a new baseline for cross-modality object detection, providing a robust foundation for future research in this area.
AB - Cross-modality object detection aims to fuse complementary information from different modalities to improve model performance, which achieves a wider range of applications. However, traditional cross-modality fusion methods, based on CNN or Transformer, inadequately address the issue of pseudo-target information, which causes model attention dispersion to degrade object detection performance. In this paper, we investigate a novel cross-modality fusion approach by associating cross-modal features in a hidden state space based on an improved Mamba with a gating attention mechanism. We propose the Fusion-Mamba Block(FMB), designed to map cross-modal features into a hidden state space for interaction, thereby refining the model’s attention on true target areas and enhancing overall performance. The FMB comprises two key modules: State Space Channel Swapping (SSCS) module, which facilitates the fusion of shallow features, and Dual State Space Fusion (DSSF) module, which enables deep fusion and effectively suppresses pseudo-target information within the hidden state space. Our proposed method outperforms state-of-the-art approaches, achieving improvements of 5.9%, 3.5% and 2.1% mAP on M3 FD, DroneVehicle and FLIR-Aligned, respectively. To the best of our knowledge, this work establishes a new baseline for cross-modality object detection, providing a robust foundation for future research in this area.
KW - Cross-modality
KW - feature fusion
KW - mamba
KW - multi-spectral object detection
UR - https://www.scopus.com/pages/publications/105013481205
U2 - 10.1109/TMM.2025.3599020
DO - 10.1109/TMM.2025.3599020
M3 - 文章
AN - SCOPUS:105013481205
SN - 1520-9210
VL - 27
SP - 7392
EP - 7406
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -