MaskViM: Domain Generalized Semantic Segmentation with State Space Models

  • Jiahao Li
  • , Yang Lu
  • , Yuan Xie*
  • , Yanyun Qu*
  • *Corresponding author for this work

Research output: Contribution to journalConference articlepeer-review

1 Scopus citations

Abstract

Domain Generalized Semantic Segmentation (DGSS) aims to utilize segmentation model training on known source domains to make predictions on unknown target domains. Currently, there are two kinds of network architectures: one based on Convolutional Neural Networks (CNNs) and the other based on Visual Transformers (ViTs). However, both CNN-based and ViT-based DGSS methods face challenges: the former lacks a global receptive field, while the latter requires more computational demands. Drawing inspiration from State Space Models (SSMs), which not only possess a global receptive field but also maintain linear complexity, we propose SSM-based method for achieving DGSS. In this work, we first elucidate why does mask make sense in SSM-based DGSS and propose our mask learning mechanism. Leveraging this mechanism, we present our Mask Vision Mamba network (MaskViM), a model for SSM-based DGSS, and design our mask loss to optimize MaskViM. Our method achieves superior performance on four diverse DGSS setting, which demonstrates the effectiveness of our method.

Original languageEnglish
Pages (from-to)4752-4760
Number of pages9
JournalProceedings of the AAAI Conference on Artificial Intelligence
Volume39
Issue number5
DOIs
StatePublished - 11 Apr 2025
Event39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 20254 Mar 2025

Fingerprint

Dive into the research topics of 'MaskViM: Domain Generalized Semantic Segmentation with State Space Models'. Together they form a unique fingerprint.

Cite this