Abstract
Although Transformers have achieved significant success in low-level vision tasks, they are constrained by the quadratic complexity of self-attention and by limited-size windows, which deprives them of a global receptive field over the entire image. Recently, State Space Models (SSMs) have gained widespread attention for their global receptive field and linear complexity with respect to input length. However, integrating SSMs into low-level vision tasks presents two major challenges: 1) degradation of long-range token relationships, i.e., a long-range forgetting problem, when high-resolution images are encoded pixel by pixel; and 2) significant redundancy in existing multi-direction scanning strategies. To address these challenges, we propose Hi-Mamba for image super-resolution (SR), which unfolds the image with only a single scan. Specifically, the Global Hierarchical Mamba Block (GHMB) enables token interactions across the entire image, providing a global receptive field while leveraging a multi-scale structure to facilitate long-range dependency learning. Additionally, the Direction Alternation Module (DAM) varies the scanning pattern of the GHMB across layers to enhance spatial relationship modeling. Extensive experiments demonstrate that Hi-Mamba achieves 0.2–0.27 dB PSNR gains on the Urban100 dataset across different scaling factors compared with the state-of-the-art MambaIRv2 for SR. Moreover, our lightweight Hi-Mamba outperforms the lightweight SRFormer by 0.39 dB PSNR for ×2 SR.
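As a rough illustration of the scanning idea the abstract contrasts, the sketch below flattens an image into a 1D token sequence along a single direction and alternates that direction across layers, rather than running all directions in every layer. All function and variable names here are our own assumptions for exposition, not the paper's implementation.

```python
# Hypothetical sketch: single-direction image unfolding with per-layer
# direction alternation. Names (scan, layer_direction) are illustrative only.

def scan(image, direction):
    """Flatten a 2D grid (list of rows) into a 1D token sequence
    along one scan direction."""
    h, w = len(image), len(image[0])
    if direction == "row":       # left-to-right, top-to-bottom
        return [image[i][j] for i in range(h) for j in range(w)]
    if direction == "row_rev":   # right-to-left, bottom-to-top
        return [image[i][j] for i in range(h - 1, -1, -1)
                            for j in range(w - 1, -1, -1)]
    if direction == "col":       # top-to-bottom, left-to-right
        return [image[i][j] for j in range(w) for i in range(h)]
    if direction == "col_rev":   # bottom-to-top, right-to-left
        return [image[i][j] for j in range(w - 1, -1, -1)
                            for i in range(h - 1, -1, -1)]
    raise ValueError(f"unknown direction: {direction}")

DIRECTIONS = ["row", "col", "row_rev", "col_rev"]

def layer_direction(layer_idx):
    """Pick one scan direction per layer, cycling through the set,
    so each layer performs only a single scan."""
    return DIRECTIONS[layer_idx % len(DIRECTIONS)]

img = [[1, 2],
       [3, 4]]
print(scan(img, layer_direction(0)))  # layer 0 scans row-wise: [1, 2, 3, 4]
print(scan(img, layer_direction(1)))  # layer 1 scans column-wise: [1, 3, 2, 4]
```

The design point this sketch mirrors is the cost argument: cycling one direction per layer covers the same spatial orderings over depth as a four-way scan does per layer, at a quarter of the per-layer scanning work.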
| Original language | English |
|---|---|
| Pages (from-to) | 8461-8473 |
| Number of pages | 13 |
| Journal | IEEE Transactions on Image Processing |
| Volume | 34 |
| DOIs | |
| State | Published - 2025 |
Keywords
- hierarchical Mamba
- image super-resolution
- scanning strategy
- state space model