Abstract
Recently, Transformer-based methods have achieved impressive performance in many computer vision tasks (e.g., image super-resolution (SR)) thanks to their long-range modeling ability. However, their high computational cost renders these methods unsuitable for resource-constrained devices, especially in image SR tasks involving high-resolution images. In this paper, we propose a concise and effective Gated Convolutional Attention Unit (GCAU) that uses cheap convolutional operations. Specifically, GCAU consists of a Convolutional Transposed Attention (CTA) branch and a Locally-enhanced Gating (LeG) branch in parallel. The former efficiently models global relational interactions by computing cross-covariance across the channel dimension, while the latter controls the information flow from the former, directing the network to focus on more refined image attributes. Without bells and whistles, we present a simple SR Transformer, GCAT, built by cascading GCAUs. Extensive experimental results demonstrate that GCAT achieves state-of-the-art performance among existing efficient SR methods with significantly lower complexity. In particular, GCAT is on average 5× faster than SwinIR-light with comparable performance.
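The key efficiency idea in the abstract, attention computed as cross-covariance across the channel dimension rather than across spatial positions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the projection matrices `wq`, `wk`, `wv`, `wg`, the scaling factor, and the sigmoid gate (standing in for the paper's Locally-enhanced Gating, which uses depthwise convolutions) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transposed_attention(x, wq, wk, wv):
    """Channel-wise (transposed) attention on x of shape (C, N), N = H*W.

    The attention map is C x C (cross-covariance across channels),
    so the cost scales as O(C^2 * N) instead of the O(N^2 * C) of
    spatial self-attention -- cheap when N (resolution) is large.
    """
    q, k, v = wq @ x, wk @ x, wv @ x            # each (C, N)
    attn = softmax((q @ k.T) / np.sqrt(x.shape[1]), axis=-1)  # (C, C)
    return attn @ v                              # (C, N)

def gated_attention_unit(x, wq, wk, wv, wg):
    # Two parallel branches: the attention output is modulated
    # elementwise by a gate derived from the input (a simplified
    # stand-in for the locally-enhanced gating branch).
    attn_out = transposed_attention(x, wq, wk, wv)
    gate = 1.0 / (1.0 + np.exp(-(wg @ x)))       # values in (0, 1)
    return attn_out * gate
```

A stack of such units (with normalization, feed-forward layers, and residual connections, omitted here) would mirror the cascaded-GCAU structure described for GCAT.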
| Original language | English |
|---|---|
| Journal | Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing |
| State | Published - 2025 |
| Event | 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India Duration: 6 Apr 2025 → 11 Apr 2025 |
Keywords
- Efficient Network
- Gated Linear Unit
- Image Super-resolution
- Transformer