TY - JOUR
T1 - LIPT
T2 - Latency-Aware Image Processing Transformer
AU - Qiao, Junbo
AU - Li, Wei
AU - Xie, Haizhen
AU - Chen, Hanting
AU - Hu, Jie
AU - Lin, Shaohui
AU - Han, Jungong
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Transformers are leading a trend in the field of image processing. While existing lightweight image processing transformers have achieved notable success, they primarily focus on reducing FLOPs (floating-point operations) or the number of parameters rather than on practical inference acceleration. In this paper, we present a latency-aware image processing transformer, termed LIPT. We devise a low-latency proportion LIPT block that substitutes memory-intensive operators with a combination of self-attention and convolutions to achieve practical speedup. Specifically, we propose a novel non-volatile sparse masking self-attention (NVSM-SA) that utilizes a pre-computed sparse mask to capture contextual information from a larger window with no extra computational overhead. In addition, a high-frequency reparameterization module (HRM) is proposed to make the LIPT block reparameterization-friendly, enhancing the model's ability to reconstruct fine details. Extensive experiments on multiple image processing tasks (e.g., image super-resolution (SR), JPEG artifact reduction, and image denoising) demonstrate the superiority of LIPT in both latency and PSNR. LIPT achieves real-time GPU inference with state-of-the-art performance on multiple image SR benchmarks. The source code is released at https://github.com/Lucien66/LIPT
AB - Transformers are leading a trend in the field of image processing. While existing lightweight image processing transformers have achieved notable success, they primarily focus on reducing FLOPs (floating-point operations) or the number of parameters rather than on practical inference acceleration. In this paper, we present a latency-aware image processing transformer, termed LIPT. We devise a low-latency proportion LIPT block that substitutes memory-intensive operators with a combination of self-attention and convolutions to achieve practical speedup. Specifically, we propose a novel non-volatile sparse masking self-attention (NVSM-SA) that utilizes a pre-computed sparse mask to capture contextual information from a larger window with no extra computational overhead. In addition, a high-frequency reparameterization module (HRM) is proposed to make the LIPT block reparameterization-friendly, enhancing the model's ability to reconstruct fine details. Extensive experiments on multiple image processing tasks (e.g., image super-resolution (SR), JPEG artifact reduction, and image denoising) demonstrate the superiority of LIPT in both latency and PSNR. LIPT achieves real-time GPU inference with state-of-the-art performance on multiple image SR benchmarks. The source code is released at https://github.com/Lucien66/LIPT
KW - Image processing
KW - non-volatile sampling mask
KW - reparameterization
KW - transformer
UR - https://www.scopus.com/pages/publications/105005791792
U2 - 10.1109/TIP.2025.3567832
DO - 10.1109/TIP.2025.3567832
M3 - Article
AN - SCOPUS:105005791792
SN - 1057-7149
VL - 34
SP - 3056
EP - 3069
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
ER -