TY - GEN
T1 - Compacter
T2 - 32nd ACM International Conference on Multimedia, MM 2024
AU - Wu, Zhijian
AU - Li, Jun
AU - Hu, Yang
AU - Huang, Dingjiang
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024/10/28
Y1 - 2024/10/28
AB - Although deep learning-based methods have made significant advances in the field of image restoration (IR), they often suffer from excessive model parameters. To tackle this problem, this work proposes a compact Transformer (Compacter) for lightweight image restoration by making several key designs. We employ the concepts of projection sharing, adaptive interaction, and heterogeneous aggregation to develop a novel Compact Adaptive Self-Attention (CASA). Specifically, CASA utilizes shared projection to generate Query, Key, and Value to simultaneously model spatial and channel-wise self-attention. The adaptive interaction process is then used to propagate and integrate global information from two different dimensions, thus enabling omnidirectional relational interaction. Finally, a depth-wise convolution is incorporated on Value to complement heterogeneous local information, enabling global-local coupling. Moreover, we propose a Dual Selective Gated Module (DSGM) to dynamically encapsulate the globality into each pixel for context-adaptive aggregation. Extensive experiments demonstrate that our Compacter achieves state-of-the-art performance for a variety of lightweight IR tasks with approximately 400K parameters.
KW - deep learning
KW - lightweight image restoration
KW - self-attention
UR - https://www.scopus.com/pages/publications/85209822205
U2 - 10.1145/3664647.3680811
DO - 10.1145/3664647.3680811
M3 - Conference contribution
AN - SCOPUS:85209822205
T3 - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
SP - 3094
EP - 3103
BT - MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
PB - Association for Computing Machinery, Inc
Y2 - 28 October 2024 through 1 November 2024
ER -