TY - GEN
T1 - AdaFormer
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
AU - Luo, Xiaotong
AU - Ai, Zekun
AU - Liang, Qiuyuan
AU - Liu, Ding
AU - Xie, Yuan
AU - Qu, Yanyun
AU - Fu, Yun
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - Efficient transformer-based models have made remarkable progress in image super-resolution (SR). Most of these works mainly design elaborate structures to accelerate the inference of the transformer, where all feature tokens are propagated equally. However, they ignore the underlying characteristic of image content, i.e., various image regions have distinct restoration difficulties, especially for large images (2K-8K), failing to achieve adaptive inference. In this work, we propose an adaptive token sparsification transformer (AdaFormer) to speed up the model inference for image SR. Specifically, a texture-relevant sparse attention block with parallel global and local branches is introduced, aiming to integrate informative tokens from the global view instead of only in fixed local windows. Then, an early-exit strategy is designed to progressively halt tokens according to the token importance. To estimate the plausibility of each token, we adopt a lightweight confidence estimator, which is constrained by an uncertainty-guided loss to obtain a binary halting mask over the tokens. Experiments on large images illustrate that our method reduces latency by nearly 90% compared with SwinIR on Test8K, while maintaining comparable performance.
AB - Efficient transformer-based models have made remarkable progress in image super-resolution (SR). Most of these works mainly design elaborate structures to accelerate the inference of the transformer, where all feature tokens are propagated equally. However, they ignore the underlying characteristic of image content, i.e., various image regions have distinct restoration difficulties, especially for large images (2K-8K), failing to achieve adaptive inference. In this work, we propose an adaptive token sparsification transformer (AdaFormer) to speed up the model inference for image SR. Specifically, a texture-relevant sparse attention block with parallel global and local branches is introduced, aiming to integrate informative tokens from the global view instead of only in fixed local windows. Then, an early-exit strategy is designed to progressively halt tokens according to the token importance. To estimate the plausibility of each token, we adopt a lightweight confidence estimator, which is constrained by an uncertainty-guided loss to obtain a binary halting mask over the tokens. Experiments on large images illustrate that our method reduces latency by nearly 90% compared with SwinIR on Test8K, while maintaining comparable performance.
UR - https://www.scopus.com/pages/publications/85189551488
U2 - 10.1609/aaai.v38i5.28194
DO - 10.1609/aaai.v38i5.28194
M3 - Conference contribution
AN - SCOPUS:85189551488
T3 - Proceedings of the AAAI Conference on Artificial Intelligence
SP - 4009
EP - 4016
BT - Technical Tracks 14
A2 - Wooldridge, Michael
A2 - Dy, Jennifer
A2 - Natarajan, Sriraam
PB - Association for the Advancement of Artificial Intelligence
Y2 - 20 February 2024 through 27 February 2024
ER -