TY - GEN
T1 - StyleRWKV
T2 - 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
AU - Dai, Miaomiao
AU - Zhou, Qianyu
AU - Ma, Lizhuang
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Style transfer aims to generate a new image preserving the content but with the artistic representation of the style source. Most of the existing methods are based on Transformers or diffusion models, however, they suffer from quadratic computational complexity and high inference time. RWKV, as an emerging deep sequence models, has shown immense potential for long-context sequence modeling in NLP tasks. In this work, we present a novel framework StyleRWKV, to achieve high-quality style transfer with limited memory usage and linear time complexity. Specifically, we propose a Recurrent WKV (Re-WKV) attention mechanism, which incorporates bidirectional attention to establish a global receptive field. Additionally, we develop a Deformable Shifting (Deform-Shifting) layer that introduces learnable offsets to the sampling grid of the convolution kernel, allowing tokens to shift flexibly and adaptively from the region of interest, thereby enhancing the model's ability to capture local dependencies. Finally, we propose a Skip Scanning (S-Scanning) method that effectively establishes global contextual dependencies. Extensive experiments with analysis including qualitative and quantitative evaluations demonstrate that our approach outperforms state-of-the-art methods in terms of stylization quality, model complexity, and inference efficiency.
AB - Style transfer aims to generate a new image preserving the content but with the artistic representation of the style source. Most of the existing methods are based on Transformers or diffusion models, however, they suffer from quadratic computational complexity and high inference time. RWKV, as an emerging deep sequence models, has shown immense potential for long-context sequence modeling in NLP tasks. In this work, we present a novel framework StyleRWKV, to achieve high-quality style transfer with limited memory usage and linear time complexity. Specifically, we propose a Recurrent WKV (Re-WKV) attention mechanism, which incorporates bidirectional attention to establish a global receptive field. Additionally, we develop a Deformable Shifting (Deform-Shifting) layer that introduces learnable offsets to the sampling grid of the convolution kernel, allowing tokens to shift flexibly and adaptively from the region of interest, thereby enhancing the model's ability to capture local dependencies. Finally, we propose a Skip Scanning (S-Scanning) method that effectively establishes global contextual dependencies. Extensive experiments with analysis including qualitative and quantitative evaluations demonstrate that our approach outperforms state-of-the-art methods in terms of stylization quality, model complexity, and inference efficiency.
KW - Linear complexity
KW - RWKV
KW - Style transfer
UR - https://www.scopus.com/pages/publications/105022631282
U2 - 10.1109/ICME59968.2025.11210137
DO - 10.1109/ICME59968.2025.11210137
M3 - 会议稿件
AN - SCOPUS:105022631282
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2025 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
Y2 - 30 June 2025 through 4 July 2025
ER -