TY - GEN
T1 - AnyStyleDiffusion
T2 - 33rd ACM International Conference on Multimedia, MM 2025
AU - Xu, Zhenyu
AU - Wu, Junjie
AU - Piao, Zhiyan
AU - Sheng, Xiaoqi
AU - Xiao, Yu
AU - Zhang, Xinyu
N1 - Publisher Copyright:
© 2025 ACM.
PY - 2025/10/27
Y1 - 2025/10/27
N2 - Recent advances in text-to-image diffusion models have demonstrated remarkable capabilities in generating high-quality visual content with controllable style and features. A fundamental challenge remains in simultaneously maintaining three critical properties of generated image sequences: (1) fine-grained style control, (2) strict image-prompt alignment, and (3) cross-image content coherence. To address this challenge, we propose AnyStyleDiffusion. Specifically, we interpret any artistic style a user requires for the generated images as a feature in the model's weight space. Interpolating between weight spaces yields models that express intermediate styles with linear transitions. We propose Hyper-receptive Motion Layers (HRMLs) to align the outputs of diverse weight spaces, operating as adaptive style modulators. These HRMLs are decoupled from the interpolated diffusion models, enabling zero-shot compatibility with existing model checkpoints. By employing Homogeneous Stable Diffusion, direct interpolation in weight space is avoided, improving synthesis efficiency. Comprehensive evaluations across personalized models demonstrate our method's superiority in generating content-coherent sequences with dynamic style transformations. Code will be released at https://github.com/shermandozer/AnyStyleDiffusion.git.
AB - Recent advances in text-to-image diffusion models have demonstrated remarkable capabilities in generating high-quality visual content with controllable style and features. A fundamental challenge remains in simultaneously maintaining three critical properties of generated image sequences: (1) fine-grained style control, (2) strict image-prompt alignment, and (3) cross-image content coherence. To address this challenge, we propose AnyStyleDiffusion. Specifically, we interpret any artistic style a user requires for the generated images as a feature in the model's weight space. Interpolating between weight spaces yields models that express intermediate styles with linear transitions. We propose Hyper-receptive Motion Layers (HRMLs) to align the outputs of diverse weight spaces, operating as adaptive style modulators. These HRMLs are decoupled from the interpolated diffusion models, enabling zero-shot compatibility with existing model checkpoints. By employing Homogeneous Stable Diffusion, direct interpolation in weight space is avoided, improving synthesis efficiency. Comprehensive evaluations across personalized models demonstrate our method's superiority in generating content-coherent sequences with dynamic style transformations. Code will be released at https://github.com/shermandozer/AnyStyleDiffusion.git.
KW - controllable generation
KW - diffusion model
KW - style transfer
KW - text-to-image synthesis
UR - https://www.scopus.com/pages/publications/105024067156
U2 - 10.1145/3746027.3754942
DO - 10.1145/3746027.3754942
M3 - Conference contribution
AN - SCOPUS:105024067156
T3 - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
SP - 9519
EP - 9528
BT - MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
PB - Association for Computing Machinery, Inc
Y2 - 27 October 2025 through 31 October 2025
ER -