AnyStyleDiffusion: Flexible Style Transfer with Consistent Content Adaptation Across Diffusion Models

  • Zhenyu Xu
  • Junjie Wu
  • Zhiyan Piao
  • Xiaoqi Sheng
  • Yu Xiao
  • Xinyu Zhang*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent advances in text-to-image diffusion models have demonstrated remarkable capabilities in generating high-quality visual content with controllable style and features. A fundamental challenge remains in simultaneously maintaining three critical properties of generated image sequences: (1) fine-grained style control, (2) strict image-prompt alignment, and (3) cross-image content coherence. To address this challenge, we propose AnyStyleDiffusion. Specifically, we interpret any artistic style requested by users for the generated images as a feature in the models' weight space; interpolating between weight spaces yields models that express intermediate styles with a linear transition. We further propose Hyper-receptive Motion Layers (HRMLs) to align the outputs of diverse weight spaces, operating as adaptive style modulators. These HRMLs are decoupled from the interpolated diffusion models, enabling zero-shot compatibility with existing model checkpoints. By employing Homogeneous Stable Diffusion, direct interpolation in weight space is avoided, improving synthesis efficiency. Comprehensive evaluations across personalized models demonstrate our method's superiority in generating content-coherent sequences with dynamic style transformations. Code will be released at https://github.com/shermandozer/AnyStyleDiffusion.git.
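To illustrate the weight-space interpolation idea described in the abstract, below is a minimal sketch of blending two style-specialised diffusion checkpoints with the same architecture. It assumes PyTorch state dicts with matching parameter names and shapes; the helper name `interpolate_weights` and the usage outline are illustrative only and are not taken from the paper or its released code.

```python
# Minimal sketch: linear interpolation between two style-specialised
# diffusion model checkpoints in weight space. Assumes both checkpoints
# share identical architectures (same parameter names and shapes).
import torch


def interpolate_weights(state_dict_a, state_dict_b, alpha):
    """Return a state dict blended as (1 - alpha) * A + alpha * B."""
    blended = {}
    for name, param_a in state_dict_a.items():
        param_b = state_dict_b[name]
        if param_a.dtype.is_floating_point:
            # Linear blend of the two styles' weights.
            blended[name] = torch.lerp(param_a, param_b, alpha)
        else:
            # Non-float buffers (e.g. integer counters) are copied as-is.
            blended[name] = param_a.clone()
    return blended


# Hypothetical usage: sweep alpha from 0 to 1 to obtain models expressing
# intermediate styles between the two source checkpoints, then run the
# usual text-to-image sampling loop with each blended model.
# for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
#     unet.load_state_dict(interpolate_weights(sd_style_a, sd_style_b, alpha))
```

Sweeping the mixing coefficient in this way gives a linear transition between the two endpoint styles; the paper's HRMLs and Homogeneous Stable Diffusion components, which align outputs across weight spaces and avoid direct interpolation at synthesis time, are not reproduced here.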

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 9519-9528
Number of pages: 10
ISBN (Electronic): 9798400720352
DOIs
State: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 - 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 - 31/10/25

Keywords

  • controllable generation
  • diffusion model
  • style transfer
  • text-to-image synthesis
