DiffuseFIST: A Fast Image-guided Style Transfer Method for Adapting Large-scale Diffusion Models

Miaomiao Dai, Qianyu Zhou, Ran Yi, Lizhuang Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Pre-trained text-to-image (T2I) diffusion models (DMs) have shown remarkable capabilities in generating diverse images. However, they struggle to satisfy users' requirements for two reasons: (i) text is inherently imprecise at expressing specific styles, and (ii) generation is time-consuming because the reverse process of diffusion models takes many iterations. To address these issues, we propose DiffuseFIST, a fast style transfer method that adapts pre-trained large-scale diffusion models. It adds noise only up to a small timestep, T-small (300), to accelerate the reverse process, and it requires only real-world images and artistic images as input. Specifically, to preserve content and prevent style leakage, we introduce a Content Injection (CI) strategy that achieves fine-grained control over the generated structure by manipulating spatial features and self-attention inside the model. Furthermore, we design an Iterative Style Guidance (ISG) strategy that allows explicit user guidance and control of stylization trade-offs. Finally, we initialize the latent variable with the Whitening and Coloring Transform (WCT) to correct disharmonious color. Qualitative and quantitative experiments demonstrate that the proposed method surpasses state-of-the-art conventional and diffusion-based style transfer methods.
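The speed-up from stopping the forward process at T-small can be illustrated with the standard closed-form noising step used by diffusion models. The sketch below is not the authors' implementation. It assumes a DDPM-style linear beta schedule over 1000 steps, and the function names (`linear_betas`, `alpha_bar`, `add_noise`) and the toy latent are hypothetical. It shows that a latent noised to step 300 still retains a sizeable share of the original signal, so the reverse process only has to run from step 300 instead of from pure noise.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (an assumption; the paper's schedule may differ)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta) over the first t steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def add_noise(x0, t, betas, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar) * x_0, (1 - a_bar) * I)."""
    a_bar = alpha_bar(t, betas)
    signal, sigma = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_betas()
rng = random.Random(0)
latent = [0.5, -1.0, 0.25, 2.0]  # toy stand-in for an encoded content image

# Noise only up to step 300 rather than the full 1000 steps.
noisy = add_noise(latent, 300, betas, rng)

print("signal kept at t=300: ", round(math.sqrt(alpha_bar(300, betas)), 3))
print("signal kept at t=1000:", round(math.sqrt(alpha_bar(1000, betas)), 5))
```

Under this assumed schedule the signal coefficient at step 300 is far larger than at step 1000, which is why a partially noised latent can keep content structure while leaving room for stylization during the shortened reverse pass.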

Original language: English
Title of host publication: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Proceedings
Editors: Bhaskar D Rao, Isabel Trancoso, Gaurav Sharma, Neelesh B. Mehta
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350368741
State: Published - 2025
Externally published: Yes
Event: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025 - Hyderabad, India
Duration: 6 Apr 2025 → 11 Apr 2025

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025
Country/Territory: India
City: Hyderabad
Period: 6/04/25 → 11/04/25

Keywords

  • content generation
  • diffusion models
  • style transfer
