TDiffSal: Text-Guided Diffusion Saliency Prediction Model for Images

  • Nana Zhang
  • Min Xiong
  • Dandan Zhu*
  • Kun Zhu*
  • Guangtao Zhai

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Existing visual saliency prediction methods focus mainly on the single-modal visual setting and ignore the significant impact of text on visual saliency. To explore the influence of text on human attention in images more comprehensively, we propose a text-guided diffusion saliency prediction model, named TDiffSal. Recent studies on stable diffusion models have shown promising performance in unifying tasks, owing to their inherent generalization ability. Inspired by this, we propose a novel diffusion model for generalized visual-text saliency prediction, which formulates prediction as a conditional generative task over the saliency map, with the input image and text as the conditions. We also introduce a multi-head fusion module that integrates text features and image features; the fused features guide the image denoising process and progressively refine the generated saliency map so that it is semantically relevant to the text. Additionally, we employ an efficient pre-training strategy to enhance the robustness and generalization of the proposed model. Extensive experiments on benchmark datasets demonstrate its superior performance compared with other state-of-the-art methods.

Original language: English
Title of host publication: Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
Editors: Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 15-31
Number of pages: 17
ISBN (Print): 9783031781858
DOIs
State: Published - 2025
Event: 27th International Conference on Pattern Recognition, ICPR 2024 - Kolkata, India
Duration: 1 Dec 2024 to 5 Dec 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15308 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Pattern Recognition, ICPR 2024
Country/Territory: India
City: Kolkata
Period: 1/12/24 to 5/12/24

Keywords

  • Stable diffusion
  • feature fusion
  • multimodal
  • saliency prediction
  • text-guided visual saliency
