
TDiffSal: Text-Guided Diffusion Saliency Prediction Model for Images

  • Nana Zhang
  • Min Xiong
  • Dandan Zhu*
  • Kun Zhu*
  • Guangtao Zhai
  • *Corresponding author for this work
  • Donghua University
  • Nanjing University
  • Tongji University
  • Shanghai Jiao Tong University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Existing visual saliency prediction methods mainly focus on single-modal (visual-only) saliency prediction and ignore the significant impact of text on visual saliency. To explore more comprehensively how text influences human attention in images, we propose a text-guided diffusion saliency prediction model, named TDiffSal. In particular, recent studies on stable diffusion models have shown promising performance in unifying diverse tasks, owing to their inherent generalization ability. Inspired by this, we propose a novel diffusion model for generalized visual-text saliency prediction, which formulates saliency prediction as a conditional generative task: the saliency map is generated with the input image and text as conditions. Meanwhile, we introduce a multi-head fusion module that integrates text features with image features; the fused features guide the image denoising process and progressively refine the generated saliency map so that it becomes semantically relevant to the text. Additionally, we employ an efficient pre-training strategy to enhance the robustness and generalization of the proposed model. Extensive experiments on benchmark datasets demonstrate its superior performance compared with other state-of-the-art methods.

Original language: English
Host publication title: Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
Editors: Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 15-31
Number of pages: 17
ISBN (Print): 9783031781858
DOI
Publication status: Published - 2025
Event: 27th International Conference on Pattern Recognition, ICPR 2024 - Kolkata, India
Duration: 1 Dec 2024 to 5 Dec 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15308 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 27th International Conference on Pattern Recognition, ICPR 2024
Country/Territory: India
City: Kolkata
Period: 1/12/24 to 5/12/24
