STM-SalNet: A Biologically-Inspired Spatial-Temporal Memory Network for Video Saliency Prediction

  • Jikai Xu
  • Dandan Zhu*
  • Kaiwei Zhang
  • Xiongkuo Min
  • *Corresponding author for this work
  • East China Normal University
  • Shanghai Jiao Tong University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

In recent years, video saliency prediction has attracted significant attention across a wide range of vision-related tasks. However, most existing video saliency prediction methods predominantly rely on static encoder-decoder architectures, failing to incorporate the dynamic memory mechanisms that are fundamental to human visual perception and attention modeling. To address this limitation, we propose STM-SalNet, a novel biologically-inspired spatial-temporal memory network for video saliency prediction. First, inspired by the powerful visual processing capabilities of the human visual cortex, we introduce a brain-inspired Vision Transformer module designed to extract multi-level hierarchical spatial-temporal features. Subsequently, we propose a memory bank module equipped with an active forgetting mechanism, simulating human memory’s ability to selectively retain and update information. By dynamically retrieving relevant features from past frames while discarding redundancy, the module ensures robust adaptability to continuously evolving video content. To further enhance the integration of spatial and temporal features, we design a bidirectional spatial-temporal fusion module that facilitates effective interaction between deep semantic and shallow spatial features, enriching the overall feature representation. Finally, a progressively hierarchical decoder module is employed to generate fine-grained, pixel-wise saliency maps that closely align with ground truths. Extensive experiments on the DHF1K, Hollywood-2, and UCF-Sports benchmark datasets demonstrate that our proposed STM-SalNet achieves competitive performance compared to existing state-of-the-art methods.
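The abstract describes the memory bank only at a high level: it retrieves relevant features from past frames and discards redundancy through active forgetting. The following is a minimal sketch of that general idea, not the authors' implementation. The class `MemoryBank`, its parameters `capacity`, `decay`, and `top_k`, the cosine-similarity retrieval, and the strength-based eviction rule are all illustrative assumptions, and plain Python lists stand in for learned feature maps.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class MemoryBank:
    """Hypothetical fixed-capacity memory with decay-based forgetting."""

    def __init__(self, capacity=4, decay=0.9, top_k=2):
        self.capacity = capacity  # assumed maximum number of stored features
        self.decay = decay        # assumed per-step strength decay factor
        self.top_k = top_k        # assumed number of features returned per query
        self.slots = []           # each slot is [feature, strength]

    def retrieve(self, query):
        """Return the top_k stored features most similar to the query."""
        ranked = sorted(self.slots, key=lambda s: cosine(query, s[0]), reverse=True)
        hits = ranked[: self.top_k]
        for slot in hits:
            slot[1] += 1.0  # recalled memories are reinforced
        return [slot[0] for slot in hits]

    def write(self, feature):
        """Decay old memories, store the new one, and evict the weakest if full."""
        for slot in self.slots:
            slot[1] *= self.decay
        self.slots.append([list(feature), 1.0])
        if len(self.slots) > self.capacity:
            weakest = min(range(len(self.slots)), key=lambda i: self.slots[i][1])
            del self.slots[weakest]


# Toy usage: 2-D vectors play the role of per-frame features.
bank = MemoryBank(capacity=3, decay=0.5, top_k=1)
for frame_feature in [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]:
    recalled = bank.retrieve(frame_feature)  # read from past frames first
    bank.write(frame_feature)                # then store the current frame
```

In this sketch a memory that is never recalled loses strength on every write and is the first to be evicted once the bank is full, which is one simple way to approximate selective retention. A real model would operate on tensors and would likely learn the retrieval and forgetting behaviour rather than fix it by hand.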

Original language: English
Title of host publication: Neural Information Processing - 32nd International Conference, ICONIP 2025, Proceedings
Editors: Tadahiro Taniguchi, Chi Sing Andrew Leung, Tadashi Kozuno, Junichiro Yoshimoto, Mufti Mahmud, Maryam Doborjeh, Kenji Doya
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 335-349
Number of pages: 15
ISBN (Print): 9789819540969
DOI
Publication status: Published - 2026
Event: 32nd International Conference on Neural Information Processing, ICONIP 2025 - Okinawa, Japan
Duration: 20 Nov 2025 to 24 Nov 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2756 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 32nd International Conference on Neural Information Processing, ICONIP 2025
Country/Territory: Japan
City: Okinawa
Period: 20/11/25 to 24/11/25
