TY - GEN
T1 - STM-SalNet
T2 - 32nd International Conference on Neural Information Processing, ICONIP 2025
AU - Xu, Jikai
AU - Zhu, Dandan
AU - Zhang, Kaiwei
AU - Min, Xiongkuo
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.
PY - 2026
Y1 - 2026
N2 - In recent years, video saliency prediction has attracted significant attention across a wide range of vision-related tasks. However, most existing video saliency prediction methods rely on static encoder-decoder architectures, failing to incorporate the dynamic memory mechanisms that are fundamental to human visual perception and attention modeling. To address this limitation, we propose STM-SalNet, a novel biologically inspired spatial-temporal memory network for video saliency prediction. First, inspired by the powerful visual processing capabilities of the human visual cortex, we introduce a brain-inspired Vision Transformer module designed to extract multi-level hierarchical spatial-temporal features. Subsequently, we propose a memory bank module equipped with an active forgetting mechanism, simulating human memory’s ability to selectively retain and update information. By dynamically retrieving relevant features from past frames while discarding redundancy, the module ensures robust adaptability to continuously evolving video content. To further enhance the integration of spatial and temporal features, we design a bidirectional spatial-temporal fusion module that facilitates effective interaction between deep semantic and shallow spatial features, enriching the overall feature representation. Finally, a progressive hierarchical decoder module is employed to generate fine-grained, pixel-wise saliency maps that closely align with the ground truth. Extensive experiments on the DHF1K, Hollywood-2, and UCF-Sports benchmark datasets demonstrate that our proposed STM-SalNet achieves competitive performance compared to existing state-of-the-art methods.
AB - In recent years, video saliency prediction has attracted significant attention across a wide range of vision-related tasks. However, most existing video saliency prediction methods rely on static encoder-decoder architectures, failing to incorporate the dynamic memory mechanisms that are fundamental to human visual perception and attention modeling. To address this limitation, we propose STM-SalNet, a novel biologically inspired spatial-temporal memory network for video saliency prediction. First, inspired by the powerful visual processing capabilities of the human visual cortex, we introduce a brain-inspired Vision Transformer module designed to extract multi-level hierarchical spatial-temporal features. Subsequently, we propose a memory bank module equipped with an active forgetting mechanism, simulating human memory’s ability to selectively retain and update information. By dynamically retrieving relevant features from past frames while discarding redundancy, the module ensures robust adaptability to continuously evolving video content. To further enhance the integration of spatial and temporal features, we design a bidirectional spatial-temporal fusion module that facilitates effective interaction between deep semantic and shallow spatial features, enriching the overall feature representation. Finally, a progressive hierarchical decoder module is employed to generate fine-grained, pixel-wise saliency maps that closely align with the ground truth. Extensive experiments on the DHF1K, Hollywood-2, and UCF-Sports benchmark datasets demonstrate that our proposed STM-SalNet achieves competitive performance compared to existing state-of-the-art methods.
KW - Active Forgetting
KW - Hippocampus
KW - Memory Bank
KW - Transformer
KW - Video Saliency Prediction
UR - https://www.scopus.com/pages/publications/105022867342
U2 - 10.1007/978-981-95-4097-6_23
DO - 10.1007/978-981-95-4097-6_23
M3 - Conference contribution
AN - SCOPUS:105022867342
SN - 9789819540969
T3 - Communications in Computer and Information Science
SP - 335
EP - 349
BT - Neural Information Processing - 32nd International Conference, ICONIP 2025, Proceedings
A2 - Taniguchi, Tadahiro
A2 - Leung, Chi Sing Andrew
A2 - Kozuno, Tadashi
A2 - Yoshimoto, Junichiro
A2 - Mahmud, Mufti
A2 - Doborjeh, Maryam
A2 - Doya, Kenji
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 20 November 2025 through 24 November 2025
ER -