From Discrete Representation to Continuous Modeling: A Novel Audio-Visual Saliency Prediction Model With Implicit Neural Representations

  • Dandan Zhu
  • Kaiwei Zhang
  • Kun Zhu*
  • Nana Zhang*
  • Weiping Ding*
  • Guangtao Zhai
  • Xiaokang Yang
  • *Corresponding author for this work
  • Shanghai Jiao Tong University
  • Ministry of Education of the People's Republic of China
  • Tongji University
  • Donghua University
  • Nantong University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In the era of deep learning, audio-visual saliency prediction is still in its infancy due to the complexity of video signals and the continuous correlation in the temporal dimension. Most existing approaches treat videos as 3D grids of RGB values and model them using discrete neural networks, leading to issues such as content-agnostic modeling and sub-optimal feature representation. To address these challenges, we propose a novel dynamic-aware audio-visual saliency (DAVS) model based on implicit neural representations (INRs). The core of our proposed DAVS model is a parametric neural network that maps space-time coordinates to the corresponding saliency values. Specifically, our model incorporates an INR-based video generator that decomposes videos into image, motion, and audio feature vectors, learning video content-adaptive features via a parametric neural network. This generator efficiently encodes videos, naturally models continuous temporal dynamics, and enhances feature representation capability. Furthermore, we introduce a parametric audio-visual feature fusion strategy in the saliency prediction procedure, enabling intrinsic interactions between modalities and adaptively integrating visual and audio cues. Through extensive experiments on benchmark datasets, our proposed DAVS model demonstrates promising performance and intriguing properties in audio-visual saliency prediction.
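
The idea of mapping space-time coordinates to saliency values can be illustrated with a toy coordinate network. The sketch below is not the DAVS architecture: it is a generic, untrained implicit-representation example with random weights, and every name in it (`fourier_features`, `CoordinateMLP`, `saliency_at`, `render_frame`) is hypothetical. It assumes coordinates normalized to [0, 1] and uses only the Python standard library. Its only purpose is to show that such a field can be queried at any continuous (x, y, t), including times that fall between sampled frames.

```python
import math
import random


def fourier_features(coord, num_bands=4):
    """Encode an (x, y, t) coordinate with sines and cosines at octave-spaced frequencies."""
    feats = []
    for value in coord:
        for k in range(num_bands):
            freq = (2.0 ** k) * math.pi
            feats.append(math.sin(freq * value))
            feats.append(math.cos(freq * value))
    return feats


class CoordinateMLP:
    """Tiny untrained MLP (hypothetical): encoded (x, y, t) -> value in (0, 1)."""

    def __init__(self, in_dim, hidden_dim=16, seed=0):
        rng = random.Random(seed)  # fixed seed so the toy field is reproducible
        s1 = 1.0 / math.sqrt(in_dim)
        s2 = 1.0 / math.sqrt(hidden_dim)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(in_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.uniform(-s2, s2) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def __call__(self, feats):
        # One ReLU hidden layer followed by a sigmoid output.
        hidden = [max(0.0, sum(w * f for w, f in zip(row, feats)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))


def saliency_at(model, x, y, t, num_bands=4):
    """Query the continuous field at a single space-time point."""
    return model(fourier_features((x, y, t), num_bands))


def render_frame(model, t, height=4, width=4, num_bands=4):
    """Sample the field on a regular pixel-centre grid at time t."""
    return [[saliency_at(model, (col + 0.5) / width, (row + 0.5) / height,
                         t, num_bands)
             for col in range(width)]
            for row in range(height)]


NUM_BANDS = 4
model = CoordinateMLP(in_dim=3 * 2 * NUM_BANDS)

frame_a = render_frame(model, t=0.25)          # a time on the frame grid
frame_between = render_frame(model, t=0.2578)  # an arbitrary in-between time
```

Because the field is a function of continuous coordinates, the grid resolution and the set of queried times are chosen at sampling time rather than fixed by the representation. A real model would train the weights against ground-truth saliency maps and, per the abstract, condition on image, motion, and audio features, none of which is attempted here.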

Original language: English
Pages (from-to): 4059-4074
Number of pages: 16
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence
Volume: 8
Issue number: 6
DOI
Publication status: Published - 2024
