LAVS: A LIGHTWEIGHT AUDIO-VISUAL SALIENCY PREDICTION MODEL

  • Dandan Zhu
  • Defang Zhao*
  • Xiongkuo Min
  • Tian Han
  • Qiangqiang Zhou
  • Shaobo Yu
  • Yongqing Chen
  • Guangtao Zhai
  • Xiaokang Yang
  • *Corresponding author for this work
  • Shanghai Jiao Tong University
  • Tongji University
  • Stevens Institute of Technology
  • Jiangxi Normal University
  • East China Normal University
  • Hainan University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Audio information is essential for guiding human attention and visual perception, as many psychological studies have verified. However, the audio modality has been largely neglected in modeling visual attention; most current visual attention models depend heavily on visual information. Additionally, existing high-performing visual attention models rely on deep convolutional neural networks (CNNs), benefiting from their strong feature learning ability but incurring high computational cost. To this end, we propose a novel lightweight audio-visual saliency (LAVS) model to efficiently address the problem of fixation prediction in videos. To the best of our knowledge, our proposed model constitutes the first attempt to exploit a lightweight network and combine visual and audio cues to perform saliency estimation in videos. Specifically, the model consists of four modules: a spatial-temporal visual saliency estimation module, an audio feature extraction module, a sound source localization module, and an audio-visual saliency fusion module. Extensive experiments across datasets validate the effectiveness and real-time performance of the proposed LAVS model, which outperforms other state-of-the-art methods.

Original language: English
Title of host publication: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Publisher: IEEE Computer Society
ISBN (Electronic): 9781665438643
DOI
Publication status: Published - 2021
Externally published: Yes
Event: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021 - Shenzhen, China
Duration: 5 Jul 2021 to 9 Jul 2021

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Country/Territory: China
City: Shenzhen
Period: 5/07/21 to 9/07/21
