Sound Generation Method with Timing-Aligned Visual Feature Mapping (时序对齐视觉特征映射的音效生成方法)

  • Zhifeng Xie
  • Luoyi Sun
  • Yuzhou Sun
  • Chunpeng Yu
  • Lizhuang Ma
  • Shanghai University
  • Shanghai Jiao Tong University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

To address the problems of existing methods, such as obvious noise, weak realism, and poor synchronization with the video, we propose a sound generation method based on timing-aligned visual feature mapping. First, we design a feature aggregation window based on a temporal constraint, which extracts integrated visual features from the video sequence. Second, a spatio-temporal matching cross-modal mapping network transforms the integrated visual features into multi-frequency audio features. Finally, an audio decoder produces a Mel-spectrogram from the audio features, which is then passed to a vocoder to output the final waveform. We conducted qualitative and quantitative experiments on the VAS dataset, and the results show that the proposed method significantly improves audio quality, timing alignment, and audience perception.
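
The first step of the abstract, aggregating per-frame visual features inside a temporally constrained window, can be illustrated with a small sketch. This is not the authors' implementation: the function name `aggregate_window`, the `radius` parameter, and the use of a plain mean are all assumptions chosen for illustration, since the abstract does not specify how the window combines frames.

```python
from typing import List, Sequence


def aggregate_window(frames: Sequence[Sequence[float]], radius: int) -> List[List[float]]:
    """Average each frame's feature vector with its temporal neighbours.

    Hypothetical stand-in for a temporally constrained aggregation window:
    output step t only sees frames in [t - radius, t + radius], clipped to
    the sequence bounds, so the output keeps one vector per input frame.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(frames)
    out = []
    for t in range(n):
        # Clip the window to the valid frame range.
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        window = frames[lo:hi]
        dim = len(frames[t])
        # Assumed aggregation: element-wise mean over the window.
        out.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return out


if __name__ == "__main__":
    # Four toy frames with 2-dimensional features (made-up values).
    feats = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    print(aggregate_window(feats, radius=1))
```

Because the window is clipped rather than padded, the output has exactly one aggregated vector per input frame, which is one simple way to keep the visual sequence aligned in time with a downstream audio representation.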

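Translated title of the contribution: Sound Generation Method with Timing-Aligned Visual Feature Mapping
Original language: Chinese (Traditional)
Pages (from-to): 1506-1514
Number of pages: 9
Journal: Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
Volume: 34
Issue number: 10
DOI
Publication status: Published - Oct 2022
Externally published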

Keywords

  • auto-encoder
  • cross-modal
  • sound generation
  • timing alignment
