时序对齐视觉特征映射的音效生成方法

Translated title of the contribution: Sound Generation Method with Timing-Aligned Visual Feature Mapping

Zhifeng Xie, Luoyi Sun, Yuzhou Sun, Chunpeng Yu, Lizhuang Ma

Research output: Contribution to journal › Article › peer-review

Abstract

To address the problems of existing methods, such as obvious noise, weak realism, and poor synchronization with the video, we propose a sound generation method based on timing-aligned visual feature mapping. First, we design a feature aggregation window based on a temporal constraint, which extracts an integrated visual feature from the video sequence. Second, the integrated visual feature is transformed into a multi-frequency audio feature by a spatio-temporal matching cross-modal mapping network. Finally, an audio decoder converts the audio feature into a Mel-spectrogram, which is sent to a vocoder to output the final waveform. We conducted qualitative and quantitative experiments on the VAS dataset, and the results show that the proposed method significantly improves audio quality, timing alignment, and audience perception.
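For a concrete picture of the three stages described in the abstract, a minimal PyTorch sketch is given below. All module names, layer choices, dimensions, and the window length (TemporalAggregationWindow, CrossModalMapper, MelDecoder, feature size 512, window of 8 frames) are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch of the described pipeline; architecture details are assumptions.
    import torch
    import torch.nn as nn


    class TemporalAggregationWindow(nn.Module):
        """Aggregates per-frame visual features inside a fixed temporal window."""

        def __init__(self, feat_dim=512, window=8):
            super().__init__()
            self.proj = nn.Conv1d(feat_dim, feat_dim, kernel_size=window,
                                  stride=window // 2, padding=window // 2)

        def forward(self, frame_feats):          # (B, T, feat_dim)
            x = frame_feats.transpose(1, 2)      # (B, feat_dim, T)
            return self.proj(x).transpose(1, 2)  # (B, T', feat_dim)


    class CrossModalMapper(nn.Module):
        """Maps aggregated visual features to audio features."""

        def __init__(self, feat_dim=512, audio_dim=256, heads=8):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.to_audio = nn.Linear(feat_dim, audio_dim)

        def forward(self, visual_feats):         # (B, T', feat_dim)
            return self.to_audio(self.encoder(visual_feats))  # (B, T', audio_dim)


    class MelDecoder(nn.Module):
        """Decodes audio features into a Mel-spectrogram (vocoder input)."""

        def __init__(self, audio_dim=256, n_mels=80, upsample=4):
            super().__init__()
            self.up = nn.ConvTranspose1d(audio_dim, n_mels,
                                         kernel_size=upsample, stride=upsample)

        def forward(self, audio_feats):          # (B, T', audio_dim)
            return self.up(audio_feats.transpose(1, 2))  # (B, n_mels, T' * upsample)


    if __name__ == "__main__":
        frames = torch.randn(1, 64, 512)      # 64 frames of pre-extracted visual features
        agg = TemporalAggregationWindow()(frames)
        audio_feats = CrossModalMapper()(agg)
        mel = MelDecoder()(audio_feats)
        print(mel.shape)                      # Mel-spectrogram handed to a separate vocoder

In this sketch the temporal window is realized as a strided 1-D convolution and the cross-modal mapping as a small Transformer encoder; the paper's actual network design and the choice of vocoder are not specified here.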

Original language: Chinese (Simplified)
Pages (from-to): 1506-1514
Number of pages: 9
Journal: Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
Volume: 34
Issue number: 10
DOIs
State: Published - Oct 2022
Externally published: Yes
