TY - JOUR
T1 - Brand-new speech animation technology based on first order motion model and MelGAN-VC
AU - Chen, Shaomin
AU - Gao, Xinyi
AU - Wang, Jiangtao
AU - Xiao, Yu
AU - Zhang, Yueling
AU - Xu, Gang
N1 - Publisher Copyright: © 2021 Institute of Physics Publishing. All rights reserved.
PY - 2021/3/4
Y1 - 2021/3/4
N2 - Speech animation has considerable application potential in instant messaging and entertainment media, such as videophones, virtual meetings, and audio and video chat. Traditional voice-driven speech animation typically supports only a single language, while performance-driven speech animation requires costly capture equipment and is difficult to produce at scale. To address these problems, we propose a new method for speech animation generation: given a static portrait of a person and a face driving video, the method generates a face animation video of the person in the portrait. The conversion system consists of two parts: face conversion and voice conversion. We observed that the generated face animation video suffers from low definition, playback that is not smooth, and metallic-sounding audio. To address these issues, we add an animation enhancement step and replace the encoder. Comparative experiments show that these measures are effective.
UR - https://www.scopus.com/pages/publications/85103285730
U2 - 10.1088/1742-6596/1828/1/012029
DO - 10.1088/1742-6596/1828/1/012029
M3 - Conference article
AN - SCOPUS:85103285730
SN - 1742-6588
VL - 1828
JO - Journal of Physics: Conference Series
JF - Journal of Physics: Conference Series
IS - 1
M1 - 012029
T2 - 2020 International Symposium on Automation, Information and Computing, ISAIC 2020
Y2 - 2 December 2020 through 4 December 2020
ER -