
Speech-Driven Facial Reenactment Based on NeRF Guided by Structured Latent Codes (基于结构化潜码引导 NeRF 的语音驱动人脸重演)

  • Zhifeng Xie
  • Jiaheng Zheng
  • Ji Wang
  • Jiajia Liang
  • Lizhuang Ma
  • Shanghai University
  • Shanghai Jiao Tong University

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Speech-driven facial reenactment aims to generate high-fidelity facial animation that matches the input speech content. However, existing methods struggle to achieve high-quality facial reenactment because of the gap between the audio and video modalities. To address the low fidelity and poor lip synchronization of existing methods, we propose a speech-driven facial reenactment method based on implicit neural representations with structured latent codes. The method takes the point cloud sequence of the human face as an intermediate representation and decomposes speech-driven facial reenactment into two tasks: cross-modal mapping and neural radiance field rendering. First, we predict the facial expression coefficients through cross-modal mapping and obtain the facial identity coefficients through 3D face reconstruction. Then, we synthesize the face point cloud sequence based on a 3DMM. Next, we use the vertex positions to construct the structured implicit neural representation and regress the density and color of each sampling point. Finally, we render RGB frames of the face through volume rendering and composite them into the original image. Experimental results on multiple 3 to 5 minute speech videos of individual speakers, including visual comparison, quantitative evaluation, and subjective assessment, demonstrate that our method outperforms state-of-the-art methods such as AD-NeRF in lip-sync accuracy and image generation precision, achieving high-fidelity speech-driven facial reenactment.
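
Two of the stages named in the abstract, 3DMM-based point cloud synthesis and volume rendering, can be illustrated with a minimal sketch. It assumes the common linear 3DMM formulation (mean shape plus identity and expression offsets) and the standard NeRF quadrature; the abstract does not give the paper's exact formulation, so the function names, argument layouts, and basis shapes below are hypothetical and are not taken from the authors' implementation.

```python
import math


def synthesize_vertices(mean_shape, id_basis, exp_basis, id_coeffs, exp_coeffs):
    """Assumed linear 3DMM: vertices = mean + id_basis @ id + exp_basis @ exp.

    mean_shape is a flat list of 3N coordinates. id_basis and exp_basis are
    lists of 3N rows, each row holding one weight per coefficient.
    """
    vertices = []
    for k, mean_value in enumerate(mean_shape):
        offset_id = sum(w * a for w, a in zip(id_basis[k], id_coeffs))
        offset_exp = sum(w * b for w, b in zip(exp_basis[k], exp_coeffs))
        vertices.append(mean_value + offset_id + offset_exp)
    return vertices


def composite_ray(densities, colors, deltas):
    """Standard NeRF quadrature along a single ray.

    densities: density sigma_i at each sample point.
    colors: (r, g, b) regressed at each sample point.
    deltas: distance between adjacent samples.
    Returns the composited RGB value and the accumulated opacity.
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    rgb = [0.0, 0.0, 0.0]
    opacity = 0.0
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for channel in range(3):
            rgb[channel] += weight * color[channel]
        opacity += weight
        transmittance *= 1.0 - alpha
    return rgb, opacity
```

In a pipeline of the kind described, the expression coefficients would come from the cross-modal mapping and the identity coefficients from 3D face reconstruction, and the per-sample densities and colors would be regressed by the network conditioned on the structured latent codes; here they are plain inputs.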

Translated title of the contribution: Speech-Driven Facial Reenactment Based on Implicit Neural Representations with Structured Latent Codes
Original language: Traditional Chinese
Pages (from-to): 1616-1624
Number of pages: 9
Journal: Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
Volume: 36
Issue number: 10
DOI
Publication status: Published - Oct 2024
Externally published: Yes

Keywords

  • audio-driven facial reenactment
  • cross-modal
  • implicit neural representations
  • neural radiance field (NeRF)
