基于结构化潜码引导 NeRF 的语音驱动人脸重演

Translated title of the contribution: Speech-Driven Facial Reenactment Based on Implicit Neural Representations with Structured Latent Codes

Zhifeng Xie, Jiaheng Zheng, Ji Wang, Jiajia Liang, Lizhuang Ma

Research output: Contribution to journal › Article › peer-review

Abstract

Speech-driven facial reenactment aims to generate high-fidelity facial animation that matches the input speech content. However, existing methods struggle to achieve high-quality reenactment because of the gap between the audio and video modalities. To address the low fidelity and poor lip synchronization of existing methods, we propose a speech-driven facial reenactment method based on implicit neural representations with structured latent codes. The method takes the point cloud sequence of the human face as an intermediate representation and decomposes speech-driven facial reenactment into two tasks: cross-modal mapping and neural radiance field rendering. First, we predict the facial expression coefficients through cross-modal mapping and obtain the facial identity coefficients through 3D face reconstruction. Second, we synthesize the face point cloud sequence based on a 3D morphable model (3DMM). Third, we use the vertex positions to construct the structured implicit neural representation and regress the density and color of each sampling point. Finally, we render RGB frames of the face through volume rendering and composite them back into the original image. Experimental results on multiple 3 to 5 minute speech videos of individual speakers, including visual comparison, quantitative evaluation, and subjective assessment, demonstrate that our method outperforms state-of-the-art methods such as AD-NeRF in lip-sync accuracy and image generation quality, achieving high-fidelity speech-driven facial reenactment.
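As a rough illustration of two steps named in the abstract, the sketch below shows a generic linear 3DMM vertex synthesis and the standard NeRF volume rendering quadrature for a single ray. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, array shapes, and basis layout are hypothetical, and the cross-modal mapping network and the structured latent codes themselves are not modeled here.

```python
import numpy as np


def synthesize_vertices(mean_shape, id_basis, exp_basis, id_coeffs, exp_coeffs):
    """Generic linear 3DMM: mean shape plus identity and expression offsets.

    Shapes (assumed for illustration):
      mean_shape: (V, 3), id_basis: (V, 3, K_id), exp_basis: (V, 3, K_exp),
      id_coeffs: (K_id,), exp_coeffs: (K_exp,)
    """
    offsets = id_basis @ id_coeffs + exp_basis @ exp_coeffs  # (V, 3)
    return mean_shape + offsets


def composite_ray(densities, colors, deltas):
    """Standard NeRF quadrature along one ray.

    densities: (N,) non-negative density per sample
    colors:    (N, 3) RGB per sample
    deltas:    (N,) distance between adjacent samples
    """
    # Opacity contributed by each sample interval.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: fraction of light that reaches sample i unoccluded.
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = transmittance * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights
```

In a pipeline like the one described, the expression coefficients would vary per frame (predicted from audio) while the identity coefficients stay fixed, so `synthesize_vertices` would be called once per frame to produce the point cloud sequence.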

Original language: Chinese (Traditional)
Pages (from-to): 1616-1624
Number of pages: 9
Journal: Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics
Volume: 36
Issue number: 10
State: Published - Oct 2024
Externally published: Yes
