Abstract
Speech-driven facial reenactment aims to generate high-fidelity facial animation that matches the input speech content. However, existing methods struggle to achieve high-quality facial reenactment because of the gap between the audio and video modalities. To address the low fidelity and poor lip synchronization of existing methods, we propose a speech-driven facial reenactment method based on implicit neural representations with structured latent codes. The method takes the point cloud sequence of the human face as the intermediate representation and decomposes speech-driven facial reenactment into two tasks: cross-modal mapping and neural radiance field rendering. First, we predict the facial expression coefficients through cross-modal mapping and obtain the facial identity coefficients by 3D face reconstruction. Second, we synthesize the face point cloud sequence based on a 3D morphable model (3DMM). Third, we use the vertex positions to construct the structured implicit neural representation and regress the density and color of each sampling point. Finally, we render RGB frames of the face through volume rendering and composite them back into the original image. Experimental results on multiple 3 to 5 minute single-person speech videos, including visual comparison, quantitative evaluation, and subjective assessment, demonstrate that our method outperforms state-of-the-art methods such as AD-NeRF in lip-sync accuracy and image generation precision, achieving high-fidelity speech-driven facial reenactment.
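As a rough illustration of two steps named in the abstract, the sketch below shows (a) a linear 3DMM-style combination of identity and expression coefficients into face vertices and (b) compositing of per-sample density and color along one camera ray. It is a minimal sketch, not the paper's implementation: the function names, the flattened vertex layout, and the use of the standard NeRF quadrature for the compositing step are all assumptions, since the abstract does not give the actual formulation.

```python
import math


def synthesize_vertices(mean_shape, id_basis, id_coeffs, exp_basis, exp_coeffs):
    """Hypothetical linear 3DMM: mean shape plus weighted basis offsets.

    mean_shape is a flat list of 3*N floats; each basis is a list of
    components, and each component is a flat list of 3*N offsets.
    """
    verts = list(mean_shape)
    for basis, coeffs in ((id_basis, id_coeffs), (exp_basis, exp_coeffs)):
        for component, weight in zip(basis, coeffs):
            for i, offset in enumerate(component):
                verts[i] += weight * offset
    return verts


def composite_ray(densities, colors, deltas):
    """Composite samples along one ray (standard NeRF quadrature, assumed).

    densities: per-sample density sigma_i (non-negative floats)
    colors:    per-sample (r, g, b) tuples
    deltas:    distance between consecutive samples
    Returns the accumulated RGB and the remaining transmittance.
    """
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution to the pixel
        for c in range(3):
            rgb[c] += weight * color[c]
        transmittance *= 1.0 - alpha             # light left after the segment
    return rgb, transmittance
```

In such a setup, the expression coefficients would come from the cross-modal mapping and the identity coefficients from 3D face reconstruction, while the densities and colors would be regressed by the network conditioned on the structured latent codes; both connections are described in the abstract only at this level of detail.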
| Translated title of the contribution | Speech-Driven Facial Reenactment Based on Implicit Neural Representations with Structured Latent Codes |
|---|---|
| Original language | Traditional Chinese |
| Pages (from-to) | 1616-1624 |
| Number of pages | 9 |
| Journal | Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics |
| Volume | 36 |
| Issue | 10 |
| DOI | |
| Publication status | Published - Oct 2024 |
| Externally published | Yes |
Keywords
- audio-driven facial reenactment
- cross-modal
- implicit neural representations
- neural radiance field (NeRF)