TY - JOUR
T1 - 3DFaceShop
T2 - Explicitly Controllable 3D-Aware Portrait Generation
AU - Tang, Junshu
AU - Zhang, Bo
AU - Yang, Binxin
AU - Zhang, Ting
AU - Chen, Dong
AU - Ma, Lizhuang
AU - Wen, Fang
N1 - Publisher Copyright: © 1995-2012 IEEE.
PY - 2024
Y1 - 2024
AB - In contrast to the traditional avatar creation pipeline, which is a costly process, contemporary generative approaches directly learn the data distribution from photographs. While many works extend unconditional generative models and achieve some level of controllability, it is still challenging to ensure multi-view consistency, especially in large poses. In this work, we propose a network that generates 3D-aware portraits while being controllable according to semantic parameters regarding pose, identity, expression and illumination. Our network uses a neural scene representation to model 3D-aware portraits, whose generation is guided by a parametric face model that supports explicit control. While the latent disentanglement can be further enhanced by contrasting images with partially different attributes, noticeable inconsistency still exists in non-face areas when animating expressions. We solve this by proposing a volume blending strategy in which we form a composite output by blending dynamic and static areas, with the two parts segmented from the jointly learned semantic field. Our method outperforms prior art in extensive experiments, producing realistic portraits with vivid expressions in natural lighting when viewed from free viewpoints. It also demonstrates generalization ability to real images as well as out-of-domain data, showing great promise in real applications.
KW - 3D morphable models
KW - 3D-aware GAN
KW - controllable 3D portrait generation
KW - neural radiance field
UR - https://www.scopus.com/pages/publications/85174852571
U2 - 10.1109/TVCG.2023.3323578
DO - 10.1109/TVCG.2023.3323578
M3 - Article
C2 - 37847635
AN - SCOPUS:85174852571
SN - 1077-2626
VL - 30
SP - 6020
EP - 6037
JO - IEEE Transactions on Visualization and Computer Graphics
JF - IEEE Transactions on Visualization and Computer Graphics
IS - 9
ER -