3D facial avatar reconstruction is a significant research topic in computer graphics and computer vision, and many of its applications require photo-realistic rendering together with flexible control over poses and expressions. Recently, the quality of such avatars has improved greatly with the development of neural radiance fields (NeRF). However, most existing NeRF-based facial avatars focus on subject-specific reconstruction and reenactment. They must be trained on multiple images that capture the specific subject from different views, and the learned model cannot generalize to new identities, which limits its broader application. In this work, we propose a one-shot 3D facial avatar reconstruction framework that requires only a single source image to reconstruct a high-fidelity 3D facial avatar. To address the lack of generalization ability and the missing multi-view information, we leverage the generative prior of a 3D GAN and develop an efficient encoder-decoder network that reconstructs the canonical neural volume of the source image, and we further propose a compensation network to complement facial details. To enable fine-grained control over facial dynamics, we propose a deformation field that warps the canonical volume into the driven expressions. Extensive experimental comparisons show that our method achieves better synthesis results than several state-of-the-art methods.
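The abstract does not specify how the deformation field is parameterized. What follows is a minimal NumPy sketch of the data flow it implies. It assumes a backward warp in the style of deformable NeRFs: each sample point along a camera ray in the driven (expression) space is offset into canonical space, and only then is the canonical volume queried and volume-rendered. The names `canonical_field`, `deformation_field`, and `EXPR_BASIS`, along with the linear expression blend, are hypothetical stand-ins for the learned networks. They are not the authors' implementation.

```python
import numpy as np

# Hypothetical expression basis (assumption for illustration): each row is a
# 3D displacement that one expression coefficient scales.
EXPR_BASIS = np.array([[0.3, 0.0, 0.0],
                       [0.0, 0.3, 0.0]])


def canonical_field(x):
    """Toy canonical volume: a unit sphere whose colour depends on position.

    x: (N, 3) points in canonical space. Returns sigma (N,) and rgb (N, 3).
    Stands in for the neural volume decoded from the source image.
    """
    r = np.linalg.norm(x, axis=-1)
    sigma = 10.0 * np.clip(1.0 - r, 0.0, None)   # density is nonzero only inside the sphere
    rgb = np.clip(0.5 + 0.5 * x, 0.0, 1.0)        # position-dependent colour
    return sigma, rgb


def deformation_field(x, expr):
    """Toy deformation field: per-point offset from driven space to canonical space.

    The offset blends the EXPR_BASIS rows with the expression coefficients
    `expr` (shape (K,)) and fades out with distance from the origin.
    """
    falloff = np.exp(-np.sum(x ** 2, axis=-1, keepdims=True))  # (N, 1)
    return falloff * (expr @ EXPR_BASIS)                        # (N, 3)


def render_ray(origin, direction, expr, near=1.0, far=5.0, n_samples=128):
    """Render one ray by warping its samples into the canonical volume."""
    t = np.linspace(near, far, n_samples)
    x_obs = origin + t[:, None] * direction          # samples in the driven space
    x_can = x_obs + deformation_field(x_obs, expr)   # backward warp to canonical space
    sigma, rgb = canonical_field(x_can)
    delta = np.append(np.diff(t), 1e10)              # spacing between consecutive samples
    alpha = 1.0 - np.exp(-sigma * delta)             # per-sample opacity
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))  # transmittance T_i
    weights = trans * alpha
    return weights @ rgb, weights
```

In this sketch, the canonical volume is built once per source image and reused for every driving expression. Changing `expr` alters only the per-point offsets, which in turn changes which part of the canonical volume each ray sees.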
翻译:摘要:三维人脸面部头像重建一直是计算机图形学与计算机视觉领域的重要研究课题,在诸多相关应用中,逼真的渲染效果以及姿态与表情的灵活控制必不可少。近年来,随着神经辐射场(NeRF)的发展,其性能得到了显著提升。然而,现有基于NeRF的人脸面部头像大多专注于特定对象的重建与再现,需要包含该对象不同视角的多张图像进行训练,且学习到的模型无法泛化至新的人脸身份,这限制了其进一步应用。在本工作中,我们提出了一种一次性的三维人脸面部头像重建框架,仅需单张源图像即可重建高保真的三维人脸面部头像。针对泛化能力不足及缺少多视角信息的挑战,我们利用三维生成对抗网络(3D GAN)的生成先验,开发了一种高效的编码器-解码器网络来重建源图像的规范神经体积,并进一步提出补偿网络以补充面部细节。为实现对面部动态的精细控制,我们提出了一种变形场,将规范体积变形至驱动表情状态。通过大量实验对比,与若干现有最先进方法相比,我们取得了更优的合成结果。