The ability to create high-quality 3D faces from a single image has become increasingly important, with wide applications in video conferencing, AR/VR, and advanced video editing in the film industry. In this paper, we propose Face Diffusion NeRF (FaceDNeRF), a new generative method to reconstruct high-quality Face NeRFs from single images, complete with semantic editing and relighting capabilities. FaceDNeRF utilizes high-resolution 3D GAN inversion and an expertly trained 2D latent-diffusion model, allowing users to construct and manipulate Face NeRFs in a zero-shot manner without the need for explicit 3D data. With carefully designed illumination-preserving and identity-preserving losses, as well as multi-modal pre-training, FaceDNeRF offers users unparalleled control over the editing process, enabling them to create and edit Face NeRFs using just single-view images, text prompts, and explicit target lighting. These features are designed to produce more impressive results than existing 2D editing approaches, which rely on 2D segmentation maps to specify editable attributes. Experiments show that FaceDNeRF achieves exceptionally realistic results and unprecedented flexibility in editing compared with state-of-the-art 3D face reconstruction and editing methods. Our code will be available at https://github.com/BillyXYB/FaceDNeRF.