The ability to create high-quality 3D faces from a single image has become increasingly important, with wide applications in video conferencing, AR/VR, and advanced video editing in the movie industry. In this paper, we propose Face Diffusion NeRF (FDNeRF), a new generative method that reconstructs high-quality face NeRFs from single images, complete with semantic editing and relighting capabilities. FDNeRF combines high-resolution 3D GAN inversion with an expertly trained 2D latent-diffusion model, allowing users to construct and manipulate face NeRFs in a zero-shot manner without the need for explicit 3D data. With a carefully designed illumination- and identity-preserving loss, as well as multi-modal pre-training, FDNeRF offers users unparalleled control over the editing process, enabling them to create and edit face NeRFs using only single-view images, text prompts, and explicit target lighting. FDNeRF is designed to produce more impressive results than existing 2D editing approaches that rely on 2D segmentation maps for editable attributes. Experiments show that FDNeRF achieves exceptionally realistic results and unprecedented editing flexibility compared with state-of-the-art 3D face reconstruction and editing methods. Our code will be available at https://github.com/BillyXYB/FDNeRF.