Recent advances in Neural Radiance Fields (NeRFs) have made it possible to reconstruct and reanimate dynamic portrait scenes with control over head pose, facial expression, and viewing direction. However, training such models assumes photometric consistency over the deformed region; that is, the face must remain evenly lit as it deforms with changing head pose and facial expression. Such photometric consistency across the frames of a video is hard to maintain, even in studio environments, which makes the resulting reanimatable neural portraits prone to artifacts during reanimation. In this work, we propose CoDyNeRF, a system that enables the creation of fully controllable 3D portraits under real-world capture conditions. CoDyNeRF learns to approximate illumination-dependent effects via a dynamic appearance model in the canonical space that is conditioned on predicted surface normals and on the facial-expression and head-pose deformations. The surface normal prediction is guided by 3D Morphable Model (3DMM) normals, which act as a coarse prior for the normals of the human head, where direct prediction of normals is hard because of the rigid and non-rigid deformations induced by changes in head pose and facial expression. Using only a short, smartphone-captured video of a subject for training, we demonstrate the effectiveness of our method on free-view synthesis of a portrait scene with explicit head-pose and expression controls and realistic lighting effects. The project page can be found here: http://shahrukhathar.github.io/2023/08/22/CoDyNeRF.html