The success of the GAN-NeRF structure has enabled face editing on NeRF that maintains 3D view consistency. However, simultaneously achieving multi-view consistency and temporal coherence when editing video sequences remains a formidable challenge. This paper proposes a novel face video editing architecture built upon the dynamic face GAN-NeRF structure, which effectively utilizes video sequences to restore the latent code and 3D face geometry. Editing the latent code ensures multi-view consistent edits of the face, as validated by multi-view stereo reconstruction on the edited images produced by our dynamic NeRF. Because face geometry is estimated frame by frame, the result can jitter. We therefore propose a stabilizer that maintains temporal coherence by preserving smooth changes of facial expression across consecutive frames. Quantitative and qualitative analyses show that our method, as the pioneering 4D face video editor, achieves state-of-the-art performance compared with existing 2D- or 3D-based approaches that address identity and motion independently. Code will be released.
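The abstract does not specify how the stabilizer works. As a rough illustration of the general idea only (reducing frame-to-frame jitter in per-frame expression estimates), the sketch below applies a centered moving average to a sequence of expression vectors. The function names `smooth_expressions` and `jitter`, the `radius` parameter, and the choice of a moving average are all assumptions made for this example, not the paper's method.

```python
from typing import List, Sequence


def smooth_expressions(
    frames: Sequence[Sequence[float]], radius: int = 2
) -> List[List[float]]:
    """Centered moving average over per-frame expression vectors.

    Hypothetical stand-in for a temporal stabilizer: each output frame is
    the mean of the input frames within `radius` steps of it. The window
    is truncated at the start and end of the sequence.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(frames)
    smoothed: List[List[float]] = []
    for t in range(n):
        lo = max(0, t - radius)
        hi = min(n, t + radius + 1)
        window = frames[lo:hi]
        dim = len(frames[t])
        smoothed.append(
            [sum(f[d] for f in window) / len(window) for d in range(dim)]
        )
    return smoothed


def jitter(frames: Sequence[Sequence[float]]) -> float:
    """Mean absolute change between consecutive frames (a crude jitter score)."""
    total = 0.0
    count = 0
    for prev, cur in zip(frames, frames[1:]):
        for a, b in zip(prev, cur):
            total += abs(b - a)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # One-dimensional expression coefficient that flickers between 0 and 1.
    noisy = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]
    stable = smooth_expressions(noisy, radius=1)
    print("jitter before:", jitter(noisy))
    print("jitter after: ", jitter(stable))
```

A larger `radius` suppresses more jitter but also blurs genuine fast expression changes, which is the trade-off any such smoother has to balance.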