3D-aware GANs offer new capabilities for creative content editing, such as view synthesis, while preserving the editing capability of their 2D counterparts. Using GAN inversion, these methods can reconstruct an image or a video by optimizing or predicting a latent code, and achieve semantic editing by manipulating that latent code. However, a model pre-trained on a face dataset (e.g., FFHQ) often has difficulty handling faces with out-of-distribution (OOD) objects (e.g., heavy make-up or occlusions). We address this issue by explicitly modeling OOD objects in face videos. Our core idea is to represent the face in a video using two neural radiance fields, one for in-distribution data and the other for out-of-distribution data, and to compose them for reconstruction. Such explicit decomposition alleviates the inherent trade-off between reconstruction fidelity and editability. We evaluate our method's reconstruction accuracy and editability on challenging real videos and show favorable results compared with baseline methods.
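To make the composition of the two radiance fields more concrete, the following is a minimal sketch of one common way to composite two fields along a single camera ray: densities are summed, colors are mixed in proportion to density, and the result is volume-rendered. The abstract does not specify the composition rule, so this density-weighted scheme, the function name `compose_and_render`, and all parameter names are illustrative assumptions rather than the method itself.

```python
import numpy as np

def compose_and_render(sigma_in, rgb_in, sigma_ood, rgb_ood, deltas):
    """Composite two radiance fields sampled at the same N points on one ray.

    Hypothetical sketch (not the paper's stated formulation).
    sigma_in, sigma_ood: (N,) non-negative densities of the two fields.
    rgb_in, rgb_ood:     (N, 3) colors of the two fields.
    deltas:              (N,) distances between consecutive samples.
    Returns the rendered pixel color (3,) and the per-sample weights (N,).
    """
    # Assumption: the composed density is the sum of the two densities.
    sigma = sigma_in + sigma_ood
    # Assumption: colors are mixed in proportion to each field's density.
    # The floor avoids division by zero where both densities vanish.
    w_in = sigma_in / np.maximum(sigma, 1e-8)
    rgb = w_in[:, None] * rgb_in + (1.0 - w_in)[:, None] * rgb_ood

    # Standard volume rendering quadrature along the ray.
    alpha = 1.0 - np.exp(-sigma * deltas)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    pixel = (weights[:, None] * rgb).sum(axis=0)
    return pixel, weights
```

Under this scheme, setting the OOD density to zero everywhere reduces the render to the in-distribution field alone, which is one way to see how an edit applied only to the in-distribution field could leave an occluder untouched.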