We study 3D-aware image attribute editing, a problem with wide practical applications. Recent methods either train a shared encoder that maps images into a 3D generator's latent space or optimize a latent code per image, and then edit images in that latent space. Despite promising results near the input view, these methods still produce 3D-inconsistent images at large camera poses and edit attributes imprecisely, for example by altering attributes that were not specified. For efficient image inversion, we train a single encoder shared across all images. To alleviate 3D inconsistency at large camera poses, we propose two novel methods, an alternating training scheme and a multi-view identity loss, which maintain 3D consistency and subject identity. We attribute imprecise editing to the gap between the latent space of real images and that of generated images. We compare the latent space and the inversion manifold of GAN models and demonstrate that editing in the inversion manifold achieves better results in both quantitative and qualitative evaluations. Extensive experiments show that our method produces more 3D-consistent images and achieves more precise image editing than previous work. Source code and pretrained models are available on our project page: https://mybabyyh.github.io/Preim3D/
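To make the idea of a multi-view identity loss concrete, the following is a minimal sketch under stated assumptions, not the formulation used in this work. It assumes that a face-recognition network (not shown, hypothetical here) has already produced an identity embedding for the input image and for renderings of the inverted subject at several camera poses, and that the loss is the mean cosine distance between the input embedding and each rendered-view embedding. The function names `cosine_similarity` and `multi_view_identity_loss` are illustrative only.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two identity embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("identity embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def multi_view_identity_loss(
    input_embedding: Vector,
    view_embeddings: Sequence[Vector],
) -> float:
    """Mean cosine distance between the input identity and each rendered view.

    Assumption: embeddings come from a pretrained identity network, and the
    views are renderings of the same inverted subject at different camera poses.
    """
    if not view_embeddings:
        raise ValueError("need at least one rendered view")
    penalties = [
        1.0 - cosine_similarity(input_embedding, view)
        for view in view_embeddings
    ]
    return sum(penalties) / len(penalties)
```

Under this assumed definition, the loss is zero when every rendered view carries the same identity direction as the input, and it grows as the identity drifts at poses far from the input view.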