Generative Adversarial Networks (GANs) have significantly advanced image synthesis by mapping randomly sampled latent codes to high-fidelity images. However, applying well-trained GANs to real image editing remains challenging. A common solution is to find an approximate latent code that adequately recovers the input image to be edited, a task known as GAN inversion. To invert a GAN model, prior works typically focus on reconstructing the target image at the pixel level, yet few studies examine whether the inverted result can also support manipulation at the semantic level. This work fills this gap by proposing in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model. In this way, we sufficiently reuse the knowledge learned by GANs for image reconstruction, facilitating a wide range of editing applications without any retraining. We further analyze the effects of the encoder structure, the starting inversion point, and the inversion parameter space, and observe a trade-off between reconstruction quality and editing property. This trade-off sheds light on how a GAN model represents an image with various semantics encoded in the learned latent distribution. Code, models, and demo are available at the project page: https://genforce.github.io/idinvert/.
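To make the idea of a domain-regularized optimizer concrete, the following is a minimal toy sketch, not the released implementation. It assumes one plausible form of the objective, a pixel reconstruction term plus a penalty that keeps the code close to what the encoder predicts for its own reconstruction, i.e. ||x - G(z)||^2 + lam * ||z - E(G(z))||^2. The linear generator `A`, the linear encoder `B`, the weight `lam`, and the function names are all hypothetical stand-ins chosen so that the gradient can be written in closed form; the abstract does not specify any of them.

```python
import numpy as np


def objective(z, x, A, B, lam=2.0):
    """Assumed toy objective: reconstruction error plus a domain penalty."""
    recon = np.sum((x - A @ z) ** 2)          # ||x - G(z)||^2 with G(z) = A z
    domain = np.sum((z - B @ (A @ z)) ** 2)   # ||z - E(G(z))||^2 with E(x) = B x
    return recon + lam * domain


def invert(x, A, B, lam=2.0, steps=200):
    """Gradient descent on the toy objective, started from the encoder output."""
    k = A.shape[1]
    M = np.eye(k) - B @ A                     # z - E(G(z)) equals M z
    H = A.T @ A + lam * (M.T @ M)             # half of the (constant) Hessian
    lr = 1.0 / (2.0 * np.linalg.norm(H, 2))   # step size 1/L for this quadratic
    z = B @ x                                 # encoder output as the starting point
    for _ in range(steps):
        grad = 2.0 * (H @ z - A.T @ x)        # exact gradient of the objective
        z = z - lr * grad
    return z


# Hypothetical toy data: 4-dimensional latent code, 16-dimensional "image".
rng = np.random.default_rng(0)
A = rng.normal(size=(16, 4)) / 4.0
B = np.linalg.pinv(A) + 0.05 * rng.normal(size=(4, 16))  # imperfect encoder
x = A @ rng.normal(size=4) + 0.1 * rng.normal(size=16)   # noisy target

z_init = B @ x
z_inv = invert(x, A, B)
print(objective(z_init, x, A, B), objective(z_inv, x, A, B))
```

In this sketch, raising `lam` pulls the solution toward codes that the encoder maps back to themselves at the cost of pixel-level fidelity, which is one simple way to picture the trade-off between reconstruction quality and editing property mentioned above.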