StyleGAN2 has been shown to be a powerful image generation engine that supports semantic editing. However, to manipulate a real-world image, one must first retrieve its corresponding latent representation in StyleGAN's latent space, i.e., a latent code that decodes to an image as close as possible to the desired one. For many real-world images, no such latent representation exists, which necessitates tuning the generator network itself. We present a per-image optimization method that tunes a StyleGAN2 generator by applying a local edit to its weights. This yields almost perfect inversion while still allowing image editing, because the rest of the mapping from an input latent representation tensor to an output image is kept relatively intact. The method is based on one-shot training of a set of shallow update networks (a.k.a. Gradient Modification Modules) that modify the layers of the generator. Once the Gradient Modification Modules are trained, a modified generator is obtained by a single application of these networks to the original parameters, and the generator's previous editing capabilities are maintained. Our experiments show that the method outperforms the current state of the art in this very active domain by a sizable margin. Our code is available at \url{https://github.com/sheffier/gani}.
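The two-phase idea (train the update networks once for a single target image, then apply them once to the frozen original weights) can be illustrated with a deliberately tiny toy. The sketch below is an assumption-laden illustration, not the paper's architecture: the "generator" is a single linear layer `y = W x`, and the "update module" is a single learned scalar `alpha` that maps the layer's reconstruction gradient to a weight edit `W' = W - alpha * G`. All function names (`matvec`, `grad_W`, `apply_update`, `train_update_module`) are hypothetical and do not come from the GANI repository.

```python
# Hypothetical toy: a one-layer linear "generator" y = W @ x and a one-parameter
# "update module" that turns the layer's gradient into a weight edit.
# This is NOT the Gradient Modification Module architecture from the paper.

def matvec(W, x):
    # Plain-Python matrix-vector product
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def recon_loss(W, x, target):
    # 0.5 * squared error between the generated output and the target
    return 0.5 * sum((y - t) ** 2 for y, t in zip(matvec(W, x), target))

def grad_W(W, x, target):
    # dL/dW = (W x - t) x^T for the squared-error loss above
    r = [y - t for y, t in zip(matvec(W, x), target)]
    return [[ri * xj for xj in x] for ri in r]

def apply_update(W, G, alpha):
    # Single application of the update module to the frozen weights
    return [[w - alpha * g for w, g in zip(wr, gr)] for wr, gr in zip(W, G)]

def train_update_module(W, x, target, steps=200, lr=0.2):
    # One-shot training: W stays frozen, only the module parameter alpha learns.
    G = grad_W(W, x, target)   # gradient at the original weights
    Gx = matvec(G, x)          # the edited output is W x - alpha * G x
    alpha = 0.0
    for _ in range(steps):
        resid = [y - t for y, t in zip(matvec(apply_update(W, G, alpha), x), target)]
        dL_dalpha = -sum(ri * gi for ri, gi in zip(resid, Gx))  # chain rule
        alpha -= lr * dL_dalpha
    return alpha

# Usage: invert one target output, then check that the edit stays local.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]  # frozen "generator" weights
x = [1.0, 0.0, 1.0]                       # latent code of the image to invert
target = [1.0, 0.5]                       # desired output
alpha = train_update_module(W, x, target)
W_new = apply_update(W, grad_W(W, x, target), alpha)
print(recon_loss(W, x, target), recon_loss(W_new, x, target))
```

In this toy, the edit `alpha * G = alpha * r x^T` is rank-one, so any input orthogonal to `x` (for example `[1.0, 0.0, -1.0]`) is mapped exactly as before. This is a loose analogue of the abstract's claim that the rest of the latent-to-image mapping stays relatively intact; a real StyleGAN2 edit offers no such exact guarantee.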