The StyleGAN family succeeds in high-fidelity image generation and allows flexible, plausible editing of generated images by manipulating its semantically rich latent style space. However, projecting a real image into this latent space faces an inherent trade-off between inversion quality and editability. Existing encoder-based and optimization-based StyleGAN inversion methods attempt to mitigate this trade-off but achieve limited success. To resolve the problem fundamentally, we propose a novel two-phase framework that assigns editing and reconstruction to two separate networks instead of balancing the two within one. Specifically, in Phase I, a W-space-oriented StyleGAN inversion network is trained and used to perform image inversion and editing; this ensures editability but sacrifices reconstruction quality. In Phase II, a carefully designed rectifying network corrects the inversion errors and performs ideal reconstruction. Experimental results show that our approach yields near-perfect reconstructions without sacrificing editability, thus allowing accurate manipulation of real images. Further, we evaluate the rectifying network and find that it generalizes well to unseen manipulation types and out-of-domain images.
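The division of labour between the two phases can be illustrated with a toy numerical sketch. It is not the method described above: the "generator" is assumed to be a fixed linear map from a low-dimensional W-space to a higher-dimensional image space, the Phase I "encoder" is a least-squares pseudo-inverse, and the learned rectifying network is replaced by a naive residual transfer that adds the Phase I inversion error back onto the edited output. All function and variable names (`generate`, `invert`, `edit`, `rectify`, `two_phase_edit`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

D_IMG, D_W = 64, 8  # image dimension exceeds latent dimension, so W-space inversion is lossy

# Hypothetical toy generator: a fixed linear map from W-space to image space.
A = rng.standard_normal((D_IMG, D_W))

def generate(w):
    return A @ w

# Phase I (hypothetical): least-squares "encoder" that projects an image into W-space.
A_PINV = np.linalg.pinv(A)

def invert(x):
    return A_PINV @ x

def edit(w, direction, strength):
    # A semantic edit modelled as a linear move along a direction in W-space.
    return w + strength * direction

# Phase II (naive stand-in for the learned rectifying network):
# transfer the Phase I inversion error onto the edited output.
def rectify(x, w, w_edited):
    residual = x - generate(w)
    return generate(w_edited) + residual

def two_phase_edit(x, direction, strength):
    w = invert(x)                        # Phase I: editable but lossy latent code
    w_edited = edit(w, direction, strength)
    return rectify(x, w, w_edited)       # Phase II: restore the lost detail

x = rng.standard_normal(D_IMG)
direction = rng.standard_normal(D_W)
print(np.linalg.norm(x - generate(invert(x))) > 0)     # Phase I alone does not reconstruct x
print(np.allclose(two_phase_edit(x, direction, 0.0), x))  # with no edit, x is recovered
```

With a zero-strength edit the rectified output equals the input exactly, while any nonzero edit still comes entirely from W-space. In real images, adding a fixed residual is only sensible when the edit does not move image content, which is one reason to learn the rectification rather than hard-code it.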