Generative Adversarial Networks (GANs) can synthesize realistic images, and their learned latent spaces have been shown to encode rich semantic information along various interpretable directions. However, because the learned latent space is unstructured, it inherits bias from the training data, in which specific groups of visual attributes that are not causally related tend to appear together. This phenomenon is known as spurious correlation; examples include age and eyeglasses, or women and lipstick. Consequently, the learned distribution often fails to properly model the missing examples, and interpolating along the editing direction for one attribute can produce entangled changes in other attributes. To address this problem, previous works typically adjust the learned directions to minimize changes in other attributes, yet they still fail on strongly correlated features. In this work, we study the entanglement issue in both the training data and the learned latent space of the StyleGAN2-FFHQ model. We propose a novel framework, SC$^2$GAN, that achieves disentanglement by re-projecting low-density latent code samples into the original latent space and correcting the editing directions based on both the high-density and low-density regions. By leveraging the original meaningful directions and semantic region-specific layers, our framework interpolates the original latent codes to generate images with attribute combinations that appear infrequently, then inverts these samples back to the original latent space. We apply our framework to existing methods that learn meaningful latent directions and show that it disentangles the attributes effectively when only a small number of low-density region samples are added.
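As a rough illustration of the direction-adjustment idea attributed above to previous works (not the SC$^2$GAN procedure itself), the following minimal sketch edits a latent code along a direction and removes the component of that direction that lies along a correlated attribute. Everything here is an assumption made for illustration: the directions are random stand-ins rather than learned ones, the names `edit` and `remove_component` are hypothetical, and the 512-dimensional latent is assumed to mirror the StyleGAN2 latent size.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


def edit(w, direction, alpha):
    """Move latent code w by alpha along a (normalized) editing direction."""
    return w + alpha * unit(direction)


def remove_component(primary, correlated):
    """Project `primary` onto the subspace orthogonal to `correlated`."""
    c = unit(correlated)
    return unit(primary - np.dot(primary, c) * c)


rng = np.random.default_rng(0)
dim = 512  # assumed latent dimensionality

# Random stand-ins for learned directions (hypothetical, for illustration only).
w = rng.standard_normal(dim)
d_age = unit(rng.standard_normal(dim))
d_glasses = unit(rng.standard_normal(dim))

# Simulate an entangled "age" direction that also moves along "eyeglasses".
d_age_entangled = unit(d_age + 0.5 * d_glasses)

# Adjust the direction so that it no longer moves along "eyeglasses".
d_age_fixed = remove_component(d_age_entangled, d_glasses)

# Measure how far each edit moves the code along the eyeglasses direction.
leak_before = float(np.dot(edit(w, d_age_entangled, 3.0) - w, d_glasses))
leak_after = float(np.dot(edit(w, d_age_fixed, 3.0) - w, d_glasses))
print(f"leak before: {leak_before:.4f}, leak after: {leak_after:.2e}")
```

This projection only removes a linear overlap between the two directions. It cannot fix the case where the latent space itself contains few samples with the rare attribute combination, which is the gap the abstract says the low-density samples are meant to address.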