Training deep generative models usually requires a large amount of data. To reduce the cost of data collection, zero-shot GAN adaptation reuses a well-trained generator to synthesize images of an unseen target domain without any additional training samples. Because no target images are available, a textual description of the target domain and a vision-language model such as CLIP are used to guide the generator. However, with only a single representative text feature in place of real images, the synthesized images gradually lose diversity as the model is optimized, a failure known as mode collapse. To tackle this problem, we propose a novel method that finds semantic variations of the target text in the CLIP space. Specifically, we explore diverse semantic variations around the informative text feature of the target domain while regularizing against uncontrolled deviation of the semantic information. With the obtained variations, we design a novel directional moment loss that matches the first and second moments of the image and text direction distributions. Moreover, we introduce elastic weight consolidation and a relation consistency loss to preserve valuable content information from the source domain, such as appearance. Through extensive experiments, we demonstrate that the proposed method ensures sample diversity across various zero-shot GAN adaptation scenarios. We also conduct ablation studies to validate the effect of each proposed component. Notably, our model sets a new state of the art in zero-shot GAN adaptation in terms of both diversity and quality.
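As a rough illustration of what matching the first and second moments of two direction distributions can look like, the sketch below compares the mean and covariance of a set of image directions with those of a set of text directions. It is not the paper's formulation: the function names, the squared-error form of both terms, their equal weighting, and the toy 2-D vectors standing in for CLIP-space directions are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """First moment: per-dimension average of the direction vectors."""
    n, d = len(vectors), len(vectors[0])
    return [sum(v[k] for v in vectors) / n for k in range(d)]


def covariance(vectors: List[Vector]) -> List[Vector]:
    """Second (central) moment: biased covariance matrix of the vectors."""
    mu = mean(vectors)
    n, d = len(vectors), len(mu)
    return [
        [sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in vectors) / n
         for j in range(d)]
        for i in range(d)
    ]


def directional_moment_loss(image_dirs: List[Vector],
                            text_dirs: List[Vector]) -> float:
    """Hypothetical loss: squared gap between means plus squared
    Frobenius gap between covariances (equal weights assumed)."""
    mu_i, mu_t = mean(image_dirs), mean(text_dirs)
    cov_i, cov_t = covariance(image_dirs), covariance(text_dirs)
    d = len(mu_i)
    first = sum((a - b) ** 2 for a, b in zip(mu_i, mu_t))
    second = sum((cov_i[r][c] - cov_t[r][c]) ** 2
                 for r in range(d) for c in range(d))
    return first + second


# Toy data: two distinct text directions (semantic variations) versus
# image directions that have collapsed onto their average.
text_dirs = [[1.0, 0.0], [0.0, 1.0]]
collapsed_image_dirs = [[0.5, 0.5], [0.5, 0.5]]

loss_matched = directional_moment_loss(text_dirs, text_dirs)
loss_collapsed = directional_moment_loss(collapsed_image_dirs, text_dirs)
```

In this toy case the collapsed image directions have the same mean as the text directions, so a loss on the first moment alone would be zero. Only the covariance term is nonzero, which is the intuition for why a second-moment term can penalize mode collapse.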