Recently, CLIP-guided image synthesis has shown appealing performance in adapting a pre-trained source-domain generator to an unseen target domain. It requires no target-domain samples, only textual domain labels, and training is highly efficient, e.g., taking only a few minutes. However, existing methods remain limited in the quality of generated images and may suffer from mode collapse. A key reason is that a single fixed adaptation direction is applied to all cross-domain image pairs, so every pair receives an identical supervision signal. To address this issue, we propose Image-specific Prompt Learning (IPL), a method that learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair and gives the target-domain generator much greater flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates mode collapse. Moreover, IPL is independent of the structure of the underlying generative model (e.g., generative adversarial networks or diffusion models). Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.
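The contrast between a fixed adaptation direction and image-specific directions can be illustrated with a minimal toy sketch. It is not the released implementation: random vectors stand in for CLIP embeddings, `toy_text_encoder` is a hypothetical nonlinear stand-in for the CLIP text encoder, the loss is the commonly used "one minus cosine similarity" directional form (assumed here, not stated in the abstract), and all names and dimensions are assumptions chosen for illustration.

```python
import numpy as np

def normalize(v):
    """Scale vectors along the last axis to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def toy_text_encoder(prompts, label, W):
    """Hypothetical stand-in for a text encoder: a nonlinear map of [prompt; label]."""
    labels = np.broadcast_to(label, prompts.shape[:-1] + label.shape)
    return np.tanh(np.concatenate([prompts, labels], axis=-1) @ W)

def directional_loss(src_img, tgt_img, text_dir):
    """1 - cosine similarity between the image-space shift and the text-space direction."""
    img_dir = normalize(tgt_img - src_img)
    return 1.0 - np.sum(img_dir * normalize(text_dir), axis=-1)

rng = np.random.default_rng(0)
n, p, l, d = 4, 6, 3, 8                      # images, prompt dim, label dim, embedding dim
W = rng.normal(size=(p + l, d))              # toy encoder weights (assumption)
src_label = rng.normal(size=l)               # stand-in for the source domain label
tgt_label = rng.normal(size=l)               # stand-in for the target domain label
src_img = rng.normal(size=(n, d))            # stand-ins for source image embeddings
tgt_img = rng.normal(size=(n, d))            # stand-ins for adapted image embeddings

# Fixed direction: one shared prompt, so every image pair gets the same direction.
shared = np.tile(rng.normal(size=(1, p)), (n, 1))
fixed_dirs = toy_text_encoder(shared, tgt_label, W) - toy_text_encoder(shared, src_label, W)

# Image-specific direction: one prompt per source image (random here, learned in practice).
prompts = rng.normal(size=(n, p))
specific_dirs = toy_text_encoder(prompts, tgt_label, W) - toy_text_encoder(prompts, src_label, W)

fixed_loss = directional_loss(src_img, tgt_img, fixed_dirs)
specific_loss = directional_loss(src_img, tgt_img, specific_dirs)
```

In this toy setting every row of `fixed_dirs` is identical, whereas the rows of `specific_dirs` differ from image to image because the prompt passes through a nonlinear encoder together with the domain label. This mirrors, under the stated assumptions, why per-image prompts yield per-pair supervision signals.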