Diffusion models have shown great promise in text-guided image style transfer, but their stochastic nature creates a trade-off between style transformation and content preservation. Existing methods require computationally expensive fine-tuning of the diffusion model or additional neural networks. To address this, we propose a zero-shot contrastive loss for diffusion models that requires neither additional fine-tuning nor auxiliary networks. By applying a patch-wise contrastive loss between the embeddings of generated samples and those of the original image in the pre-trained diffusion model, our method generates images with the same semantic content as the source image in a zero-shot manner. Our approach outperforms existing methods while preserving content and requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Experimental results validate the effectiveness of the proposed method.
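The patch-wise contrastive idea can be illustrated with a minimal sketch. This assumes an InfoNCE-style (PatchNCE) formulation: a feature vector at a given spatial location of the generated image is pulled toward the source-image feature at the same location and pushed away from features at other locations. Extracting the embeddings from the pre-trained diffusion model is out of scope here, so both feature maps are plain arrays of shape `(C, H, W)`. The function name `patch_nce_loss` and the defaults `num_patches=64` and `tau=0.07` are hypothetical illustration choices, not values taken from the method described above.

```python
import numpy as np


def patch_nce_loss(feat_gen, feat_src, num_patches=64, tau=0.07, seed=0):
    """Hypothetical patch-wise InfoNCE loss between two (C, H, W) feature maps.

    feat_gen: features of the generated image (queries).
    feat_src: features of the source image (keys), same shape as feat_gen.
    Assumes both maps come from the same layer of a frozen network.
    """
    rng = np.random.default_rng(seed)
    c, h, w = feat_gen.shape
    # Flatten to one C-dimensional vector per spatial location (patch).
    q = feat_gen.reshape(c, h * w).T
    k = feat_src.reshape(c, h * w).T
    # Sample the same patch locations in both maps.
    n = min(num_patches, h * w)
    idx = rng.choice(h * w, size=n, replace=False)
    q, k = q[idx], k[idx]
    # L2-normalize so dot products are cosine similarities.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
    # Row i: positive is k[i] (same location), negatives are the other sampled patches.
    logits = q @ k.T / tau
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (matching locations) as targets.
    return float(-np.mean(np.diag(log_prob)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    src = rng.standard_normal((16, 8, 8))
    unrelated = rng.standard_normal((16, 8, 8))
    print(patch_nce_loss(src, src), patch_nce_loss(src, unrelated))
```

Because the positive pair is defined by spatial correspondence, the loss is low when each generated patch still resembles the source patch at the same location, which is how a loss of this form can encourage content preservation while the text guidance changes the style.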