The advent of open-source AI communities has produced a cornucopia of powerful text-guided diffusion models trained on various datasets. However, few studies have explored ensembling such models to combine their strengths. In this work, we propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that enables fused text-guided diffusion models to achieve more controllable generation. Specifically, we find experimentally that the responses of classifier-free guidance are highly related to the saliency of generated images. We therefore propose to trust each model in its area of expertise by blending the predicted noises of two diffusion models in a saliency-aware manner. SNB is training-free and can be completed within a DDIM sampling process. Additionally, it automatically aligns the semantics of the two noise spaces without requiring additional annotations such as masks. Extensive experiments demonstrate the effectiveness of SNB in various applications. The project page is available at https://magicfusion.github.io/.
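The blending step described in the abstract can be illustrated with a minimal sketch. It assumes that the saliency of each model is approximated by the magnitude of its classifier-free guidance response (conditional minus unconditional noise prediction), and that the two noises are combined with a hard per-pixel mask. The function names, the channel-mean and sum normalization, the weights, and the hard mask are all illustrative assumptions; the abstract does not specify these details, so this should not be read as the authors' implementation.

```python
import numpy as np

def guidance_saliency(eps_cond, eps_uncond):
    """Hypothetical saliency proxy for one model.

    eps_cond, eps_uncond: noise predictions of shape (C, H, W).
    Returns an (H, W) map that sums to (approximately) 1.
    """
    # Classifier-free guidance response: how much the text prompt changes the prediction
    response = np.abs(eps_cond - eps_uncond)
    # Collapse channels so there is one value per spatial location
    sal = response.mean(axis=0)
    # Normalize so maps from two different models are comparable (assumption)
    return sal / (sal.sum() + 1e-12)

def blend_noise(eps_a, eps_b, sal_a, sal_b, weight_a=1.0, weight_b=1.0):
    """Per pixel, keep the noise of the model whose weighted saliency is larger.

    Ties go to model A. Returns the blended noise (C, H, W) and the mask (H, W).
    """
    mask = (weight_a * sal_a) >= (weight_b * sal_b)
    blended = np.where(mask[None, :, :], eps_a, eps_b)
    return blended, mask

# Toy example: model A responds at the top-left pixel, model B at the bottom-right
a_cond = np.zeros((1, 2, 2)); a_cond[0, 0, 0] = 1.0
b_cond = np.zeros((1, 2, 2)); b_cond[0, 1, 1] = 2.0
uncond = np.zeros((1, 2, 2))

sal_a = guidance_saliency(a_cond, uncond)
sal_b = guidance_saliency(b_cond, uncond)
blended, mask = blend_noise(a_cond, b_cond, sal_a, sal_b)
```

In a sampler, a step like this would presumably run once per DDIM denoising iteration, with the blended noise used in place of a single model's prediction. Because the mask is derived from the models' own guidance responses, no user-supplied mask is needed, which matches the abstract's claim of annotation-free alignment.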