Pre-trained diffusion models have demonstrated remarkable proficiency in synthesizing images across a wide range of scenarios from customizable prompts, indicating an effective capacity to capture universal visual features. Motivated by this, our study investigates how the implicit knowledge embedded in diffusion models can be exploited to address challenges in cross-domain semantic segmentation. Specifically, we examine sampling and fusion techniques that harness the features of diffusion models efficiently. We propose DIffusion Feature Fusion (DIFF), a backbone that extracts and integrates effective semantic representations through the diffusion process. Building on the text-to-image generation capability, we further introduce a training framework designed to implicitly learn posterior knowledge from the generative model. Through rigorous evaluation on domain-generalized semantic segmentation, we show that our method surpasses preceding approaches in mitigating discrepancies across distinct domains and achieves state-of-the-art (SOTA) performance.
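To make the feature-extraction-and-fusion idea concrete, the sketch below shows one common way to obtain multi-scale features from a pre-trained diffusion U-Net and fuse them into a single segmentation-ready map. This is a minimal illustration under stated assumptions, not the paper's DIFF implementation: the checkpoint name (`runwayml/stable-diffusion-v1-5`), the single denoising timestep (t = 100), the zero "null" prompt embedding, and the 1x1-projection-plus-sum fusion are all illustrative choices.

```python
# Minimal sketch: extract intermediate U-Net features from a pre-trained
# diffusion model and fuse them across scales. Assumes the HuggingFace
# `diffusers` library; model, timestep, and fusion design are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from diffusers import UNet2DConditionModel, DDPMScheduler

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler")
unet.eval().requires_grad_(False)

# Capture the output of every decoder (up) block with forward hooks.
features = []
for block in unet.up_blocks:
    block.register_forward_hook(lambda _mod, _inp, out: features.append(out))

@torch.no_grad()
def extract_diffusion_features(latents, t=100):
    """Noise the latents to timestep t, run one denoising pass, and
    return the hooked multi-scale decoder features (coarse to fine)."""
    features.clear()
    timesteps = torch.full((latents.shape[0],), t, dtype=torch.long)
    noisy = scheduler.add_noise(latents, torch.randn_like(latents), timesteps)
    # Placeholder "null" prompt; a real pipeline would pass CLIP text-encoder
    # output here (shape: batch x 77 x 768 for Stable Diffusion 1.5).
    text_emb = torch.zeros(latents.shape[0], 77, 768)
    unet(noisy, timesteps, encoder_hidden_states=text_emb)
    return list(features)

class FeatureFusion(nn.Module):
    """Project each scale to a shared width with 1x1 convs, upsample to the
    finest resolution, and sum (one simple fusion choice among many)."""
    def __init__(self, in_channels, out_channels=256):
        super().__init__()
        self.proj = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, feats):
        size = feats[-1].shape[-2:]  # finest spatial resolution
        return sum(
            F.interpolate(p(f), size=size, mode="bilinear",
                          align_corners=False)
            for p, f in zip(self.proj, feats))

# Usage: 64x64 latents correspond to 512x512 images in SD 1.5's VAE space.
feats = extract_diffusion_features(torch.randn(1, 4, 64, 64))
fusion = FeatureFusion([f.shape[1] for f in feats])
seg_features = fusion(feats)  # e.g. (1, 256, 64, 64); feed to a seg head
```

The single-pass extraction at a fixed noise level keeps inference cheap; richer variants could sample several timesteps and fuse across time as well as scale.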