Denoising diffusion probabilistic models have recently demonstrated state-of-the-art generative performance and have been used as strong pixel-level representation learners. This paper decomposes the interrelation between the generative capability and the representation learning ability inherent in diffusion models. We present the masked diffusion model (MDM), a scalable self-supervised representation learner that replaces the additive Gaussian noise of conventional diffusion with a masking mechanism. The proposed approach surpasses prior benchmarks on both medical and natural image semantic segmentation tasks, particularly in few-shot scenarios.
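To make the substitution of corruption processes concrete, the following is a minimal sketch contrasting the standard DDPM forward corruption with a patch-masking corruption. It is an illustration under stated assumptions, not the authors' implementation: the function names, the non-overlapping square patches, the zero fill value, and the `mask_ratio` parameter are all hypothetical choices, since the abstract does not specify how the masking is performed or scheduled.

```python
import numpy as np


def gaussian_corrupt(x0, alpha_bar, rng):
    """Standard DDPM forward corruption:
    x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def mask_corrupt(x0, mask_ratio, patch, rng):
    """Masking corruption (assumed form): zero out a random subset of
    non-overlapping patch x patch blocks of an (H, W, C) image.

    Returns the corrupted image and the boolean (H, W) mask, where True
    marks a masked pixel."""
    h, w = x0.shape[:2]
    if h % patch != 0 or w % patch != 0:
        raise ValueError("image height and width must be divisible by patch")
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_masked = int(round(mask_ratio * n_patches))

    # Choose which patches to mask, then expand the patch grid to pixel level.
    grid = np.zeros(n_patches, dtype=bool)
    grid[rng.permutation(n_patches)[:n_masked]] = True
    grid = grid.reshape(gh, gw)
    mask = np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)

    x_t = x0.copy()
    x_t[mask] = 0.0  # assumption: masked pixels are filled with zeros
    return x_t, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))
    noisy = gaussian_corrupt(image, alpha_bar=0.5, rng=rng)
    masked, mask = mask_corrupt(image, mask_ratio=0.5, patch=8, rng=rng)
    print(noisy.shape, masked.shape, mask.mean())
```

In this sketch, `mask_ratio` plays the role that the noise level plays in Gaussian diffusion: a larger ratio removes more of the input that a denoising network would have to reconstruct. How MDM ties the ratio to the diffusion timestep is not stated in the abstract, so that link is left out here.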