Recent advances in denoising diffusion probabilistic models have shown great success in image synthesis tasks. While some works have already explored the potential of this powerful tool in image semantic segmentation, its application to weakly supervised semantic segmentation (WSSS) remains relatively under-explored. Observing that a conditional diffusion model (CDM) is capable of generating images subject to specific distributions, in this work we utilize the category-aware semantic information embedded in the CDM to obtain the prediction mask of the target object with only image-level annotations. More specifically, we locate the desired class by approximating the derivative of the output of the CDM with respect to the input condition. Our method differs from previous diffusion model methods that rely on guidance from an external classifier, which accumulates noise in the background during the reconstruction process. Our method outperforms state-of-the-art CAM and diffusion model methods on two public medical image segmentation datasets, which demonstrates that the CDM is a promising tool in WSSS. In addition, experiments show that our method is more time-efficient than existing diffusion model methods, making it practical for wider applications.
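To illustrate the idea of localizing a class through the derivative of the model output with respect to the condition, the following is a minimal sketch rather than the method itself. The abstract does not state the exact approximation used, so the sketch assumes a central finite difference; `toy_conditional_denoiser`, `condition_sensitivity`, the step size `tau`, and the thresholding rule are all hypothetical names and choices introduced only for this example, and the toy function stands in for a trained CDM.

```python
import numpy as np

def toy_conditional_denoiser(x, cond):
    """Hypothetical stand-in for a trained CDM (an assumption, not the real model).

    Only a 3x3 'object' patch reacts to the condition; the rest of the
    output is independent of it.
    """
    out = 0.9 * x                 # condition-independent part (background)
    out[2:5, 2:5] += 0.5 * cond   # condition-dependent part (object region)
    return out

def condition_sensitivity(model, x, cond, tau=1e-2):
    """Central finite-difference estimate of |d model(x, cond) / d cond| per pixel."""
    plus = model(x, cond + tau)
    minus = model(x, cond - tau)
    return np.abs(plus - minus) / (2.0 * tau)

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))   # placeholder for a noised input image

# Pixels whose output changes with the condition are taken as the object.
saliency = condition_sensitivity(toy_conditional_denoiser, image, cond=1.0)
mask = saliency > 0.5 * saliency.max()   # assumed thresholding rule
```

In this toy setting the background response cancels in the difference, so only the condition-dependent patch survives in `mask`. With a real network the derivative could also be taken by automatic differentiation; the finite difference is used here only because it needs nothing beyond forward passes.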