Semantic segmentation has made significant progress in recent years thanks to deep neural networks, but the common objective of generating a single segmentation output that accurately matches the image's content may not be suitable for safety-critical domains such as medical diagnostics and autonomous driving. Instead, multiple possible correct segmentation maps may be required to reflect the true distribution of annotation maps. In this context, stochastic semantic segmentation methods must learn to predict conditional distributions of labels given the image, but this is challenging due to the typically multimodal distributions, high-dimensional output spaces, and limited annotation data. To address these challenges, we propose a conditional categorical diffusion model (CCDM) for semantic segmentation based on Denoising Diffusion Probabilistic Models. Our model is conditioned to the input image, enabling it to generate multiple segmentation label maps that account for the aleatoric uncertainty arising from divergent ground truth annotations. Our experimental results show that CCDM achieves state-of-the-art performance on LIDC, a stochastic semantic segmentation dataset, and outperforms established baselines on the classical segmentation dataset Cityscapes.
翻译:语义分割近年来得益于深度神经网络取得了显著进展,但生成与图像内容精确匹配的单一分割输出的常见目标可能不适用于医学诊断和自动驾驶等安全关键领域。相反,可能需要多个可能的正确分割图来反映标注图的真实分布。在此背景下,随机语义分割方法必须学习预测给定图像下标签的条件分布,但由于通常的多模态分布、高维输出空间以及有限的标注数据,这一任务颇具挑战性。为解决这些挑战,我们基于去噪扩散概率模型提出了一种用于语义分割的条件类别扩散模型(CCDM)。我们的模型以输入图像为条件,使其能够生成多个分割标签图,以考虑由不一致的真实标注引起的偶然不确定性。实验结果表明,CCDM在随机语义分割数据集LIDC上达到了最先进性能,并在经典分割数据集Cityscapes上优于已有基线方法。