This paper introduces a diffusion-based framework for universal image segmentation, enabling task-agnostic segmentation that does not depend on mask-based frameworks and instead predicts the full segmentation holistically. We present several key adaptations of diffusion models that prove important in this discrete setting. Notably, we show that a location-aware palette with our 2D Gray code ordering improves performance, and that a final tanh activation is crucial for discrete data. Regarding the diffusion parameterization, sigmoid loss weighting consistently outperforms the alternatives regardless of the prediction type, and we settle on x-prediction. While our current model does not yet surpass leading mask-based architectures, it narrows the performance gap and offers capabilities these models lack, such as principled ambiguity modeling. All models were trained from scratch, and we believe that combining our proposed improvements with large-scale pretraining or promptable conditioning could yield competitive models.
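To make the palette idea concrete, the sketch below shows a binary-reflected Gray code and one hypothetical way to place class ids on a 2D palette grid. The function names, the bit layout, and the [-1, 1] coordinate range are illustrative assumptions, not the paper's exact construction; the only property relied on is that consecutive Gray codes differ in a single bit.

```python
def gray_code(n: int) -> int:
    """Binary-reflected Gray code: consecutive integers differ in exactly one bit."""
    return n ^ (n >> 1)


def class_to_palette_2d(class_id: int, bits_per_axis: int = 4) -> tuple[float, float]:
    """Hypothetical location-aware palette lookup (illustrative sketch only).

    De-interleave the Gray-coded bits of the class id into (u, v) coordinates
    on a [-1, 1]^2 grid. Because consecutive class ids flip a single bit of the
    Gray code, only one of the two coordinates changes between neighbors.
    """
    g = gray_code(class_id)
    u = v = 0
    for i in range(bits_per_axis):
        u |= ((g >> (2 * i)) & 1) << i       # even bits -> first palette axis
        v |= ((g >> (2 * i + 1)) & 1) << i   # odd bits  -> second palette axis
    scale = (1 << bits_per_axis) - 1
    return 2.0 * u / scale - 1.0, 2.0 * v / scale - 1.0
```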
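To illustrate the loss-weighting point, here is a minimal sketch of x-prediction training with a sigmoid weight on the log-SNR, a common form in the diffusion literature, w(lambda) = sigmoid(b - lambda). The bias value, the `model(z_t, log_snr)` signature, and the variance-preserving schedule are assumptions for the sake of the example, not the paper's exact settings.

```python
import torch


def sigmoid_weight(log_snr: torch.Tensor, bias: float = 2.0) -> torch.Tensor:
    """Sigmoid loss weighting w(lambda) = sigmoid(bias - lambda); `bias` is a placeholder."""
    return torch.sigmoid(bias - log_snr)


def x_prediction_loss(model, x0: torch.Tensor, log_snr: torch.Tensor) -> torch.Tensor:
    """Weighted x-prediction loss: the network predicts the clean target x0.

    Assumes a variance-preserving process with alpha^2 = sigmoid(log_snr)
    and sigma^2 = sigmoid(-log_snr), and per-example log-SNR of shape (B,).
    """
    alpha = torch.sigmoid(log_snr).sqrt().view(-1, 1, 1, 1)
    sigma = torch.sigmoid(-log_snr).sqrt().view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    z_t = alpha * x0 + sigma * noise          # noisy input at this log-SNR
    x0_hat = model(z_t, log_snr)              # e.g. a tanh-bounded prediction of x0
    per_example = ((x0_hat - x0) ** 2).mean(dim=(1, 2, 3))
    return (sigmoid_weight(log_snr) * per_example).mean()
```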