In this paper, we explore a principled way to enhance the quality of object masks produced by different segmentation models. We propose a model-agnostic solution called SegRefiner, which offers a novel perspective on this problem by interpreting segmentation refinement as a data generation process. As a result, the refinement process can be smoothly implemented through a series of denoising diffusion steps. Specifically, SegRefiner takes coarse masks as inputs and refines them using a discrete diffusion process. By predicting the label and the corresponding state-transition probability for each pixel, SegRefiner progressively refines the noisy masks in a conditional denoising manner. To assess the effectiveness of SegRefiner, we conduct comprehensive experiments on various segmentation tasks, including semantic segmentation, instance segmentation, and dichotomous image segmentation. The results demonstrate the advantages of SegRefiner in several respects. First, it consistently improves both segmentation metrics and boundary metrics across different types of coarse masks. Second, it outperforms previous model-agnostic refinement methods by a significant margin. Finally, it exhibits a strong capability to capture extremely fine details when refining high-resolution images. The source code and trained models are available at https://github.com/MengyuWang826/SegRefiner.
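To make the per-pixel refinement described in the abstract concrete, the following is a minimal sketch of a discrete denoising loop on a binary mask. It is not the authors' implementation. The function names (`denoise_step`, `refine`, `toy_predictor`), the number of steps, and the rule that a pixel adopts the predicted label when a uniform draw falls below its transition probability are all assumptions made for illustration. The predictor here is a hypothetical stand-in for a trained network conditioned on the image.

```python
import numpy as np


def denoise_step(mask, pred_label, trans_prob, rng):
    """One assumed denoising step: each pixel switches to the predicted
    label with probability trans_prob, otherwise keeps its current label."""
    switch = rng.random(mask.shape) < trans_prob
    return np.where(switch, pred_label, mask)


def refine(coarse_mask, image, predict_fn, num_steps=4, seed=0):
    """Progressively refine a coarse mask over num_steps steps.
    predict_fn(image, mask, t) is a hypothetical model interface that
    returns (predicted labels, per-pixel transition probabilities)."""
    rng = np.random.default_rng(seed)
    mask = coarse_mask.copy()
    for t in range(num_steps, 0, -1):
        pred_label, trans_prob = predict_fn(image, mask, t)
        mask = denoise_step(mask, pred_label, trans_prob, rng)
    return mask


def toy_predictor(image, mask, t):
    """Hypothetical predictor for the demo: thresholds the image and is
    fully confident at every pixel (transition probability 1)."""
    pred_label = (image > 0.5).astype(mask.dtype)
    trans_prob = np.ones(mask.shape)
    return pred_label, trans_prob


image = np.array([[0.9, 0.8, 0.1], [0.7, 0.2, 0.1]])
coarse = np.array([[1, 0, 0], [1, 1, 0]], dtype=np.uint8)
refined = refine(coarse, image, toy_predictor)
print(refined)
```

With the fully confident toy predictor, every pixel adopts the thresholded label in the first step. A real model would return probabilities below 1 for uncertain pixels, so that only part of the mask changes at each step.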