The Segment Anything Model (SAM) represents a significant breakthrough in foundation models for computer vision, providing a large-scale image segmentation model. However, despite SAM's strong zero-shot performance, its segmentation masks lack fine-grained detail, particularly in accurately delineating object boundaries. It is therefore both interesting and valuable to explore whether SAM can be extended to highly accurate object segmentation, known as the dichotomous image segmentation (DIS) task. To this end, we propose DIS-SAM, which advances SAM towards DIS with extremely accurate details. DIS-SAM is a framework specifically tailored for highly accurate segmentation while maintaining SAM's promptable design. It employs a two-stage approach, integrating SAM with a modified advanced network originally designed for the prompt-free DIS task. To better train DIS-SAM, we employ a ground-truth enrichment strategy that modifies the original mask annotations. Despite its simplicity, DIS-SAM significantly outperforms SAM, HQ-SAM, and Pi-SAM by approximately 8.5%, 6.9%, and 3.7% in maximum F-measure, respectively. Our code is available at https://github.com/Tennine2077/DIS-SAM