The Segment Anything Model (SAM) represents a significant breakthrough in foundation models for computer vision, providing a large-scale, promptable image segmentation model. However, despite SAM's strong zero-shot performance, its segmentation masks lack fine-grained detail, particularly in accurately delineating object boundaries. A natural question is whether SAM, as a foundation model, can be improved towards highly accurate object segmentation, a task known as dichotomous image segmentation (DIS). To address this issue, we propose DIS-SAM, which advances SAM towards DIS with extremely accurate details. DIS-SAM is a framework specifically tailored for highly accurate segmentation while retaining SAM's promptable design. It employs a two-stage approach, integrating SAM with a modified IS-Net dedicated to DIS. Despite its simplicity, DIS-SAM achieves significantly higher segmentation accuracy than SAM and HQ-SAM.
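To make the two-stage design concrete, the sketch below shows one plausible way such a pipeline could be wired up in PyTorch: stage one produces a coarse, prompt-conditioned mask (here a stand-in for SAM), and stage two refines it with a network conditioned on both the image and the coarse mask (here a minimal stand-in for the modified IS-Net). All names (`RefinementNet`, `dis_sam_forward`) are hypothetical placeholders for illustration, not the authors' actual code.

```python
import torch
import torch.nn as nn

class RefinementNet(nn.Module):
    """Hypothetical stand-in for the modified IS-Net refinement stage.

    Consumes the RGB image concatenated with the coarse mask
    (3 + 1 = 4 channels) and predicts a fine-grained binary mask.
    """
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),
        )

    def forward(self, image, coarse_mask):
        x = torch.cat([image, coarse_mask], dim=1)  # (B, 4, H, W)
        return torch.sigmoid(self.net(x))           # refined mask in [0, 1]


def dis_sam_forward(sam_model, refiner, image, prompt):
    """Two-stage pipeline: a promptable coarse mask, then detail refinement."""
    with torch.no_grad():
        coarse = sam_model(image, prompt)  # stage 1: coarse mask (B, 1, H, W)
    return refiner(image, coarse)          # stage 2: high-detail DIS mask


if __name__ == "__main__":
    refiner = RefinementNet()
    image = torch.rand(1, 3, 256, 256)
    coarse = torch.rand(1, 1, 256, 256)   # stand-in for a SAM output
    print(refiner(image, coarse).shape)   # torch.Size([1, 1, 256, 256])
```

Because the refiner only appends a second stage after SAM's output, the prompt-driven interface of stage one is preserved unchanged, which matches the promptable design the abstract emphasizes.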