The Segment Anything Model (SAM) represents a significant breakthrough in foundation models for computer vision, providing a large-scale image segmentation model. However, despite SAM's strong zero-shot performance, its segmentation masks lack fine-grained detail, particularly in accurately delineating object boundaries. It is therefore both interesting and valuable to explore whether SAM can be extended to highly accurate object segmentation, known as the dichotomous image segmentation (DIS) task. To this end, we propose DIS-SAM, which advances SAM towards DIS with extremely accurate details. DIS-SAM is a framework specifically tailored for highly accurate segmentation while maintaining SAM's promptable design. It employs a two-stage approach, integrating SAM with a modified advanced network originally designed for the prompt-free DIS task. To better train DIS-SAM, we employ a ground-truth enrichment strategy that modifies the original mask annotations. Despite its simplicity, DIS-SAM significantly outperforms SAM, HQ-SAM, and Pi-SAM by approximately 8.5%, 6.9%, and 3.7% in maximum F-measure, respectively. Our code is available at https://github.com/Tennine2077/DIS-SAM