The Segment Anything Model (SAM) achieves remarkable promptable segmentation when given high-quality prompts, but such prompts often require considerable skill to specify. To make SAM robust to casual prompts, this paper presents the first comprehensive analysis of SAM's segmentation stability across a diverse spectrum of prompt qualities, notably imprecise bounding boxes and insufficient points. Our key finding is that, given such low-quality prompts, SAM's mask decoder tends to activate image features that are biased towards the background or confined to specific object parts. To mitigate this issue, our key idea is to adjust the sampling locations of image features using learnable deformable offsets, while the original SAM architecture and weights remain unchanged. Our deformable sampling plugin (DSP) thus enables SAM to adaptively shift attention to the prompted target regions in a data-driven manner, facilitated by our robust training strategy (RTS). During inference, a dynamic routing plugin (DRP) toggles SAM between the deformable and regular grid sampling modes, conditioned on the input prompt quality. Our solution, termed Stable-SAM, is the first of its kind to focus solely on adjusting feature sampling locations, which offers several advantages: 1) improved segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality, with 3) minimal learnable parameters (0.08 M) and fast adaptation (1 training epoch). Extensive experiments across multiple datasets validate the effectiveness and advantages of our approach, underscoring Stable-SAM as a more robust solution for segmenting anything. Code will be released upon acceptance.
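The sampling idea described above can be illustrated with a minimal pure-Python sketch. It is not the Stable-SAM implementation: the function names `bilinear_sample` and `sample_features`, the scalar `prompt_quality` score, the hard `threshold`, and the hand-written offsets are all assumptions made for illustration. In the paper the offsets are learned and the routing is conditioned on the prompt by a plugin, whereas here both are fixed by hand.

```python
import math


def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at fractional (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp the sampling location so it stays inside the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def sample_features(feat, offsets, prompt_quality, threshold=0.5):
    """Resample `feat` on the regular grid, or on an offset grid for weak prompts.

    `prompt_quality` and `threshold` are hypothetical stand-ins for a learned
    routing decision; `offsets[i][j]` is a (dy, dx) shift for grid cell (i, j).
    """
    use_deformable = prompt_quality < threshold
    out = []
    for i, row in enumerate(feat):
        out_row = []
        for j, _ in enumerate(row):
            dy, dx = offsets[i][j] if use_deformable else (0.0, 0.0)
            out_row.append(bilinear_sample(feat, i + dy, j + dx))
        out.append(out_row)
    return out


# Toy 3x3 single-channel feature map and a constant half-pixel shift to the right.
feat = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]
offsets = [[(0.0, 0.5)] * 3 for _ in range(3)]

regular = sample_features(feat, offsets, prompt_quality=0.9)   # regular grid mode
deformed = sample_features(feat, offsets, prompt_quality=0.1)  # deformable mode
```

With a high quality score the output equals the input map, since every location is sampled on the regular grid. With a low score each value is interpolated at the shifted location, so the features read by any downstream module change while the feature map itself, standing in for the frozen SAM weights, is untouched.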