Promptable segmentation models such as SAM have established a powerful paradigm, enabling strong generalization to unseen objects and domains with minimal user input, including points, bounding boxes, and text prompts. Among these, bounding boxes stand out as particularly effective, often outperforming points while significantly reducing annotation costs. However, current training and evaluation protocols typically rely on synthetic prompts generated through simple heuristics, offering limited insight into real-world robustness. In this paper, we investigate the robustness of promptable segmentation models to natural variations in bounding box prompts. First, we conduct a controlled user study and collect thousands of real bounding box annotations. Our analysis reveals substantial variability in segmentation quality across users for the same model and instance, indicating that SAM-like models are highly sensitive to natural prompt noise. Then, since exhaustive testing of all possible user inputs is computationally prohibitive, we reformulate robustness evaluation as a white-box optimization problem over the bounding box prompt space. We introduce BREPS, a method for generating adversarial bounding boxes that minimize or maximize segmentation error while adhering to naturalness constraints. Finally, we benchmark state-of-the-art models across 10 datasets, spanning from everyday scenes to medical imaging. Code: https://github.com/emb-ai/BREPS.
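The white-box search over the bounding box prompt space can be sketched as follows. This is a minimal illustration, not the BREPS implementation: `toy_segmentation_score` is a hypothetical stand-in for a model's mask-quality metric (e.g., IoU as a function of the box prompt), the gradient is approximated by finite differences, and the "naturalness constraint" is simplified to an L-infinity budget `eps` around the user's original box.

```python
import numpy as np

def toy_segmentation_score(box):
    # Hypothetical stand-in for a promptable model's mask quality (e.g., IoU).
    # It peaks when the box matches an "ideal" prompt for the instance.
    ideal = np.array([50.0, 50.0, 150.0, 150.0])
    return float(np.exp(-np.sum((box - ideal) ** 2) / 5000.0))

def adversarial_box(box0, score_fn, eps=10.0, step=2.0, iters=50):
    """Signed-gradient descent on the box coordinates (x1, y1, x2, y2) to
    *minimize* segmentation quality, projected after each step onto the
    L-inf ball of radius eps around the user's box (naturalness budget)."""
    box = box0.astype(float).copy()
    for _ in range(iters):
        # Central finite differences approximate the white-box gradient.
        grad = np.zeros(4)
        for i in range(4):
            d = np.zeros(4)
            d[i] = 1.0
            grad[i] = (score_fn(box + d) - score_fn(box - d)) / 2.0
        box -= step * np.sign(grad)                  # step to worsen the score
        box = np.clip(box, box0 - eps, box0 + eps)   # naturalness projection
    return box

user_box = np.array([52.0, 48.0, 148.0, 153.0])
adv = adversarial_box(user_box, toy_segmentation_score)
print(toy_segmentation_score(user_box), toy_segmentation_score(adv))
```

Flipping the sign of the update step turns the same search into a best-case (error-maximizing vs. error-minimizing) probe, which is how a min/max robustness interval over the prompt space can be estimated without enumerating all user inputs.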