The Segment Anything Model (SAM) has demonstrated impressive objectness identification capability, built on the idea of prompt learning and a newly collected large-scale dataset. Given a prompt (e.g., points, bounding boxes, or masks) and an input image, SAM can generate valid segmentation masks for all objects indicated by the prompt. It generalizes well across diverse scenarios and serves as a general method for zero-shot transfer to downstream vision tasks. Nevertheless, it remains unclear whether SAM introduces errors in certain threatening scenarios. Clarifying this is important for applications that require robustness, such as autonomous vehicles. In this paper, we study the test-time robustness of SAM under adversarial scenarios and common corruptions. To this end, we first build a test-time robustness evaluation benchmark for SAM by integrating existing public datasets. Second, we extend representative adversarial attacks to SAM and study the influence of different prompts on robustness. Third, we study the robustness of SAM under diverse corruption types by evaluating it on corrupted datasets with different prompts. Experiments on the SA-1B and KITTI datasets show that SAM exhibits remarkable robustness against various corruptions, except for blur-related corruptions. However, SAM remains susceptible to adversarial attacks, particularly PGD and BIM attacks. We believe such a comprehensive study can highlight the importance of the robustness issues of SAM and motivate a series of new tasks for SAM as well as for downstream vision tasks.
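To make the attack setting concrete, the following is a minimal sketch of an iterative sign-gradient attack under an L-infinity budget, the family to which BIM and PGD belong. It is not the attack code used in this study and does not call SAM: `toy_mask_logits` is a hypothetical per-pixel linear stand-in for a segmentation model, and the values of `eps`, `alpha`, and `steps` are illustrative assumptions. With `random_start=False` the loop corresponds to BIM as commonly described; with `random_start=True` it corresponds to the usual PGD variant with a random initial perturbation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def toy_mask_logits(x, w=8.0, b=-4.0):
    # Hypothetical stand-in for a segmentation model (NOT SAM):
    # one logit per pixel, positive logit means "foreground".
    return w * x + b


def loss_grad(x, y, w=8.0, b=-4.0):
    # Gradient of per-pixel binary cross-entropy with respect to the input x.
    return (sigmoid(toy_mask_logits(x, w, b)) - y) * w


def iterative_attack(x, y, eps=8 / 255, alpha=2 / 255, steps=10,
                     random_start=False, rng=None):
    # Illustrative budget and step size; the real study's settings may differ.
    x_adv = x.copy()
    if random_start:
        # PGD-style start: random point inside the eps-ball around x.
        rng = rng or np.random.default_rng(0)
        x_adv = np.clip(x_adv + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        # Ascend the loss along the sign of the input gradient.
        x_adv = x_adv + alpha * np.sign(loss_grad(x_adv, y))
        # Project back into the eps-ball, then into the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv


def iou(a, b):
    # Intersection over union of two boolean masks.
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.uniform(0.0, 1.0, size=(32, 32))
    clean_mask = toy_mask_logits(image) > 0  # clean prediction used as target
    for name, rs in (("BIM", False), ("PGD", True)):
        adv = iterative_attack(image, clean_mask.astype(float), random_start=rs)
        print(name, "IoU vs clean mask:", iou(toy_mask_logits(adv) > 0, clean_mask))
```

Against a real model, the analytic gradient in `loss_grad` would be replaced by automatic differentiation through the network for a fixed prompt, and the IoU between clean and adversarial masks gives one simple way to quantify the degradation.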