As the Segment Anything Model (SAM) becomes a popular foundation model in computer vision, its adversarial robustness has become a concern that cannot be ignored. This work investigates whether it is possible to attack SAM with an image-agnostic Universal Adversarial Perturbation (UAP). In other words, we seek a single perturbation that can fool SAM into predicting invalid masks for most (if not all) images. We demonstrate that the conventional image-centric attack framework is effective for image-dependent attacks but fails for universal adversarial attacks. To address this, we propose a novel perturbation-centric framework that yields a UAP generation method based on self-supervised contrastive learning (CL), in which the UAP serves as the anchor sample and the positive sample is an augmented version of the UAP. The representations of the negative samples are obtained from the image encoder in advance and stored in a memory bank. Both quantitative and qualitative results validate the effectiveness of our CL-based UAP generation method. Beyond an ablation study of the individual components of our method, we shed light on the roles of positive and negative samples in making the generated UAP effective for attacking SAM.
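The contrastive setup above is described only in words. The following is a minimal sketch of one possible objective, assuming a standard InfoNCE loss over cosine similarities with a temperature of 0.1 (neither the loss form nor the temperature is stated in the abstract). The anchor is the embedding of the UAP, the positive is the embedding of an augmented copy of the UAP, and the negatives are image embeddings read from the memory bank. The function names and the random vectors standing in for SAM image-encoder features are hypothetical, and the sketch only evaluates the loss; it does not backpropagate through SAM to update the perturbation.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchor: Vector, positive: Vector,
             negatives: Sequence[Vector], temperature: float = 0.1) -> float:
    """Assumed InfoNCE loss: -log(exp(s_pos/t) / (exp(s_pos/t) + sum_k exp(s_k/t))).

    anchor:    embedding of the UAP itself
    positive:  embedding of an augmented copy of the UAP
    negatives: precomputed image embeddings taken from the memory bank
    """
    a = l2_normalize(anchor)
    # Index 0 holds the positive logit; the rest are the negatives.
    logits = [dot(a, l2_normalize(positive)) / temperature]
    logits += [dot(a, l2_normalize(n)) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


if __name__ == "__main__":
    random.seed(0)
    dim, bank_size = 16, 64
    # Hypothetical memory bank: random vectors in place of SAM image-encoder
    # features that would be computed once, in advance.
    memory_bank = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(bank_size)]
    # Hypothetical UAP embedding and a noisy copy standing in for the
    # embedding of an augmented UAP.
    uap_emb = [random.gauss(0, 1) for _ in range(dim)]
    aug_emb = [x + random.gauss(0, 0.1) for x in uap_emb]
    print(info_nce(uap_emb, aug_emb, memory_bank))
```

Under these assumptions, minimizing the loss with respect to the perturbation would pull the UAP's representation toward its augmented views and push it away from the stored image representations. In practice the gradient would be obtained with an autodiff framework through SAM's image encoder; plain Python is used here only to keep the objective itself readable.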