General-purpose segmentation models can generate (semantic) segmentation masks from a variety of prompts, both visual (points, boxes, etc.) and textual (object names). In these models, an image encoder first maps the input image to embedding vectors, which are later used for mask prediction. Existing adversarial attacks target the end-to-end task, i.e. they aim to alter the segmentation mask predicted for a specific image-prompt pair. However, this requires running a separate attack for every new prompt on the same image. We propose instead to generate prompt-agnostic adversarial attacks by maximizing the $\ell_2$-distance, in the latent space, between the embeddings of the original and perturbed images. Since the encoding process depends only on the image, distorted image representations cause perturbations in the segmentation masks for a variety of prompts. We show that even imperceptible $\ell_\infty$-bounded perturbations of radius $\epsilon=1/255$ are often sufficient to drastically modify the masks predicted with point, box and text prompts by recently proposed foundation models for segmentation. Moreover, we explore the possibility of creating universal, i.e. non-image-specific, attacks, which can be readily applied to any input without further computational cost.
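As an illustration of the embedding-space objective described above, the attack can be read as approximately solving $\max_{\|\delta\|_\infty \le \epsilon} \|f(x+\delta) - f(x)\|_2$, where $f$ denotes the image encoder, $x$ the clean image and $\delta$ the perturbation (this notation is ours, not taken from the text). The following is a minimal sketch under stated assumptions: the encoder is replaced by a hypothetical toy model (a single tanh layer with weights `W`) so that the gradient can be written by hand, and the optimizer is plain projected sign-gradient ascent with a random start. The function names, step size and iteration count are illustrative choices and are not claimed to match the procedure actually used.

```python
import numpy as np


def encode(W, x):
    """Toy stand-in for an image encoder (assumption): one tanh layer."""
    return np.tanh(W @ x)


def embedding_attack(W, x, eps=1 / 255, step=None, n_iter=50, seed=0):
    """Maximize ||encode(x + delta) - encode(x)||_2^2 subject to
    ||delta||_inf <= eps and x + delta in [0, 1], using projected
    sign-gradient ascent. Returns the perturbed input."""
    rng = np.random.default_rng(seed)
    step = eps / 4 if step is None else step
    z_clean = encode(W, x)

    # Random start: the gradient of the distance is exactly zero at delta = 0.
    delta = rng.uniform(-eps, eps, size=x.shape)
    delta = np.clip(x + delta, 0.0, 1.0) - x

    for _ in range(n_iter):
        z_adv = encode(W, x + delta)
        diff = z_adv - z_clean
        # Analytic gradient of ||tanh(W(x + delta)) - z_clean||^2 w.r.t. delta.
        grad = W.T @ ((1.0 - z_adv ** 2) * 2.0 * diff)
        # Ascent step, then projection onto the l_inf ball of radius eps.
        delta = np.clip(delta + step * np.sign(grad), -eps, eps)
        # Keep the perturbed input inside the valid pixel range [0, 1].
        delta = np.clip(x + delta, 0.0, 1.0) - x

    return x + delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(16, 64)) / np.sqrt(64)  # hypothetical encoder weights
    x = rng.uniform(0.0, 1.0, size=64)           # hypothetical flattened "image"
    x_adv = embedding_attack(W, x)
    print("l_inf norm of perturbation:", np.max(np.abs(x_adv - x)))
    print("l_2 embedding distance:", np.linalg.norm(encode(W, x_adv) - encode(W, x)))
```

Note that no prompt appears anywhere in the loop: the perturbation depends only on the image and the encoder, which is what makes such an attack prompt-agnostic. With a real segmentation model, the hand-written gradient would be replaced by automatic differentiation through the actual image encoder.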