In the evolving landscape of computer vision, foundation models have emerged as pivotal tools that adapt well to a wide range of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, has limitations in specific niche applications, which motivates enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that improves SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their success in natural language processing. Using a Stable Diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that reflect natural variations rather than conventional imperceptible perturbations. Our approach maintains the photorealism of the adversarial examples and keeps them aligned with the original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without requiring additional data or architectural modifications. Our extensive evaluations confirm that ASAM sets new benchmarks in segmentation tasks, thereby contributing to the advancement of foundation models in computer vision. Our project page is at https://asam2024.github.io/.