The emergence of large models, also known as foundation models, has brought significant advances to AI research. One such model is Segment Anything (SAM), which is designed for image segmentation tasks. However, as with other foundation models, our experimental findings suggest that SAM may fail or perform poorly on certain segmentation tasks, such as shadow detection and camouflaged object detection (also called concealed object detection). This study is the first to pave the way for applying the large pre-trained image segmentation model SAM to these downstream tasks, even in situations where SAM performs poorly. Rather than fine-tuning the SAM network, we propose \textbf{SAM-Adapter}, which incorporates domain-specific information or visual prompts into the segmentation network through simple yet effective adapters. Our extensive experiments show that SAM-Adapter significantly improves the performance of SAM on challenging tasks and can even outperform task-specific network models, achieving state-of-the-art performance on the tasks we tested: camouflaged object detection and shadow detection. We believe our work opens up opportunities for utilizing SAM in downstream tasks, with potential applications in various fields, including medical image processing, agriculture, remote sensing, and more.