The "Segment Anything Model" (SAM) was recently introduced as the first foundation model developed specifically for vision tasks. SAM can segment objects in input imagery based upon inexpensive input prompts, such as one or more points, a bounding box, or a mask. Its authors examined the zero-shot image segmentation accuracy of SAM on a large number of vision benchmark tasks and found that SAM usually achieved recognition accuracy similar to, and sometimes exceeding, that of vision models that had been trained on the target tasks. This impressive generalization for segmentation has major implications for vision researchers working on natural imagery. In this work, we examine whether SAM's performance extends to overhead imagery problems, with the aim of helping to guide the community's response to its development. We evaluate SAM on a set of diverse and widely studied benchmark tasks. We find that SAM often generalizes well to overhead imagery, although it fails in some cases due to the unique characteristics of overhead imagery and of the target objects. We report these systematic failure cases, which are specific to remote sensing imagery and may offer useful directions for future research by the community. Note that this is a working paper, and it will be updated as additional analysis and results are completed.