The recently proposed Segment Anything Model (SAM) has had a significant influence on many computer vision tasks. It is becoming a foundational step for many high-level tasks, such as image segmentation, image captioning, and image editing. However, its high computational cost prevents wider application in industrial scenarios. This cost comes mainly from the Transformer architecture operating on high-resolution inputs. In this paper, we propose a faster alternative for this fundamental task with comparable performance. By reformulating the task as segment generation and prompting, we find that a regular CNN detector with an instance segmentation branch can also accomplish this task well. Specifically, we convert the task to the well-studied instance segmentation task and directly train an existing instance segmentation method using only 1/50 of the SA-1B dataset published by the SAM authors. With our method, we achieve performance comparable to SAM at 50 times higher run-time speed. We provide sufficient experimental results to demonstrate its effectiveness. The code and demos will be released at https://github.com/CASIA-IVA-Lab/FastSAM.
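To make the "segment generation and prompting" reformulation concrete, the following is a minimal sketch of the prompting half only. It assumes that some instance segmentation model has already produced a stack of binary masks for the whole image, and it resolves a point prompt by picking a mask that contains the point. The function name `select_mask_by_point`, the (N, H, W) mask layout, and the smallest-area tie-break are illustrative assumptions, not the released FastSAM implementation.

```python
from typing import Optional, Tuple

import numpy as np


def select_mask_by_point(masks: np.ndarray, point: Tuple[int, int]) -> Optional[int]:
    """Return the index of a mask containing `point`, or None if no mask does.

    masks: boolean array of shape (N, H, W), one mask per detected instance
           (assumed to come from any instance segmentation model).
    point: (x, y) pixel coordinate of the point prompt.

    Tie-break (an assumption of this sketch): when several masks contain the
    point, the one with the smallest area is returned.
    """
    x, y = point
    # Indices of all masks that are True at the prompted pixel.
    hits = np.flatnonzero(masks[:, y, x])
    if hits.size == 0:
        return None
    # Pixel area of each candidate mask.
    areas = masks[hits].reshape(hits.size, -1).sum(axis=1)
    return int(hits[np.argmin(areas)])


if __name__ == "__main__":
    # Two synthetic instances: a large square and a small square inside it.
    demo = np.zeros((2, 8, 8), dtype=bool)
    demo[0, 0:6, 0:6] = True
    demo[1, 2:4, 2:4] = True
    print(select_mask_by_point(demo, (3, 3)))  # point inside both masks
    print(select_mask_by_point(demo, (5, 5)))  # point inside the large mask only
    print(select_mask_by_point(demo, (7, 7)))  # point outside every mask
```

Because the masks are generated once per image, a prompt under this sketch reduces to an array lookup, which illustrates why separating generation from prompting keeps the prompt step cheap.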