The recently introduced Segment Anything Model (SAM), a Visual Foundation Model (VFM), has demonstrated impressive zero-shot segmentation capabilities across diverse natural image datasets. Despite this success, SAM suffers noticeable performance degradation when applied to specific domains, such as medical images. Current efforts to address this issue rely on fine-tuning strategies intended to bolster the generalizability of the vanilla SAM. However, these approaches still largely require domain-specific, expert-level prompts during the evaluation phase, which severely constrains the model's practicality. To overcome this limitation, we introduce a novel self-prompting fine-tuning approach, called SAM-SP, tailored to extend the vanilla SAM model. Specifically, SAM-SP leverages the output of the model's previous iteration as the prompt that guides its subsequent iteration. This self-prompting module learns to generate useful prompts autonomously, alleviating the dependence on expert prompts during evaluation and significantly broadening SAM's applicability. Additionally, we integrate a self-distillation module to further enhance the self-prompting process. Extensive experiments across various domain-specific datasets validate the effectiveness of the proposed SAM-SP. SAM-SP not only alleviates the reliance on expert prompts but also achieves superior segmentation performance compared to state-of-the-art task-specific segmentation approaches, the vanilla SAM, and other SAM-based approaches.
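The core self-prompting idea described above can be sketched as a two-pass inference loop: a first forward pass produces a coarse mask, which is then converted into a prompt (here, a bounding box, one common SAM prompt type) for a second pass. This is a minimal illustrative sketch, not the paper's actual implementation; `segment(image, prompt)` is an assumed stand-in for a SAM-style model call.

```python
import numpy as np

def mask_to_box_prompt(mask: np.ndarray):
    """Derive a bounding-box prompt (x0, y0, x1, y1) from a binary mask."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None  # empty mask: no box can be derived, fall back to pass 1
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

def self_prompting_pass(segment, image, init_prompt=None):
    """Two-stage inference: the first pass's output prompts the second.

    `segment` is a hypothetical callable standing in for a SAM-style
    model; the real SAM-SP learns this behavior during fine-tuning
    rather than applying it purely at inference time.
    """
    coarse = segment(image, init_prompt)   # pass 1: no expert prompt needed
    box = mask_to_box_prompt(coarse)       # turn the model's own output into a prompt
    if box is None:
        return coarse
    return segment(image, box)             # pass 2: self-prompted refinement
```

The key property is that no human-provided, expert-level prompt enters the loop: the prompt for the refinement pass is generated from the model's own prior output.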