The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt-driven segmentation model, excels at segmentation tasks across diverse domains, while CLIP is renowned for its zero-shot recognition capabilities. However, their combined potential has not yet been explored in medical image segmentation. To adapt SAM to medical imaging, existing methods rely primarily on tuning strategies that require extensive data or on prior prompts tailored to the specific task, which makes adaptation particularly challenging when only a limited number of data samples is available. This work presents an in-depth exploration of integrating SAM and CLIP into a unified framework for medical image segmentation. Specifically, we propose SaLIP, a simple unified framework for organ segmentation. First, SAM performs part-based segmentation of the image; CLIP then retrieves the mask corresponding to the region of interest (ROI) from the pool of SAM-generated masks. Finally, SAM is prompted with the retrieved ROI to segment the specific organ. SaLIP is therefore free of both training and fine-tuning, and it relies on neither domain expertise nor labeled data for prompt engineering. Our method substantially improves zero-shot segmentation: compared with unprompted SAM, it achieves notable Dice score improvements across diverse segmentation tasks, including brain (63.46%), lung (50.11%), and fetal head (30.82%) segmentation. Code and text prompts are available at: https://github.com/aleemsidra/SaLIP.
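The retrieval step described above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation and makes several assumptions: candidate masks are plain binary 2-D lists standing in for SAM's automatically generated masks, `toy_embed` is a hypothetical stand-in for a CLIP image encoder applied to each masked region, and a hand-made vector replaces a CLIP text embedding. The sketch scores every candidate mask by cosine similarity to the text vector, keeps the best one, and converts it to a tight bounding box, which is the kind of box prompt SAM accepts (for example the `box` argument of `SamPredictor.predict` in the `segment_anything` package). It also includes the Dice coefficient used to report segmentation quality.

```python
"""Minimal sketch of CLIP-style mask retrieval followed by a SAM box prompt.

Assumptions (hypothetical, not the SaLIP code): masks are binary 2-D lists,
and `toy_embed` stands in for a CLIP image encoder applied to each region.
"""
import math
from typing import Callable, List, Sequence, Tuple

Mask = List[List[int]]           # binary mask, mask[y][x] in {0, 1}
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def retrieve_roi(masks: Sequence[Mask],
                 embed_region: Callable[[Mask], Sequence[float]],
                 text_embedding: Sequence[float]) -> int:
    """Index of the mask whose region embedding best matches the text embedding."""
    if not masks:
        raise ValueError("no candidate masks")
    scores = [cosine(embed_region(m), text_embedding) for m in masks]
    return max(range(len(masks)), key=scores.__getitem__)


def mask_to_box(mask: Mask) -> Box:
    """Tight bounding box around the foreground pixels, usable as a box prompt."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        raise ValueError("empty mask")
    return min(xs), min(ys), max(xs), max(ys)


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient 2*|P and T| / (|P| + |T|); 1.0 when both masks are empty."""
    inter = sum(1 for pr, tr in zip(pred, target) for p, t in zip(pr, tr) if p and t)
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 2.0 * inter / total if total else 1.0


def toy_embed(mask: Mask) -> List[float]:
    """Hypothetical CLIP stand-in: foreground pixel counts in the top and bottom halves."""
    half = len(mask) // 2
    return [float(sum(map(sum, mask[:half]))), float(sum(map(sum, mask[half:])))]


if __name__ == "__main__":
    top = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    bottom = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
    # Hand-made "text embedding" pointing at the top half (assumption, not CLIP output).
    idx = retrieve_roi([top, bottom], toy_embed, [1.0, 0.0])
    print(idx, mask_to_box([top, bottom][idx]))  # 0 (1, 0, 2, 1)
    print(dice(top, top), dice(top, bottom))     # 1.0 0.0
```

In a real pipeline one would instead crop or mask the input image with each candidate, encode the crops and the organ's text prompt with CLIP, and pass the resulting box to SAM; those model calls are left out here so the example runs without any model weights.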