The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt-driven segmentation model, excels in segmentation tasks across diverse domains, while CLIP is renowned for its zero-shot recognition capabilities. However, their combined potential has not yet been explored in medical image segmentation. Existing methods for adapting SAM to medical imaging rely primarily on tuning strategies that require extensive data or prior prompts tailored to the specific task, which makes adaptation particularly challenging when only a limited number of data samples are available. This work presents an in-depth exploration of integrating SAM and CLIP into a unified framework for medical image segmentation. Specifically, we propose a simple unified framework, SaLIP, for organ segmentation. First, SAM performs part-based segmentation of the image. CLIP then retrieves the mask corresponding to the region of interest (ROI) from the pool of SAM-generated masks. Finally, SAM is prompted by the retrieved ROI to segment a specific organ. SaLIP is therefore training-free and fine-tuning-free, and it does not rely on domain expertise or labeled data for prompt engineering. Our method substantially improves zero-shot segmentation, with notable gains in DICE score over unprompted SAM across diverse segmentation tasks: brain (63.46%), lung (50.11%), and fetal head (30.82%). Code and text prompts will be available online.
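The retrieval and prompting steps described in the abstract can be illustrated with a minimal, library-free sketch. It is not the authors' implementation: the candidate masks stand in for SAM's part-based output, the `scores` list stands in for CLIP image-text similarities, and the use of a bounding box as the prompt passed back to SAM is an assumption, since the abstract only says that SAM is prompted by the retrieved ROI. The helper names `retrieve_roi_mask`, `mask_to_box`, and `dice` are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def retrieve_roi_mask(masks: Sequence[Mask], scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate mask.

    `scores` is a placeholder for per-mask CLIP similarity to a text
    prompt describing the organ (assumption: one score per mask).
    """
    if not masks or len(masks) != len(scores):
        raise ValueError("need one score per mask and at least one mask")
    return max(range(len(scores)), key=lambda i: scores[i])


def mask_to_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Convert a binary mask to an inclusive (x_min, y_min, x_max, y_max) box.

    Assumption: the retrieved ROI is handed back to SAM as a box prompt.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask is empty")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two same-sized binary masks."""
    inter = sum(1 for pr, tr in zip(pred, target) for p, t in zip(pr, tr) if p and t)
    size_pred = sum(1 for row in pred for v in row if v)
    size_target = sum(1 for row in target for v in row if v)
    if size_pred + size_target == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * inter / (size_pred + size_target)


# Toy data: two candidate masks and made-up similarity scores.
candidate_masks = [
    [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 1]],
]
similarity_scores = [0.12, 0.81]  # hypothetical values, not real CLIP output
ground_truth = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 1]]

best_index = retrieve_roi_mask(candidate_masks, similarity_scores)
box_prompt = mask_to_box(candidate_masks[best_index])
roi_dice = dice(candidate_masks[best_index], ground_truth)
```

In a real pipeline the scores would come from encoding each masked image crop and the organ's text prompts with CLIP, and the box would be passed to SAM to produce the final organ mask, which is then evaluated against the ground truth with the Dice coefficient.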