Foundation models for image segmentation have shown strong generalization on natural images, yet their applicability to 3D medical imaging remains limited. In this work, we study the zero-shot use of Segment Anything Model 2 (SAM2) for automatic segmentation of volumetric CT data, without any fine-tuning or domain-specific training. We analyze how SAM2 should be applied to CT volumes and identify its main limitation: the lack of inherent volumetric awareness. To address this, we propose a set of inference-only architectural and procedural modifications that adapt SAM2's video-based memory mechanism to 3D data by treating CT slices as ordered sequences. We conduct a systematic ablation study on a subset of 500 CT scans from the TotalSegmentator dataset to evaluate prompt strategies, memory propagation schemes, and multi-pass refinement. Based on these findings, we select the best-performing configuration and report final results on a larger sample of the TotalSegmentator dataset comprising 2,500 CT scans. Our results show that, even with frozen weights, SAM2 can produce coherent 3D segmentations when its inference pipeline is carefully structured, demonstrating the feasibility of a fully zero-shot approach for volumetric medical image segmentation.
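As an illustration of what treating CT slices as an ordered sequence can look like in practice, the following is a minimal sketch and not the pipeline evaluated in this work. It assumes a soft-tissue intensity window (center 40 HU, width 400 HU) to map Hounsfield units to 8-bit frame intensities, and it assumes that a prompt is placed on a single slice and then visited outward in both directions. The helper names `hu_window_to_uint8` and `propagation_orders` are hypothetical and do not belong to SAM2.

```python
from typing import List, Tuple


def hu_window_to_uint8(hu: float, center: float = 40.0, width: float = 400.0) -> int:
    """Map one Hounsfield-unit value to 0..255.

    The default window (center 40, width 400) is an assumed soft-tissue
    setting chosen for illustration, not a value taken from the paper.
    """
    low = center - width / 2.0
    high = center + width / 2.0
    clipped = min(max(hu, low), high)  # clamp to the window
    return int(round((clipped - low) / (high - low) * 255.0))


def propagation_orders(num_slices: int, prompt_index: int) -> Tuple[List[int], List[int]]:
    """Return slice visiting orders for a hypothetical two-direction pass.

    Both orders start at the prompted slice, so a video-style memory would
    see the prompt first and then move away from it one slice at a time.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    if not 0 <= prompt_index < num_slices:
        raise ValueError("prompt_index must lie inside the volume")
    forward = list(range(prompt_index, num_slices))   # prompt slice -> last slice
    backward = list(range(prompt_index, -1, -1))      # prompt slice -> first slice
    return forward, backward


if __name__ == "__main__":
    fwd, bwd = propagation_orders(num_slices=5, prompt_index=2)
    print("forward:", fwd)
    print("backward:", bwd)
    print("air:", hu_window_to_uint8(-1000.0), "bone:", hu_window_to_uint8(1000.0))
```

In this sketch each order would be handed to a video-style predictor as its frame sequence. How the two passes are merged, and which prompt and memory schemes work best, are the questions the ablation study addresses.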