Training segmentation models for medical images remains challenging because annotated data are scarce and expensive to acquire. The Segment Anything Model (SAM) is a foundation model trained on over 1 billion annotations, predominantly of natural images, and is designed to segment a user-specified object of interest interactively. Despite its impressive performance on natural images, it is unclear how the model behaves when shifted to medical imaging domains. Here, we extensively evaluate SAM's ability to segment medical images on a collection of 11 medical imaging datasets spanning various modalities and anatomies. In our experiments, we generated point prompts using a standard method that simulates interactive segmentation. Experimental results show that SAM's performance with a single prompt varies widely with the task and the dataset, with IoU ranging from 0.1135 for a spine MRI dataset to 0.8650 for a hip X-ray dataset. Performance appears to be high for tasks involving well-circumscribed objects with unambiguous prompts and poorer in many other scenarios, such as tumor segmentation. When multiple prompts are provided, performance improves only slightly overall, but more so for datasets where the object is not contiguous. An additional comparison to RITM showed that SAM performs much better with one prompt, while the two methods perform similarly with a larger number of prompts. We conclude that SAM shows impressive performance for some datasets given the zero-shot learning setup but poor to moderate performance for multiple other datasets. While SAM as a model and as a learning paradigm might be impactful in the medical imaging domain, extensive research is needed to identify the proper ways of adapting it to this domain.
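For readers unfamiliar with the metric reported above, the following is a minimal sketch of how intersection over union (IoU) can be computed for a pair of binary masks. It is an illustration only, not the evaluation code used in this study: the function name `iou`, the list-of-lists mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
from typing import Sequence

# A 2D binary mask: any nonzero entry marks an object pixel (assumed format).
Mask = Sequence[Sequence[int]]


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            # Count pixels marked in both masks and pixels marked in either.
            intersection += bool(p) and bool(t)
            union += bool(p) or bool(t)

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy example: two 2x2 squares that overlap in one column.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
reference = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
score = iou(predicted, reference)  # 2 shared pixels out of 6 covered pixels
```

In the toy example the masks share 2 pixels and jointly cover 6, so the score is 2/6. The values 0.1135 and 0.8650 quoted above lie on this same 0 to 1 scale.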