The Segment Anything Model (SAM) is the first foundation model for general image segmentation and has achieved impressive results on a variety of natural image segmentation tasks. Medical image segmentation (MIS), however, is more challenging because of its complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide range of object scales. To fully validate SAM's performance on medical data, we collected and sorted 53 open-source datasets and built a large medical segmentation dataset, COSMOS 1050K, with 18 modalities, 84 objects, 125 object-modality paired targets, 1050K 2D images, and 6033K masks. We comprehensively analyzed different models and strategies on this dataset. Our main findings are as follows: 1) SAM showed remarkable performance on some specific objects but was unstable, imperfect, or even failed completely in other situations. 2) SAM with the large ViT-H showed better overall performance than SAM with the small ViT-B. 3) SAM performed better with manual prompts, especially box prompts, than in Everything mode. 4) SAM could assist human annotation, yielding high labeling quality in less time. 5) SAM was sensitive to randomness in the center point and tight box prompts and could suffer a serious performance drop. 6) SAM performed better than interactive methods when given one or a few points but was outpaced as the number of points increased. 7) SAM's performance correlated with several factors, including boundary complexity and intensity differences. 8) Fine-tuning SAM on specific medical tasks improved its average Dice score by 4.39% and 6.68% for ViT-B and ViT-H, respectively. We hope that this comprehensive report helps researchers explore the potential of SAM in MIS and guides the appropriate use and development of SAM.
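To make the evaluation terms in the abstract concrete, the following is a minimal sketch of the Dice score and of how a tight box prompt and a center point prompt might be derived from a ground-truth mask, then randomly perturbed. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the centroid-based definition of the center point, the uniform pixel jitter, and the convention that two empty masks score 1.0 are all hypothetical choices made here. Masks are plain nested lists to keep the example dependency-free; a real pipeline would more likely use an array library.

```python
import random


def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    inter = sum(1 for pr, tr in zip(pred, target) for p, t in zip(pr, tr) if p and t)
    total = sum(1 for row in pred for v in row if v) + sum(
        1 for row in target for v in row if v
    )
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def _foreground(mask):
    """List the (x, y) coordinates of all foreground pixels."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    return coords


def tight_box(mask):
    """Smallest box (x_min, y_min, x_max, y_max) enclosing the foreground."""
    xs, ys = zip(*_foreground(mask))
    return min(xs), min(ys), max(xs), max(ys)


def center_point(mask):
    """Foreground pixel closest to the centroid (an assumed definition)."""
    coords = _foreground(mask)
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return min(coords, key=lambda c: (c[0] - cx) ** 2 + (c[1] - cy) ** 2)


def jitter_box(box, max_shift, width, height, rng):
    """Move each box edge by up to max_shift pixels, clamped to the image."""
    x0, y0, x1, y1 = (v + rng.randint(-max_shift, max_shift) for v in box)
    x0, x1 = sorted(min(max(v, 0), width - 1) for v in (x0, x1))
    y0, y1 = sorted(min(max(v, 0), height - 1) for v in (y0, y1))
    return x0, y0, x1, y1


def box_to_mask(box, width, height):
    """Rasterize a box into a binary mask (box edges are inclusive)."""
    x0, y0, x1, y1 = box
    return [
        [1 if x0 <= x <= x1 and y0 <= y <= y1 else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    # Toy 8x8 mask with a 4x4 square object.
    mask = [[1 if 2 <= x <= 5 and 2 <= y <= 5 else 0 for x in range(8)] for y in range(8)]
    rng = random.Random(0)
    box = tight_box(mask)
    noisy = jitter_box(box, 2, 8, 8, rng)
    print("tight box:", box, "center point:", center_point(mask))
    print("jittered box:", noisy)
    print("Dice of jittered box vs. mask:", dice_score(box_to_mask(noisy, 8, 8), mask))
```

In this toy setting the jittered box itself stands in for a prediction, which only shows how prompt perturbation can be scored with Dice. Measuring SAM's actual sensitivity would require passing the prompts to the model and scoring its predicted masks.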