Segment Anything Model for Medical Images?

Yuhao Huang, Xin Yang, Lian Liu, Han Zhou, Ao Chang, Xinrui Zhou, Rusi Chen, Junxuan Yu, Jiongquan Chen, Chaoyu Chen, Sijing Liu, Haozhe Chi, Xindi Hu, Kejuan Yue, Lei Li, Vicente Grau, Deng-Ping Fan, Fajin Dong, Dong Ni

From arXiv. Accepted by Medical Image Analysis. 23 pages, 18 figures, 8 tables.

The Segment Anything Model (SAM) is the first foundation model for general image segmentation and has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging because of complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-ranging object scales. To fully validate SAM's performance on medical data, we collected and sorted 53 open-source datasets and built a large medical segmentation dataset, COSMOS 1050K, with 18 modalities, 84 objects, 125 object-modality paired targets, 1050K 2D images, and 6033K masks. We comprehensively analyzed different models and strategies on this dataset. Our main findings are as follows: 1) SAM showed remarkable performance on some specific objects but was unstable, imperfect, or even failed completely in other situations. 2) SAM with the large ViT-H showed better overall performance than SAM with the small ViT-B. 3) SAM performed better with manual hints, especially box prompts, than in Everything mode. 4) SAM could assist human annotation, yielding high labeling quality in less time. 5) SAM was sensitive to randomness in the center point and tight box prompts, and could suffer a serious performance drop. 6) SAM performed better than interactive methods with one or a few points, but was outpaced as the number of points increased. 7) SAM's performance correlated with different factors, including boundary complexity and intensity differences. 8) Fine-tuning SAM on specific medical tasks improved its average DICE performance by 4.39% and 6.68% for ViT-B and ViT-H, respectively. We hope that this comprehensive report can help researchers explore the potential of SAM applications in MIS and guide how to appropriately use and develop SAM.
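To make the evaluation terms above concrete, the following is a minimal sketch of how a DICE (Dice coefficient) score and a tight box prompt might be computed from a binary mask. It is an illustration only and not the paper's code: the names `dice`, `tight_box`, and `box_center` are hypothetical, masks are assumed to be nested lists of 0/1 values, and the box center is used here as a simple stand-in for a center point prompt, which may differ from how the paper derives it.

```python
from typing import List, Tuple

Mask = List[List[int]]  # assumed format: binary mask, 1 = object, 0 = background


def dice(pred: Mask, target: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) between two binary masks."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * inter / total


def tight_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Smallest (x_min, y_min, x_max, y_max) box enclosing all foreground pixels."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        raise ValueError("mask has no foreground pixels")
    return min(xs), min(ys), max(xs), max(ys)


def box_center(box: Tuple[int, int, int, int]) -> Tuple[float, float]:
    """Center (x, y) of a box; a stand-in for a center point prompt."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0


# Toy example: a 2x2 object and a prediction with one extra pixel.
target = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 0], [0, 0, 0, 0]]
score = dice(pred, target)   # 2 * 4 / (5 + 4)
prompt = tight_box(target)   # box around the ground-truth object
```

In this toy case the prediction overlaps the target on 4 pixels out of 5 predicted and 4 true pixels, so the score is 8/9. Perturbing the box or point before passing it to a model would be one way to probe the prompt sensitivity described in finding 5, though the perturbation scheme used in the paper is not specified in this abstract.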
