Curating annotations for medical image segmentation is a labor-intensive, time-consuming task that requires domain expertise, resulting in "narrowly" focused deep learning (DL) models with limited translational utility. Recently, foundation models such as the Segment Anything Model (SAM) have revolutionized semantic segmentation with exceptional zero-shot generalizability across various domains, including medical imaging, and hold considerable promise for streamlining the annotation process. However, SAM has yet to be evaluated in a crowd-sourced setting for curating annotations to train 3D DL segmentation models. In this work, we explore the potential of SAM for crowd-sourcing "sparse" annotations from non-experts to generate "dense" segmentation masks for training 3D nnU-Net, a state-of-the-art DL segmentation model. Our results indicate that while SAM-generated annotations exhibit high mean Dice scores compared to ground-truth annotations, nnU-Net models trained on SAM-generated annotations perform significantly worse than nnU-Net models trained on ground-truth annotations ($p<0.001$ for all comparisons).
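The Dice score used above to compare SAM-generated annotations with ground-truth annotations is, in its standard form, $2|A \cap B| / (|A| + |B|)$ for two binary masks $A$ and $B$. The abstract does not specify how the metric was implemented, so the following is only a minimal sketch of that standard definition on 3D masks; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_score(pred, truth):
    """Dice overlap between two binary masks of identical shape.

    Hypothetical helper: standard Dice definition, not the authors' code.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return float(2.0 * intersection / total)


# Toy 3D volumes standing in for a generated mask and a ground-truth mask.
generated = np.zeros((4, 4, 4), dtype=bool)
generated[1:3, 1:3, 1:3] = True   # 8 voxels

reference = np.zeros((4, 4, 4), dtype=bool)
reference[1:3, 1:3, 1:4] = True   # 12 voxels, fully containing the 8 above

score = dice_score(generated, reference)  # 2 * 8 / (8 + 12)
```

A high Dice score of this kind measures only voxel overlap between annotations. As the results above indicate, it does not by itself guarantee that a model trained on those annotations will perform comparably.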