In this paper, we present SonoSAMTrack, which combines SonoSAM, a promptable foundational model for segmenting objects of interest in ultrasound images, with a state-of-the-art contour tracking model to propagate segmentations on 2D+t and 3D ultrasound datasets. Fine-tuned and tested exclusively on a rich, diverse set of objects from $\approx200$k ultrasound image-mask pairs, SonoSAM demonstrates state-of-the-art performance on 7 unseen ultrasound datasets, outperforming competing methods by a significant margin. We also extend SonoSAM to 2D+t applications and demonstrate superior performance, making it a valuable tool for generating dense annotations and segmenting anatomical structures in clinical workflows. Further, to increase the practical utility of the work, we propose a two-step process of fine-tuning followed by knowledge distillation to a smaller-footprint model without compromising performance. We present detailed qualitative and quantitative comparisons of SonoSAM with state-of-the-art methods, showcasing the efficacy of the method. Finally, using SonoSAMTrack, we demonstrate a reduction in the number of clicks required for a dense video annotation problem: adult cardiac ultrasound chamber segmentation.