In medical imaging, precise annotation of lesions or organs is often required. However, 3D volumetric images typically consist of hundreds or thousands of slices, making annotation extremely time-consuming and laborious. Recently, the Segment Anything Model (SAM) has drawn widespread attention for its remarkable zero-shot generalization in interactive segmentation. While researchers have explored adapting SAM to medical applications, for example through SAM adapters or 3D SAM models, a key question remains: can traditional CNNs achieve equally strong zero-shot generalization on this task? In this paper, we propose the Lightweight Interactive Network for 3D Medical Image Segmentation (LIM-Net), a novel approach demonstrating the potential of compact CNN-based models. Built on a 2D CNN backbone, LIM-Net initiates segmentation by generating a 2D prompt mask from user hints; this mask is then propagated through the 3D slice sequence by a Memory Module. To refine and stabilize results during interaction, a Multi-Round Result Fusion (MRF) Module selects and merges the best masks from multiple interaction rounds. Extensive experiments across multiple datasets and modalities show that LIM-Net generalizes to unseen data better than SAM-based models, achieving comparable accuracy with fewer interactions. Notably, its lightweight design offers significant advantages in deployment and inference efficiency, with GPU memory consumption low enough for resource-constrained environments. These results suggest that LIM-Net can serve as a strong baseline, complementing and contrasting with popular SAM models to further advance interactive medical image segmentation. The code will be released at \url{https://github.com/goodtime-123/LIM-Net}.
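The three-stage workflow described above (a 2D prompt mask from user hints, memory-based propagation through the slice sequence, and multi-round fusion) can be sketched in simplified form. This is an illustrative sketch only, not the authors' implementation: the click-to-mask conversion, the identity propagation, and the majority-vote fusion are all placeholder stand-ins for the learned components of LIM-Net.

```python
def prompt_to_mask(slice_2d, clicks):
    """Stage 1 (placeholder): turn user click hints into a 2D prompt mask
    by marking the clicked pixels as foreground."""
    mask = [[0] * len(row) for row in slice_2d]
    for (r, c) in clicks:
        mask[r][c] = 1
    return mask

def propagate(volume, seed_idx, seed_mask):
    """Stage 2 (placeholder): Memory-Module-style propagation. Each slice
    reuses the neighboring slice's mask as its memory; here this is plain
    identity copying, standing in for the learned propagation."""
    masks = {seed_idx: seed_mask}
    for i in range(seed_idx + 1, len(volume)):      # propagate forward
        masks[i] = masks[i - 1]
    for i in range(seed_idx - 1, -1, -1):           # propagate backward
        masks[i] = masks[i + 1]
    return [masks[i] for i in range(len(volume))]

def fuse_rounds(rounds):
    """Stage 3 (placeholder): MRF-style fusion of masks from several
    interaction rounds via per-voxel majority vote (ties -> foreground)."""
    n = len(rounds)
    fused = []
    for slice_masks in zip(*rounds):                # align slices across rounds
        fused.append([
            [1 if 2 * sum(vals) >= n else 0 for vals in zip(*rows)]
            for rows in zip(*slice_masks)           # align rows, then voxels
        ])
    return fused
```

In a real interactive session, each round would add corrective clicks on a poorly segmented slice, rerun stages 1-2, and fuse the accumulated rounds with stage 3.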