In medical imaging, precise annotation of lesions or organs is often required. However, 3D volumetric images typically consist of hundreds or thousands of slices, making annotation extremely time-consuming and laborious. Recently, the Segment Anything Model (SAM) has drawn widespread attention for its remarkable zero-shot generalization in interactive segmentation. While researchers have explored adapting SAM to medical applications, for example through SAM adapters or 3D SAM models, a key question remains: can traditional CNNs achieve equally strong zero-shot generalization on this task? In this paper, we propose the Lightweight Interactive Network for 3D Medical Image Segmentation (LIM-Net), a novel approach demonstrating the potential of compact CNN-based models. Built on a 2D CNN backbone, LIM-Net initiates segmentation by generating a 2D prompt mask from user hints; this mask is then propagated through the 3D slice sequence by a Memory Module. To refine and stabilize results during interaction, a Multi-Round Result Fusion (MRF) Module selects and merges the best masks from multiple interaction rounds. Extensive experiments across multiple datasets and modalities show that LIM-Net generalizes to unseen data better than SAM-based models, achieving comparable accuracy with fewer interactions. Notably, its lightweight design offers significant advantages in deployment and inference efficiency, with GPU memory consumption low enough for resource-constrained environments. These results suggest that LIM-Net can serve as a strong baseline, complementing and contrasting with popular SAM models to further advance interactive medical image segmentation. The code will be released at \url{https://github.com/goodtime-123/LIM-Net}.
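The three-stage workflow described above (a 2D prompt mask from user hints, memory-based propagation through the slice sequence, and multi-round fusion) can be sketched in simplified form. This is an illustrative sketch only, not the authors' implementation: the click-to-mask conversion, the identity propagation, and the majority-vote fusion are all placeholder stand-ins for the learned components of LIM-Net.

```python
def prompt_to_mask(slice_2d, clicks):
    """Stage 1 (placeholder): turn user click hints into a 2D prompt mask
    by marking the clicked pixels as foreground."""
    mask = [[0] * len(row) for row in slice_2d]
    for (r, c) in clicks:
        mask[r][c] = 1
    return mask

def propagate(volume, seed_idx, seed_mask):
    """Stage 2 (placeholder): Memory-Module-style propagation. Each slice
    reuses the neighboring slice's mask as its memory; here this is plain
    identity copying, standing in for the learned propagation."""
    masks = {seed_idx: seed_mask}
    for i in range(seed_idx + 1, len(volume)):      # propagate forward
        masks[i] = masks[i - 1]
    for i in range(seed_idx - 1, -1, -1):           # propagate backward
        masks[i] = masks[i + 1]
    return [masks[i] for i in range(len(volume))]

def fuse_rounds(rounds):
    """Stage 3 (placeholder): MRF-style fusion of masks from several
    interaction rounds via per-voxel majority vote (ties -> foreground)."""
    n = len(rounds)
    fused = []
    for slice_masks in zip(*rounds):                # align slices across rounds
        fused.append([
            [1 if 2 * sum(vals) >= n else 0 for vals in zip(*rows)]
            for rows in zip(*slice_masks)           # align rows, then voxels
        ])
    return fused
```

In a real interactive session, each round would add corrective clicks on a poorly segmented slice, rerun stages 1-2, and fuse the accumulated rounds with stage 3.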