Although the Segment Anything Model (SAM) has achieved impressive results on general-purpose semantic segmentation and generalizes strongly to everyday images, its performance on medical image segmentation is less precise and less stable, especially on tumor segmentation tasks involving objects of small size, irregular shape, and low contrast. Notably, the original SAM architecture is designed for 2D natural images and therefore cannot effectively extract 3D spatial information from volumetric medical data. In this paper, we propose a novel adaptation method for transferring SAM from 2D to 3D for promptable medical image segmentation. Through a holistically designed scheme of architectural modifications, we transfer SAM to support volumetric inputs while retaining the majority of its pre-trained parameters for reuse. Fine-tuning is conducted in a parameter-efficient manner: most pre-trained parameters remain frozen, and only a few lightweight spatial adapters are introduced and tuned. Despite the domain gap between natural and medical data and the disparity in spatial arrangement between 2D and 3D, a transformer trained on natural images can effectively capture the spatial patterns of volumetric medical images with only lightweight adaptations. We conduct experiments on four open-source tumor segmentation datasets. With a single click prompt, our model outperforms state-of-the-art medical image segmentation models on 3 of the 4 tasks, improving by 8.25%, 29.87%, and 10.11% on kidney tumor, pancreas tumor, and colon cancer segmentation respectively, and achieves comparable performance on liver tumor segmentation. We also compare our adaptation method with existing popular adapters and observe significant performance improvements on most datasets.
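The parameter-efficient scheme described above, freezing the pre-trained backbone and tuning only small inserted modules, can be sketched as a bottleneck residual adapter. This is a minimal illustrative sketch, not the paper's actual architecture: the module name, dimensions, and zero-initialization choice below are assumptions for exposition.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class BottleneckAdapter:
    """Illustrative lightweight adapter (hypothetical, not the paper's design):
    down-project, nonlinearity, up-project, then add a residual connection.
    Only these two small matrices would be trained; the transformer backbone
    they are inserted into stays frozen."""

    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.W_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        # Zero-initialized up-projection: the adapter starts as an identity map,
        # so inserting it does not perturb the frozen pre-trained network.
        self.W_up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        return x + relu(x @ self.W_down) @ self.W_up

    def num_params(self):
        return self.W_down.size + self.W_up.size


dim, bottleneck = 768, 64  # ViT-Base width and an example bottleneck size
adapter = BottleneckAdapter(dim, bottleneck)
x = np.ones((1, dim))
y = adapter(x)
print(np.allclose(y, x))      # identity at initialization
print(adapter.num_params())   # 2 * 768 * 64 trainable values per adapter
```

Since each adapter holds only `2 * dim * bottleneck` parameters, a handful of them adds a small fraction of the backbone's size, which is what makes the fine-tuning parameter-efficient.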