3DSAM-adapter: Holistic Adaptation of SAM from 2D to 3D for Promptable Medical Image Segmentation

Although the Segment Anything Model (SAM) has achieved impressive results on general-purpose semantic segmentation, with strong generalization on everyday images, its performance on medical image segmentation is less precise and less stable, especially on tumor segmentation tasks that involve small, irregularly shaped, low-contrast objects. Notably, the original SAM architecture is designed for 2D natural images and therefore cannot effectively extract 3D spatial information from volumetric medical data. In this paper, we propose a novel adaptation method for transferring SAM from 2D to 3D for promptable medical image segmentation. Through a holistically designed scheme for architecture modification, we adapt SAM to support volumetric inputs while retaining the majority of its pre-trained parameters for reuse. Fine-tuning is conducted in a parameter-efficient manner: most of the pre-trained parameters remain frozen, and only a few lightweight spatial adapters are introduced and tuned. Despite the domain gap between natural and medical data and the disparity in spatial arrangement between 2D and 3D, the transformer trained on natural images can effectively capture the spatial patterns present in volumetric medical images with only lightweight adaptations. We conduct experiments on four open-source tumor segmentation datasets. With a single click prompt, our model outperforms domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation, respectively, and achieves similar performance for liver tumor segmentation. We also compare our adaptation method with existing popular adapters and observe significant performance improvements on most datasets.
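
To make the parameter-efficient claim concrete, the following is a minimal sketch of the parameter accounting when a transformer encoder is frozen and only small adapters are trained. Everything in it is an assumption for illustration: the encoder sizes are ViT-B-like values chosen by hand, and the adapter layout (down-projection, depth-wise 3D convolution, up-projection) is a generic bottleneck design, not necessarily the spatial adapter used in this work.

```python
# Hypothetical sizes for illustration only; none of these come from the paper.
EMBED_DIM = 768      # token width of a ViT-B-like encoder (assumption)
NUM_BLOCKS = 12      # number of transformer blocks (assumption)
MLP_RATIO = 4        # hidden expansion in each block's MLP (assumption)
BOTTLENECK = 64      # adapter bottleneck width (assumption)
KERNEL = 3           # depth-wise 3D convolution kernel size (assumption)


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def frozen_block_params(dim, mlp_ratio):
    """Parameters of one standard transformer block, kept frozen."""
    attention = linear_params(dim, 3 * dim) + linear_params(dim, dim)
    mlp = linear_params(dim, mlp_ratio * dim) + linear_params(mlp_ratio * dim, dim)
    layer_norms = 2 * (2 * dim)  # scale and shift for two LayerNorms
    return attention + mlp + layer_norms


def adapter_params(dim, bottleneck, kernel):
    """Parameters of one generic bottleneck adapter (assumed layout):
    down-projection, depth-wise 3D convolution, up-projection."""
    down = linear_params(dim, bottleneck)
    depthwise_conv = bottleneck * kernel ** 3 + bottleneck  # one k*k*k filter per channel
    up = linear_params(bottleneck, dim)
    return down + depthwise_conv + up


# One adapter per block is also an assumption of this sketch.
frozen = NUM_BLOCKS * frozen_block_params(EMBED_DIM, MLP_RATIO)
trainable = NUM_BLOCKS * adapter_params(EMBED_DIM, BOTTLENECK, KERNEL)
trainable_fraction = trainable / (frozen + trainable)

print(f"frozen parameters:    {frozen:,}")
print(f"trainable parameters: {trainable:,}")
print(f"trainable fraction:   {trainable_fraction:.2%}")
```

Under these assumed sizes the adapters account for under 2% of the encoder's parameters, which is the sense in which such fine-tuning is lightweight. The real ratio depends on the actual encoder and adapter configuration.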
