The Segment Anything Model (SAM) has demonstrated outstanding adaptation to medical image segmentation but still faces three major challenges. First, the huge computational cost of SAM limits its real-world applicability. Second, SAM depends on manual annotations (e.g., points, boxes) as prompts, which are laborious and impractical in clinical scenarios. Third, SAM handles all segmentation targets identically, which is suboptimal for diverse medical modalities with inherent heterogeneity. To address these issues, we propose an Efficient Self-Prompting SAM for universal medical image segmentation, named ESP-MedSAM. We devise a Multi-Modal Decoupled Knowledge Distillation (MMDKD) strategy to distil common image knowledge and domain-specific medical knowledge from the foundation model into a lightweight image encoder and a modality controller. These components are combined with the newly introduced Self-Patch Prompt Generator (SPPG) and Query-Decoupled Modality Decoder (QDMD) to construct ESP-MedSAM. Specifically, SPPG automatically generates a set of patch prompts, while QDMD adopts a one-to-one strategy that provides an independent decoding channel for each modality. Extensive experiments indicate that ESP-MedSAM outperforms state-of-the-art methods on diverse medical image segmentation tasks, displaying superior zero-shot learning and modality transfer ability. Notably, our framework uses only 31.4% of the parameters of SAM-Base.