Self-Prompting Large Vision Models for Few-Shot Medical Image Segmentation

From arXiv: 8.5 pages + 2 pages of supplementary materials + 2 pages of references, 3 figures; submitted to the 5th MICCAI Workshop on Domain Adaptation and Representation Transfer (DART)

Recent advancements in large foundation models have shown promising potential in the medical industry due to their flexible prompting capability. One such model, the Segment Anything Model (SAM), a prompt-driven segmentation model, has shown remarkable performance improvements, surpassing state-of-the-art approaches in medical image segmentation. However, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, which makes them particularly challenging to apply when only a limited number of data samples are available. In this paper, we propose a novel perspective on self-prompting in medical vision applications. Specifically, we harness the embedding space of SAM to prompt itself through a simple yet effective linear pixel-wise classifier. By preserving the encoding capabilities of the large model, retaining the contextual information from its decoder, and leveraging its interactive promptability, we achieve competitive results on multiple datasets (e.g., an improvement of more than 15% compared to fine-tuning the mask decoder using a few images).
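
The self-prompting idea can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: random arrays stand in for the frozen SAM image-encoder output, the linear classifier is a plain logistic regression trained by gradient descent, and the prompts (a bounding box and a single point) are derived from the coarse mask in the simplest possible way. All function names are hypothetical, and the authors' actual training and prompt-extraction procedure may differ.

```python
import numpy as np


def make_synthetic(rng, n, c=8, size=16):
    """Hypothetical stand-in for frozen encoder embeddings plus few-shot masks."""
    emb = rng.normal(0.0, 0.5, size=(n, c, size, size))
    masks = np.zeros((n, size, size), dtype=np.uint8)
    for i in range(n):
        top, left = rng.integers(2, size - 8, size=2)
        masks[i, top:top + 6, left:left + 6] = 1
        emb[i, :, top:top + 6, left:left + 6] += 2.0  # foreground feature shift
    return emb, masks


def sigmoid(z):
    # Clip logits to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def fit_pixel_classifier(embeddings, masks, lr=0.1, steps=500):
    """Fit a linear (logistic) classifier on per-pixel embedding vectors.

    embeddings: (N, C, H, W) array; masks: (N, H, W) binary array
    at the same spatial resolution as the embeddings.
    """
    c = embeddings.shape[1]
    x = embeddings.transpose(0, 2, 3, 1).reshape(-1, c)  # one row per pixel
    y = masks.reshape(-1).astype(np.float64)
    weights, bias = np.zeros(c), 0.0
    for _ in range(steps):
        grad = sigmoid(x @ weights + bias) - y  # d(loss)/d(logit)
        weights -= lr * (x.T @ grad) / len(y)
        bias -= lr * grad.mean()
    return weights, bias


def predict_coarse_mask(embedding, weights, bias, threshold=0.5):
    """Classify every pixel of one (C, H, W) embedding independently."""
    probs = sigmoid(np.tensordot(weights, embedding, axes=1) + bias)
    return probs > threshold, probs


def prompts_from_mask(mask, probs):
    """Turn a coarse mask into a box (x0, y0, x1, y1) and a point (x, y)."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return None, None  # nothing predicted as foreground
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    iy, ix = np.unravel_index(np.argmax(probs), probs.shape)
    return box, (int(ix), int(iy))  # most confident foreground pixel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_emb, train_masks = make_synthetic(rng, n=3)  # "few-shot" training set
    w, b = fit_pixel_classifier(train_emb, train_masks)
    test_emb, _ = make_synthetic(rng, n=1)
    coarse, probs = predict_coarse_mask(test_emb[0], w, b)
    print(prompts_from_mask(coarse, probs))
```

In a real pipeline the box and point would be rescaled from the embedding grid to image coordinates and passed to the promptable decoder. Only the small linear classifier is trained here, which is what makes the approach plausible with a handful of labeled images.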

