Foundation models such as the recently introduced Segment Anything Model (SAM) have achieved remarkable results in image segmentation tasks. However, these models typically require user interaction through handcrafted prompts such as bounding boxes, which limits their deployment in downstream tasks. Adapting these models to a specific task with fully labeled data also demands expensive prior user interaction to obtain ground-truth annotations. This work proposes to replace conditioning on input prompts with a lightweight module that directly learns a prompt embedding from the image embedding; both embeddings are subsequently used by the foundation model to output a segmentation mask. Our foundation models with learnable prompts can automatically segment any specific region by 1) modifying the input through a prompt embedding predicted by a simple module, and 2) using weak labels (tight bounding boxes) and few-shot supervision (10 samples). Our approach is validated on MedSAM, a version of SAM fine-tuned for medical images, with results on three medical datasets in MR and ultrasound imaging. Our code is available at https://github.com/Minimel/MedSAMWeakFewShotPromptAutomation.
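The prompt-replacement idea described above can be sketched as a small module that maps the frozen foundation model's image embedding to learned prompt tokens. The module architecture, class name, and token count below are illustrative assumptions, not the authors' exact design; only the tensor shapes follow SAM's public interface (a 256×64×64 image embedding and 256-d sparse prompt tokens).

```python
import torch
import torch.nn as nn

class PromptPredictor(nn.Module):
    """Hypothetical lightweight module: image embedding -> prompt embedding.

    Illustrative sketch: maps SAM's image embedding (B, 256, 64, 64) to
    sparse prompt tokens (B, n_tokens, 256), which would stand in for the
    output of SAM's prompt encoder (e.g., the two corner tokens of a box).
    """
    def __init__(self, embed_dim: int = 256, n_tokens: int = 2):
        super().__init__()
        self.n_tokens = n_tokens
        self.net = nn.Sequential(
            nn.Conv2d(embed_dim, embed_dim, 3, stride=2, padding=1),
            nn.GELU(),
            nn.AdaptiveAvgPool2d(1),  # (B, 256, 1, 1)
            nn.Flatten(),             # (B, 256)
            nn.Linear(embed_dim, n_tokens * embed_dim),
        )

    def forward(self, image_embedding: torch.Tensor) -> torch.Tensor:
        b = image_embedding.shape[0]
        return self.net(image_embedding).view(b, self.n_tokens, -1)

# In the setting described in the abstract, MedSAM stays frozen and only a
# module like this would be trained, using weak box labels and ~10 samples.
predictor = PromptPredictor()
dummy_embedding = torch.randn(1, 256, 64, 64)  # stand-in for the image embedding
prompt_tokens = predictor(dummy_embedding)
print(tuple(prompt_tokens.shape))
```

The predicted tokens would then be passed to the frozen mask decoder in place of the box-prompt embeddings, so no user interaction is needed at inference time.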