The Segment Anything Model (SAM) is a foundation model for image segmentation. Although it generalizes remarkably well zero-shot in typical scenarios, its advantage diminishes in specialized domains such as medical imaging and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By integrating ultra-lightweight convolutional parameters into Low-Rank Adaptation (LoRA), Conv-LoRA injects image-related inductive biases into the plain ViT encoder, further reinforcing SAM's local prior assumption. Notably, Conv-LoRA not only preserves SAM's extensive segmentation knowledge but also revives its capacity for learning high-level image semantics, a capacity that SAM's foreground-background segmentation pretraining constrains. Comprehensive experiments on diverse benchmarks spanning multiple domains demonstrate Conv-LoRA's superiority in adapting SAM to real-world semantic segmentation tasks.
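To make the idea of "convolutional parameters inside LoRA" concrete, here is a minimal NumPy sketch of one possible forward pass. It is an illustrative assumption, not the paper's implementation: the abstract does not specify where the convolution sits, its kernel size, or its initialization. This sketch places a single 3×3 depthwise convolution inside the rank-r bottleneck, applied after reshaping the patch tokens back onto their 2D grid (row-major order assumed), so the low-rank update mixes information between spatially neighboring tokens. The names `ConvLoRALinear` and `depthwise_conv3x3` and the parameters `rank`, `grid_hw`, and `alpha` are hypothetical.

```python
import numpy as np


def depthwise_conv3x3(feat, kernel):
    """Hypothetical helper: per-channel 3x3 convolution (cross-correlation),
    zero padding, stride 1. feat: (H, W, C); kernel: (3, 3, C)."""
    H, W, _ = feat.shape
    padded = np.pad(feat, ((1, 1), (1, 1), (0, 0)))  # zero-pad spatial dims only
    out = np.zeros_like(feat)
    for i in range(3):
        for j in range(3):
            # Each output pixel accumulates its 3x3 neighbourhood, channel by channel.
            out += padded[i:i + H, j:j + W, :] * kernel[i, j, :]
    return out


class ConvLoRALinear:
    """Hypothetical sketch: frozen linear projection plus a LoRA update whose
    rank-r bottleneck is convolved over the patch-token grid."""

    def __init__(self, weight, rank, grid_hw, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                 # frozen pretrained weight (never updated)
        self.grid_hw = grid_hw               # (H, W) layout of the N = H*W patch tokens
        self.scale = alpha / rank            # usual LoRA scaling
        self.A = rng.normal(0.0, 0.02, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))     # trainable up-projection, zero-initialized
        # Assumed identity initialization: the 3x3 kernel starts as a pass-through.
        self.kernel = np.zeros((3, 3, rank))
        self.kernel[1, 1, :] = 1.0

    def __call__(self, x):
        # x: (N, d_in) patch tokens in row-major grid order.
        H, W = self.grid_hw
        base = x @ self.weight.T                       # frozen path
        z = x @ self.A.T                               # (N, r) low-rank features
        z = depthwise_conv3x3(z.reshape(H, W, -1), self.kernel)  # local spatial mixing
        z = z.reshape(H * W, -1)
        return base + self.scale * (z @ self.B.T)      # add the adapted update
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen projection exactly, the same starting point plain LoRA uses; only the small `A`, `B`, and `kernel` arrays would be trained. Training code is omitted here.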