Against the backdrop of large-scale pre-training, large visual models (LVMs) have demonstrated significant potential in image understanding. The recent emergence of the Segment Anything Model (SAM) has brought a qualitative shift to image segmentation, offering flexible interactive prompts and strong learning capabilities. However, its performance often falls short in cross-domain and few-shot applications. How to transfer prior knowledge from foundation models to new applications while preserving their learning capabilities is therefore worth exploring. This work proposes a task-adaptive prompt framework based on SAM, a new paradigm for cross-domain few-shot segmentation (CD-FSS). First, a Multi-level Feature Fusion (MFF) module is used for integrated feature extraction. In addition, a Class Domain Task-Adaptive Auto-Prompt (CDTAP) module is combined with the segmentation branch for class-domain agnostic feature extraction and high-quality learnable prompt generation. The framework pairs this generative approach to prompts with a comprehensive model structure and a specialized prototype computation. While ensuring that the prior knowledge of SAM is not discarded, the new branch disentangles category and domain information through prototypes, guiding the model's adaptation to CD-FSS. The method achieves the best results on three benchmarks compared with recent state-of-the-art (SOTA) methods. Comprehensive experiments show that, with task-specific and weighted guidance, the abundant feature information of SAM can be better learned for CD-FSS.
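The abstract does not spell out how its prototypes are computed. As a point of reference only, the sketch below shows masked average pooling, the conventional way of forming a class prototype in few-shot segmentation, followed by a per-pixel cosine similarity map of the kind that could serve as a coarse prior for an automatically generated prompt. It is not the CDTAP module: the function names, the nested-list feature layout (channels x height x width), and the use of cosine similarity are all assumptions made for illustration.

```python
import math


def masked_average_prototype(features, mask):
    """Average each feature channel over the pixels selected by a binary mask.

    features: nested lists shaped [C][H][W] (assumed layout, for illustration).
    mask:     nested lists shaped [H][W] with entries 0 or 1.
    Returns a list of C floats, the foreground prototype.
    """
    area = sum(sum(row) for row in mask)
    if area == 0:
        raise ValueError("mask selects no pixels")
    return [
        sum(f * m
            for feat_row, mask_row in zip(channel, mask)
            for f, m in zip(feat_row, mask_row)) / area
        for channel in features
    ]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_map(features, prototype):
    """Compare every pixel's feature vector with the prototype.

    Returns nested lists shaped [H][W]; high values mark pixels that
    resemble the support foreground.
    """
    height, width = len(features[0]), len(features[0][0])
    return [
        [cosine_similarity([channel[y][x] for channel in features], prototype)
         for x in range(width)]
        for y in range(height)
    ]
```

In a few-shot setting, `masked_average_prototype` would be applied to support-image features with the support mask, and `similarity_map` to query-image features. A learned prompt generator such as the one the abstract describes would replace the fixed cosine comparison with trainable components.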