Few-shot semantic segmentation (FSS) aims to segment unseen classes using only a few labeled samples. Current FSS methods commonly assume that training and application scenarios share similar domains, and their performance degrades significantly when they are applied to a distinct domain. To address this, we propose leveraging a foundation model, the Segment Anything Model (SAM), to enhance generalization. However, SAM performs unsatisfactorily on domains that differ from its training data, which primarily comprises natural scene images, and its interactive prompting mechanism does not support automatic segmentation of specific semantics. We introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS) that generates its own prompts to guide cross-domain segmentation. Specifically, we propose a Dual Prototype Anchor Transformation (DPAT) module that fuses pseudo query prototypes, extracted via cycle-consistency, with support prototypes, allowing features to be transformed into a more stable domain-agnostic space. Additionally, a Meta Prompt Generator (MPG) module automatically generates prompt embeddings, eliminating the need for manual visual prompts. The resulting model is efficient and can be applied directly to target domains without fine-tuning. Extensive experiments on four cross-domain datasets show that our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy in the 1-shot and 5-shot settings, respectively.
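The dual-prototype idea can be illustrated with a minimal NumPy sketch. This is not the APSeg implementation: the abstract does not specify how prototypes are extracted or fused, so the sketch assumes masked average pooling for the support prototype, a nearest-neighbor round trip (support to query and back) as the cycle-consistency check, and a plain average as the fusion step. All function names are hypothetical, and features are assumed to be flattened to shape (pixels, channels).

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def masked_average_pool(feats, mask):
    """Average the feature vectors selected by a boolean mask.

    feats: (N, C) array of per-pixel features; mask: (N,) boolean array.
    """
    weights = mask.astype(feats.dtype)
    return (feats * weights[:, None]).sum(axis=0) / max(weights.sum(), 1.0)


def cycle_consistent_query_mask(support_feats, support_mask, query_feats):
    """Build a pseudo foreground mask for the query (assumed matching rule).

    Each foreground support pixel is matched to its most similar query pixel.
    That query pixel is kept only if its own most similar support pixel is
    also foreground, i.e. the round trip stays inside the support mask.
    """
    sim = l2_normalize(support_feats) @ l2_normalize(query_feats).T  # (Ns, Nq)
    fg_idx = np.flatnonzero(support_mask)
    forward = sim[fg_idx].argmax(axis=1)       # support fg pixel -> query pixel
    backward = sim[:, forward].argmax(axis=0)  # query pixel -> support pixel
    consistent = support_mask[backward]
    query_mask = np.zeros(query_feats.shape[0], dtype=bool)
    query_mask[forward[consistent]] = True
    return query_mask


def dual_prototype(support_feats, support_mask, query_feats):
    """Fuse support and pseudo query prototypes (simple average, an assumption)."""
    support_proto = masked_average_pool(support_feats, support_mask)
    query_mask = cycle_consistent_query_mask(
        support_feats, support_mask, query_feats
    )
    if query_mask.any():
        pseudo_query_proto = masked_average_pool(query_feats, query_mask)
    else:
        # No consistent match was found: fall back to the support prototype.
        pseudo_query_proto = support_proto
    return 0.5 * (support_proto + pseudo_query_proto), query_mask
```

In this sketch the fused prototype would serve as the anchor for a later feature transformation; that transformation and the prompt generation step are not described in enough detail in the abstract to sketch here.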