Few-shot semantic segmentation (FSS) aims to segment unseen classes using only a few labeled samples. Current FSS methods commonly assume that their training and application scenarios share similar domains, and their performance degrades significantly when they are applied to a distinct domain. To this end, we propose to leverage a cutting-edge foundation model, the Segment Anything Model (SAM), to enhance generalization. SAM, however, performs unsatisfactorily on domains that differ from its training data, which primarily comprises natural scene images, and its interactive prompting mechanism does not support automatic segmentation of specific semantics. In this work, we introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS) that generates its own prompts to guide cross-domain segmentation. Specifically, we propose a Dual Prototype Anchor Transformation (DPAT) module that fuses support prototypes with pseudo query prototypes extracted via cycle-consistency, allowing features to be transformed into a more stable domain-agnostic space. Additionally, we introduce a Meta Prompt Generator (MPG) module that automatically generates prompt embeddings, eliminating the need for manual visual prompts. The resulting model is efficient and can be applied directly to target domains without fine-tuning. Extensive experiments on four cross-domain datasets show that our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy in the 1-shot and 5-shot settings, respectively.
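To make the dual-prototype idea concrete, the following is a minimal sketch of one way a support prototype and a cycle-consistent pseudo query prototype could be computed and fused. It is an illustration under stated assumptions, not the APSeg implementation: features are plain lists of per-pixel vectors, the support prototype is taken by masked average pooling, cycle-consistency is approximated by a nearest-neighbour round trip under cosine similarity, and fusion is a simple average. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest(vec, candidates):
    """Index of the candidate most similar to vec."""
    return max(range(len(candidates)), key=lambda i: cosine(vec, candidates[i]))


def support_prototype(support_feats, support_mask):
    """Masked average pooling over foreground support pixels (assumed choice)."""
    fg = [f for f, m in zip(support_feats, support_mask) if m]
    return mean_vector(fg)


def pseudo_query_prototype(support_feats, support_mask, query_feats):
    """Average the query pixels that pass a support -> query -> support round trip.

    A query pixel is kept when it is the nearest neighbour of a foreground
    support pixel and its own nearest support pixel is also foreground.
    """
    kept = set()
    for feat, is_fg in zip(support_feats, support_mask):
        if not is_fg:
            continue
        j = nearest(feat, query_feats)
        back = nearest(query_feats[j], support_feats)
        if support_mask[back]:
            kept.add(j)
    if not kept:
        return None
    return mean_vector([query_feats[j] for j in sorted(kept)])


def fuse(p_support, p_query):
    """Average the two prototypes; fall back to the support prototype alone."""
    if p_query is None:
        return list(p_support)
    return [(a + b) / 2 for a, b in zip(p_support, p_query)]


# Toy example: three support pixels (two foreground) and two query pixels.
support_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
support_mask = [1, 1, 0]
query_feats = [[0.8, 0.2], [0.1, 0.9]]

p_s = support_prototype(support_feats, support_mask)
p_q = pseudo_query_prototype(support_feats, support_mask, query_feats)
anchor = fuse(p_s, p_q)
```

In this toy case only the first query pixel survives the round trip, so the fused anchor lies between the support prototype and that pixel. How the fused prototypes are then used to transform features, and how MPG turns them into prompt embeddings for SAM, is not specified at this level of detail and is left out of the sketch.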