APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Few-shot semantic segmentation (FSS) aims to segment unseen classes with only a few labeled samples. Current FSS methods are commonly built on the assumption that their training and application scenarios share similar domains, and their performance degrades significantly when they are applied to a distinct domain. To this end, we propose to leverage a cutting-edge foundation model, the Segment Anything Model (SAM), to enhance generalization. SAM, however, performs unsatisfactorily on domains that are distinct from its training data, which primarily comprise natural scene images, and its interactive prompting mechanism does not support automatic segmentation of specific semantics. In our work, we introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS), which generates its own prompts to guide cross-domain segmentation. Specifically, we propose a Dual Prototype Anchor Transformation (DPAT) module that fuses support prototypes with pseudo query prototypes extracted on the basis of cycle-consistency, allowing features to be transformed into a more stable domain-agnostic space. Additionally, a Meta Prompt Generator (MPG) module is introduced to automatically generate prompt embeddings, eliminating the need for manual visual prompts. We build an efficient model that can be applied directly to target domains without fine-tuning. Extensive experiments on four cross-domain datasets show that our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy in the 1-shot and 5-shot settings, respectively.
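
The abstract gives only a high-level description of how DPAT obtains and fuses prototypes, so the following is a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes that a support prototype comes from masked average pooling, that cycle-consistency means a forward nearest-neighbor match from support foreground pixels to query pixels followed by a backward match that must land in the support foreground again, and that fusion is a simple weighted average. All function names, the `alpha` weight, and the toy features are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    # Row-wise L2 normalization so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

def masked_average_prototype(features, mask):
    # features: (N, C) flattened feature map; mask: (N,) binary foreground mask.
    weights = mask.astype(features.dtype)
    return (features * weights[:, None]).sum(axis=0) / max(weights.sum(), 1e-8)

def cycle_consistent_query_pixels(support_feats, support_mask, query_feats):
    # Assumed cycle-consistency rule (illustrative only):
    #   forward:  each support foreground pixel -> most similar query pixel
    #   backward: that query pixel -> most similar support pixel
    # A query pixel is kept if the backward match is a support foreground pixel.
    sim = l2_normalize(support_feats) @ l2_normalize(query_feats).T  # (Ns, Nq)
    fg_idx = np.flatnonzero(support_mask)
    fwd = sim[fg_idx].argmax(axis=1)   # query index per support foreground pixel
    bwd = sim[:, fwd].argmax(axis=0)   # support index per matched query pixel
    keep = support_mask[bwd].astype(bool)
    selected = np.zeros(query_feats.shape[0], dtype=bool)
    selected[fwd[keep]] = True
    return selected

def fuse_prototypes(support_proto, pseudo_query_proto, alpha=0.5):
    # Hypothetical fusion: convex combination of the two prototypes.
    return alpha * support_proto + (1.0 - alpha) * pseudo_query_proto

# Toy example: 4 support pixels and 3 query pixels with 2-channel features.
support_feats = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
support_mask = np.array([1, 1, 0, 0])
query_feats = np.array([[0.9, 0.1], [0.0, 1.0], [1.0, 0.0]])

selected = cycle_consistent_query_pixels(support_feats, support_mask, query_feats)
support_proto = masked_average_prototype(support_feats, support_mask)
pseudo_query_proto = masked_average_prototype(query_feats, selected)
fused_proto = fuse_prototypes(support_proto, pseudo_query_proto)
```

In this sketch the pseudo query prototype is built without any query label: only query pixels that map back into the support foreground contribute to it, which is one plausible reading of "extracted based on cycle-consistency".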

