Few-shot semantic segmentation (FSS) aims to build class-agnostic models that segment unseen classes from only a handful of annotations. Previous methods, limited to semantic features and prototype representations, suffer from coarse segmentation granularity and train-set overfitting. In this work, we design the Hierarchically Decoupled Matching Network (HDMNet), which mines pixel-level support correlation based on the transformer architecture. Self-attention modules help establish hierarchical dense features, which in turn enable cascade matching between query and support features. Moreover, we propose a matching module to reduce train-set overfitting and introduce correlation distillation, which leverages semantic correspondence from coarse resolution to boost fine-grained segmentation. In experiments, our method achieves 50.0% mIoU on the COCO dataset in the one-shot setting and 56.0% in the five-shot setting. The code will be available on the project website. We hope our work can benefit broader industrial applications in which novel classes with limited annotations must be identified reliably.
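To make the notion of pixel-level support correlation concrete, the following is a minimal sketch and not the HDMNet matching module itself. It assumes flattened per-pixel feature vectors and a binary support mask, and it uses plain cosine similarity with a max over foreground support pixels as a stand-in matching rule. The function names (`cosine`, `dense_correlation`, `foreground_prior`) are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def dense_correlation(query_feats, support_feats, support_mask):
    """One row per query pixel, one column per foreground support pixel."""
    # Assumption: background support pixels are simply dropped before matching.
    fg = [f for f, m in zip(support_feats, support_mask) if m]
    return [[cosine(q, s) for s in fg] for q in query_feats]

def foreground_prior(query_feats, support_feats, support_mask):
    """Score each query pixel by its best match among foreground support pixels."""
    corr = dense_correlation(query_feats, support_feats, support_mask)
    return [max(row) if row else 0.0 for row in corr]

# Toy example: 3 query pixels, 3 support pixels, 2-dimensional features.
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
support = [[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]]
mask = [1, 0, 0]  # only the first support pixel is foreground
prior = foreground_prior(query, support, mask)
```

In this toy case the first query pixel points in the same direction as the only foreground support pixel and therefore receives the highest score, while the second query pixel is orthogonal to it and scores zero. A real model would compute such correlations on learned features at several resolutions rather than on raw vectors.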