Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labelled training sample per class. Although prototype-based approaches have achieved substantial success, existing models are limited to imaging scenarios with clearly distinct objects and relatively simple backgrounds, e.g., natural images. This makes such models suboptimal for medical imaging, where neither condition holds. To address this problem, we propose a novel Detail Self-refined Prototype Network (DSPNet) to construct high-fidelity prototypes that represent the object foreground and the background more comprehensively. Specifically, to construct global semantics while maintaining the captured detail semantics, we learn the foreground prototypes by modelling the multi-modal structures with clustering and then fusing the resulting clusters in a channel-wise manner. Considering that the background often has no apparent semantic relation in the spatial dimensions, we integrate channel-specific structural information under sparse channel-aware regulation. Extensive experiments on three challenging medical image benchmarks show the superiority of DSPNet over previous state-of-the-art methods.
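For readers unfamiliar with prototype-based FSS, the following is a minimal NumPy sketch of the generic pipeline the abstract builds on: foreground features from the support image are clustered into several prototypes, a background prototype is obtained by masked average pooling, and each query pixel is labelled by cosine similarity. This is an illustration under stated assumptions and not DSPNet itself: plain k-means stands in for the clustering step, the channel-wise fusion and sparse channel-aware regulation are not shown because the abstract does not specify them, and all function names (`masked_average_prototype`, `kmeans_prototypes`, `cosine_similarity_map`, `segment_query`) are hypothetical.

```python
import numpy as np


def masked_average_prototype(features, mask):
    """Average the (C, H, W) features over the pixels selected by a (H, W) mask."""
    weights = mask.astype(features.dtype)
    total = weights.sum()
    if total == 0:
        raise ValueError("mask selects no pixels")
    return (features * weights).sum(axis=(1, 2)) / total


def kmeans_prototypes(vectors, k, iters=10, seed=0):
    """Cluster (N, C) feature vectors with plain k-means; return (k, C) centres.

    Assumption: k-means is only a stand-in for the clustering step.
    Requires N >= k.
    """
    rng = np.random.default_rng(seed)
    centres = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every vector to every centre: (N, k).
        dists = ((vectors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centres[j] = members.mean(axis=0)
    return centres


def cosine_similarity_map(features, prototypes, eps=1e-8):
    """Cosine similarity between (C, H, W) features and (K, C) prototypes -> (K, H, W)."""
    channels, height, width = features.shape
    flat = features.reshape(channels, -1)
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + eps)
    protos = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    return (protos @ flat).reshape(-1, height, width)


def segment_query(support_feat, support_mask, query_feat, k=2):
    """Label each query pixel as foreground (1) or background (0)."""
    fg = support_mask.astype(bool)
    # Foreground: several prototypes, one per cluster of foreground pixels.
    fg_vectors = support_feat[:, fg].T  # (N_fg, C)
    fg_protos = kmeans_prototypes(fg_vectors, k)
    # Background: a single masked-average prototype (a simplification).
    bg_proto = masked_average_prototype(support_feat, ~fg)[None, :]
    # A pixel is foreground if its best foreground match beats the background match.
    fg_score = cosine_similarity_map(query_feat, fg_protos).max(axis=0)
    bg_score = cosine_similarity_map(query_feat, bg_proto)[0]
    return (fg_score > bg_score).astype(np.uint8)
```

In practice `support_feat` and `query_feat` would be feature maps from a pretrained encoder; here they are plain arrays, so the sketch can be tried on synthetic data without any deep learning framework.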