MIANet: Aggregating Unbiased Instance and General Information for Few-Shot Semantic Segmentation

Existing few-shot segmentation methods follow a meta-learning strategy: they extract instance knowledge from a support set and then apply it to segment target objects in a query set. However, because this knowledge comes from only a few support samples, it is insufficient to cope with large intra-class variation. To address this problem, we propose a multi-information aggregation network (MIANet) that leverages both general knowledge, i.e., semantic word embeddings, and instance information for accurate segmentation. Specifically, MIANet includes a general information module (GIM) that extracts a general class prototype from word embeddings as a supplement to instance information. To this end, we design a triplet loss that treats the general class prototype as the anchor and samples positive-negative pairs from local features in the support set. This triplet loss transfers semantic similarities among language identities from the word embedding space to the visual representation space. To alleviate the model's bias toward the seen training classes and to obtain multi-scale information, we then introduce a non-parametric hierarchical prior module (HPM) that generates unbiased instance-level information by computing the pixel-level similarity between the support and query image features. Finally, an information fusion module (IFM) combines the general and instance information to make predictions for the query image. Extensive experiments on PASCAL-5i and COCO-20i show that MIANet yields superior performance and sets a new state of the art. Code is available at https://github.com/Aldrich2y/MIANet.
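
To make two of the ideas above concrete, the following is a minimal NumPy sketch, not the authors' implementation. It illustrates (a) a triplet loss whose anchor is a class prototype and (b) a parameter-free prior built from pixel-level cosine similarity between support and query features. The function names `triplet_loss` and `cosine_prior`, the Euclidean distance, the `margin` value, and the max-then-min-max normalization are all assumptions chosen for illustration. The sketch works at a single scale, whereas the HPM described above is hierarchical.

```python
import numpy as np


def triplet_loss(anchor, positives, negatives, margin=0.5):
    """Hinge triplet loss with a single anchor (assumed: Euclidean distance).

    anchor:    (C,)   class prototype, e.g. projected from a word embedding
    positives: (P, C) local features sampled from the target-class region
    negatives: (P, C) local features sampled from outside that region
    """
    d_pos = np.linalg.norm(positives - anchor, axis=1)  # anchor-positive distances
    d_neg = np.linalg.norm(negatives - anchor, axis=1)  # anchor-negative distances
    # Penalize pairs where the positive is not closer than the negative by `margin`.
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())


def cosine_prior(query_feat, support_feat, support_mask, eps=1e-7):
    """Parameter-free prior from pixel-level cosine similarity (illustrative).

    query_feat, support_feat: (C, H, W) feature maps
    support_mask:             (H, W) binary foreground mask of the support image
    Returns an (H, W) map in [0, 1]; higher means the query pixel resembles
    some foreground support pixel.
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, -1)                             # (C, HW)
    s = (support_feat * support_mask[None]).reshape(c, -1)    # background zeroed
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)  # unit-normalize pixels
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    sim = q.T @ s                                             # (HW_query, HW_support)
    prior = sim.max(axis=1)                                   # best match per query pixel
    prior = (prior - prior.min()) / (prior.max() - prior.min() + eps)
    return prior.reshape(h, w)
```

Because `cosine_prior` has no learnable weights, it cannot overfit to the training classes, which is the intuition behind calling such a prior unbiased. A full model would still need a fusion step that combines this prior with the general prototype, which is omitted here.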
