Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast cost of data annotation by using surrogate segmentation masks during training. The typical approach involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss, which exploits the heuristic that object contours almost always align with color edges in images. However, both are based on the multinomial posterior with softmax and implicitly assume that classes are mutually exclusive, which turns out to be suboptimal in our experiments. Thus, we reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits: their performance improves and they become more general, resulting in an add-on method that can boost virtually any WSSS method. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset, improving the region similarity and contour quality of all implemented state-of-the-art methods. Experiments on the MS COCO dataset show that our proposed add-on is well-suited for large-scale settings. Our code is available at https://github.com/arvijj/hfpl.
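To illustrate why the mutual-exclusivity assumption matters, the following minimal sketch contrasts a multinomial (softmax) posterior with independent binomial (sigmoid) posteriors on the same class scores. It is not taken from the released code: the class names, the logit values, and the helper functions `softmax` and `sigmoid` are illustrative assumptions, and only the Python standard library is used.

```python
import math


def softmax(scores):
    """Multinomial posterior: the class probabilities compete and sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Binomial posterior for one independent present/absent problem."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical class scores for a region in which two classes are both
# strongly activated (assumed values, not results from the paper).
class_names = ["person", "horse", "car"]
logits = [4.0, 4.0, -2.0]

multinomial = softmax(logits)
binomial = [sigmoid(s) for s in logits]

for name, p_soft, p_sig in zip(class_names, multinomial, binomial):
    print(f"{name:>6}: softmax={p_soft:.3f}  sigmoid={p_sig:.3f}")
```

Under softmax, the two strongly activated classes must share the probability mass, so neither can exceed 0.5. With independent sigmoids, each class is scored on its own and both can be close to 1, which is the behavior the binomial reformulation relies on.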