Background Activation Suppression for Weakly Supervised Object Localization and Semantic Segmentation

Weakly supervised object localization and semantic segmentation aim to localize objects using only image-level labels. A recent paradigm generates a foreground prediction map (FPM) to achieve pixel-level localization. Existing FPM-based methods use cross-entropy to evaluate the foreground prediction map and to guide the learning of the generator. This paper reports two surprising experimental observations about the object localization learning process. For a trained network, as the foreground mask expands, (1) the cross-entropy converges to zero while the foreground mask still covers only part of the object region, and (2) the activation value continues to increase until the foreground mask reaches the object boundary. Therefore, to achieve more effective localization, we argue for using the activation value to learn more of the object region. We propose a Background Activation Suppression (BAS) method. Specifically, an Activation Map Constraint (AMC) module is designed to facilitate the learning of the generator by suppressing the background activation value. Meanwhile, by using foreground region guidance and an area constraint, BAS can learn the whole region of the object. In the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets. In addition, our method achieves state-of-the-art weakly supervised semantic segmentation performance on the PASCAL VOC 2012 and MS COCO 2014 datasets. Code and models are available at https://github.com/wpy1999/BAS-Extension.
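The interplay between background activation suppression and the area constraint can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names, the `area_weight` parameter, and the simplification of treating activation as a per-pixel sum are all assumptions made for illustration, and the foreground region guidance term is omitted. It only shows why penalizing activation left outside the mask pushes the mask to cover the whole object, while an area penalty stops it from covering the whole image.

```python
def background_activation_ratio(activation, mask):
    """Fraction of the total activation that lies outside the foreground mask."""
    total = sum(sum(row) for row in activation)
    background = sum(
        a * (1.0 - m)
        for act_row, mask_row in zip(activation, mask)
        for a, m in zip(act_row, mask_row)
    )
    return background / total if total > 0 else 0.0


def mask_area(mask):
    """Mean mask value, used here as a simple area penalty."""
    values = [m for row in mask for m in row]
    return sum(values) / len(values)


def toy_bas_loss(activation, mask, area_weight=1.0):
    """Hypothetical combined objective: background activation plus mask area."""
    return background_activation_ratio(activation, mask) + area_weight * mask_area(mask)


# Toy 4x4 activation map: the object occupies the central 2x2 block.
activation = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Mask covering only one object pixel (partial coverage).
partial_mask = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

# Mask covering exactly the object.
object_mask = [[1.0 if a > 0 else 0.0 for a in row] for row in activation]

# Mask covering the whole image.
full_image_mask = [[1.0] * 4 for _ in range(4)]

losses = {
    "partial": toy_bas_loss(activation, partial_mask),
    "object": toy_bas_loss(activation, object_mask),
    "full_image": toy_bas_loss(activation, full_image_mask),
}
best = min(losses, key=losses.get)
print(losses, best)
```

In this toy setting the mask that matches the object gives the lowest value: the partial mask leaves three quarters of the activation in the background, and the full-image mask pays the maximum area penalty.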
