Weakly-supervised learning approaches have gained significant attention because they reduce the human annotation effort required to train neural networks. This paper investigates a framework for weakly-supervised object localization, which aims to train a neural network that predicts both the object class and its location using only images and their image-level class labels. The proposed framework consists of a shared feature extractor, a classifier, and a localizer. The localizer predicts pixel-level class probabilities, while the classifier predicts the object class at the image level. Because image-level class labels are insufficient for training the localizer, weakly-supervised object localization methods often struggle to localize the entire object region and instead focus on its most discriminative part. To address this issue, the proposed method incorporates adversarial erasing and pseudo labels to improve localization accuracy. Specifically, novel losses are designed to utilize adversarially erased foreground features and adversarially erased feature maps, reducing dependence on the most discriminative region. Additionally, the proposed method employs pseudo labels to suppress activation values in the background while increasing them in the foreground. The proposed method is applied to two backbone networks (MobileNetV1 and InceptionV3) and is evaluated on three publicly available datasets (ILSVRC-2012, CUB-200-2011, and PASCAL VOC 2012). The experimental results demonstrate that the proposed method outperforms previous state-of-the-art methods across all evaluated metrics.
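The two ideas named above, adversarial erasing and pseudo labels, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the channel-mean attention map, the function names (`normalized_attention`, `erase_most_discriminative`, `make_pseudo_labels`), and all threshold values are assumptions chosen only for illustration, and the abstract does not specify the actual losses.

```python
import numpy as np


def normalized_attention(features):
    """Collapse a (C, H, W) feature map into an (H, W) map scaled to [0, 1].

    Assumption: the channel-wise mean stands in for the localizer's activation map.
    """
    attention = features.mean(axis=0)
    lo, hi = attention.min(), attention.max()
    return (attention - lo) / (hi - lo + 1e-8)


def erase_most_discriminative(features, erase_thresh=0.7):
    """Zero out locations whose normalized activation is at or above erase_thresh.

    Returns the erased feature map and the (H, W) keep mask (1 = kept, 0 = erased).
    The threshold value is a hypothetical choice.
    """
    attention = normalized_attention(features)
    keep_mask = (attention < erase_thresh).astype(features.dtype)
    # Broadcast the (H, W) mask over the channel axis.
    return features * keep_mask[None, :, :], keep_mask


def make_pseudo_labels(attention, fg_thresh=0.6, bg_thresh=0.2):
    """Build per-pixel pseudo labels from a normalized (H, W) attention map.

    1 = foreground, 0 = background, -1 = uncertain (ignored in a loss).
    Both thresholds are hypothetical choices.
    """
    labels = np.full(attention.shape, -1, dtype=np.int64)
    labels[attention >= fg_thresh] = 1
    labels[attention <= bg_thresh] = 0
    return labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((8, 7, 7))
    erased, keep = erase_most_discriminative(feats)
    labels = make_pseudo_labels(normalized_attention(feats))
    print("erased locations:", int((keep == 0).sum()))
    print("foreground pixels:", int((labels == 1).sum()))
```

In a training loop of this kind, the erased feature map would be fed back to the classifier so that it must rely on less discriminative object parts, and the pseudo labels would supervise the localizer's pixel-level output. How the paper combines these signals into its losses is not stated in the abstract.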