Region-CAM: Towards Accurate Object Regions in Class Activation Maps for Weakly Supervised Learning Tasks

Class Activation Mapping (CAM) methods are widely used in weakly supervised learning tasks because they can highlight object regions. However, conventional CAM methods highlight only the most discriminative regions of the target. These regions often fail to cover the entire object and are frequently misaligned with object boundaries, which limits the performance of downstream weakly supervised learning tasks. The limitation is most acute in Weakly Supervised Semantic Segmentation (WSSS), which requires pixel-accurate activation maps for best results. To alleviate these problems, we propose a novel activation method, Region-CAM. Unlike approaches that weight network features, Region-CAM generates activation maps by extracting semantic information maps (SIMs) and performing semantic information propagation (SIP), considering both gradients and features at each stage of the baseline classification model. Our approach highlights a greater proportion of object regions while producing activation maps with precise boundaries that align closely with object edges. Using the baseline model, Region-CAM achieves 60.12% and 58.43% mean intersection over union (mIoU) on the PASCAL VOC training and validation sets, respectively, improvements of 13.61 and 13.13 percentage points over the original CAM (46.51% and 45.30%). On the MS COCO validation set, Region-CAM achieves 36.38% mIoU, an improvement of 16.23 percentage points over the original CAM (20.15%). We also demonstrate the superiority of Region-CAM in object localization on the ILSVRC2012 validation set, where it achieves 51.7% Top-1 localization accuracy (Loc1). Compared with LayerCAM, an activation method designed for weakly supervised object localization, Region-CAM performs 4.5% better in Loc1.
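Since the headline results are reported as mIoU, a minimal sketch of that metric may help in reading them. This is an illustration under stated assumptions, not the paper's evaluation code. The function names `mean_iou` and `absolute_gain` are hypothetical. The sketch scores a single pair of label maps and averages only over classes that occur in the prediction or the target, whereas benchmark protocols typically accumulate counts over a whole dataset and may skip "ignore" labels.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[Sequence[int]],
             target: Sequence[Sequence[int]],
             num_classes: int) -> float:
    """Mean IoU for one pair of label maps (simplified, hypothetical helper).

    Assumes both maps have the same shape and every label is in
    [0, num_classes). Classes absent from both maps are skipped.
    """
    intersection: List[int] = [0] * num_classes
    union: List[int] = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts toward intersection and union of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


def absolute_gain(new_score: float, baseline_score: float) -> float:
    """Difference between two percentage scores, in percentage points."""
    return round(new_score - baseline_score, 2)


# Toy 2x3 label maps: class 0 has IoU 2/3, class 1 has IoU 3/4.
pred = [[0, 0, 1],
        [0, 1, 1]]
target = [[0, 1, 1],
          [0, 1, 1]]
toy_miou = mean_iou(pred, target, num_classes=2)

# Gains quoted in the abstract, recomputed from the reported scores.
voc_train_gain = absolute_gain(60.12, 46.51)
voc_val_gain = absolute_gain(58.43, 45.30)
coco_val_gain = absolute_gain(36.38, 20.15)
```

The recomputed gains match the abstract's figures, which confirms that the reported mIoU improvements are absolute differences in percentage points rather than relative changes.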
