In this paper, we focus on an under-explored issue in prior weakly-supervised object localization methods based on Class Activation Mapping (CAM): biased activation. We analyze the cause of this problem from a causal perspective and attribute it to co-occurring background confounders. Following this insight, we propose a novel Counterfactual Co-occurring Learning (CCL) paradigm that synthesizes counterfactual representations by coupling the constant foreground with unrealized backgrounds, thereby cutting off their co-occurring relationship. Specifically, we design a new network structure called Counterfactual-CAM, which embeds a counterfactual representation perturbation mechanism into the vanilla CAM-based model. This mechanism decouples the foreground from the background and synthesizes the counterfactual representations. By training the detection model with these synthesized representations, we compel it to focus on the constant foreground content while minimizing the influence of the distracting co-occurring background. To the best of our knowledge, this is the first attempt in this direction. Extensive experiments on several benchmarks demonstrate that Counterfactual-CAM successfully mitigates the biased activation problem and improves object localization accuracy.
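The abstract does not specify how the counterfactual representations are built. As a rough illustration only, the sketch below assumes one plausible reading of "decoupling foreground and background" and "coupling constant foreground with unrealized backgrounds": each image's CAM is min-max normalized and thresholded to split its feature map into foreground and background, each region is mean-pooled into a vector, and every image's foreground vector is added to the background vector of a different image in the batch (chosen by a random non-zero rotation of batch indices). The function names `cam_mask`, `masked_mean`, `decouple`, and `counterfactual_batch`, the 0.5 threshold, mean pooling, additive fusion, and the convention that a counterfactual keeps the foreground image's label are all hypothetical choices, not Counterfactual-CAM's actual design.

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]
FeatureMap = List[Vector]  # H*W spatial positions, flattened; each entry is a C-dim feature


def cam_mask(cam: List[float], threshold: float) -> List[bool]:
    """Min-max normalize CAM scores and mark positions >= threshold as foreground."""
    lo, hi = min(cam), max(cam)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat CAM
    return [(s - lo) / span >= threshold for s in cam]


def masked_mean(feats: FeatureMap, mask: List[bool]) -> Vector:
    """Mean-pool the features at masked positions (zero vector if none are selected)."""
    dim = len(feats[0])
    picked = [f for f, keep in zip(feats, mask) if keep]
    if not picked:
        return [0.0] * dim
    return [sum(f[c] for f in picked) / len(picked) for c in range(dim)]


def decouple(feats: FeatureMap, cam: List[float], threshold: float = 0.5) -> Tuple[Vector, Vector]:
    """Split one image's features into (foreground, background) vectors using its CAM."""
    mask = cam_mask(cam, threshold)
    fg = masked_mean(feats, mask)
    bg = masked_mean(feats, [not keep for keep in mask])
    return fg, bg


def counterfactual_batch(
    batch: List[Tuple[FeatureMap, List[float]]],
    threshold: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[Tuple[Vector, int, int]]:
    """Pair each image's foreground with another image's background.

    Returns (counterfactual_vector, fg_index, bg_index). Under the hypothetical
    training convention assumed here, the counterfactual keeps the class label
    of image fg_index, since only its foreground is "constant".
    """
    rng = rng or random.Random(0)
    parts = [decouple(feats, cam, threshold) for feats, cam in batch]
    n = len(parts)
    # Rotate indices by a random non-zero offset so no image keeps its own
    # background; a single-image batch falls back to its own background.
    shift = rng.randrange(1, n) if n > 1 else 0
    out = []
    for i, (fg, _) in enumerate(parts):
        j = (i + shift) % n
        bg_other = parts[j][1]
        # Additive fusion of foreground and swapped background (assumed, not from the paper).
        out.append(([a + b for a, b in zip(fg, bg_other)], i, j))
    return out


if __name__ == "__main__":
    # Two toy images, each with 2 spatial positions and 2-dim features;
    # the CAM peaks at position 0, so position 0 is foreground.
    toy = [
        ([[1.0, 0.0], [0.0, 10.0]], [0.9, 0.1]),
        ([[2.0, 0.0], [0.0, 20.0]], [0.8, 0.2]),
    ]
    for vec, i, j in counterfactual_batch(toy):
        print(f"fg from image {i} + bg from image {j}: {vec}")
    # fg from image 0 + bg from image 1: [1.0, 20.0]
    # fg from image 1 + bg from image 0: [2.0, 10.0]
```

Rotating batch indices by a non-zero offset guarantees that no image (in a batch of two or more) is paired with its own background, which is the property the "unrealized background" idea relies on. A real model would operate on batched tensors inside the network rather than on Python lists.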