Inverse adversarial training leverages high-confidence predictions to stabilize robust learning, yet we uncover a critical paradox: high confidence often stems from overfitting to non-causal background correlations rather than from intrinsic object semantics. Our investigation reveals that visual context is a dual-natured signal that can serve either as a necessary supportive prior or as a spurious confounder. This insight exposes a flaw in existing strategies that blindly suppress context, since doing so inevitably causes severe Feature Loss. To resolve this, we propose High-Confidence Causally Aligned Training (HICAT), a unified framework that establishes a Semantic Equilibrium between exploiting supportive context and suppressing spurious context. Following a ``Measure-Debias-Align'' pipeline, HICAT integrates a Learnable Background-Bias Estimator (LBBE) that adaptively assesses whether the context is supportive or spurious. Guided by this diagnosis, an Adaptive Debiasing mechanism selectively rectifies the logits. It is complemented by a geometrically grounded Foreground Logit Orthogonal Enhancement (FLOE) loss that enforces feature disentanglement through an orthogonality constraint. Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet-1K demonstrate that HICAT consistently improves over matched baselines across diverse architectures (CNNs and ViTs) while significantly reducing the robust generalization gap.