Adversarial training is usually formulated as a min-max problem. However, focusing only on the worst-case adversarial examples makes the model oscillate repeatedly: samples that were previously defended or correctly classified can no longer be defended or correctly classified in later rounds of adversarial training. We call such non-negligible samples "hiders". They reveal high-risk regions hidden inside the secure area obtained through adversarial training, and they prevent the model from finding the true worst cases. We therefore require the model to guard against hiders while defending against adversarial examples, so that accuracy and robustness improve simultaneously. By rethinking and redefining the min-max optimization problem of adversarial training, we propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT). HFAT introduces an iterative evolution optimization strategy to simplify the optimization problem and employs an auxiliary model to reveal hiders, effectively combining the optimization directions of standard adversarial training and hider prevention. Furthermore, we introduce an adaptive weighting mechanism that lets the model adjust its focus between adversarial examples and hiders at different stages of training. Extensive experiments demonstrate the effectiveness of our method and show that HFAT achieves higher robustness and accuracy.
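The oscillation described above can be made concrete with a toy experiment. The sketch below is not HFAT: it omits the auxiliary model, the iterative evolution strategy and the adaptive weighting, none of which the abstract specifies. Instead it runs plain min-max adversarial training on a linear logistic-regression model under an L-infinity budget `eps`, where the inner maximization has a closed form (shift each coordinate by `-eps * y * sign(w_i)`). After every epoch it counts "hider-like" samples, which we assume here to mean samples that were robustly correct at the previous epoch but are no longer robustly correct. The function names (`worst_case_input`, `adversarial_train`) and that operational definition are hypothetical illustrations, not the paper's.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def worst_case_input(w, x, y, eps):
    """Exact inner maximization for a linear model with labels y in {-1, +1}:
    the logistic loss grows as the margin y * (w.x + b) shrinks, and under an
    L-infinity budget eps the margin is minimized by x_i - eps * y * sign(w_i)."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def robust_correct(w, b, x, y, eps):
    # True if the sample stays correctly classified under its worst-case perturbation.
    return y * (dot(w, worst_case_input(w, x, y, eps)) + b) > 0


def adversarial_train(data, eps=0.1, lr=0.1, epochs=20, seed=0):
    """Outer minimization by SGD on the loss at the worst-case input (Danskin's
    theorem). Returns weights, bias, and per-epoch counts of hider-like samples."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    prev = [False] * len(data)
    hider_counts = []
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            x_adv = worst_case_input(w, x, y, eps)
            margin = y * (dot(w, x_adv) + b)
            # d/dw log(1 + exp(-margin)) = -y * sigmoid(-margin) * x_adv
            g = -y * sigmoid(-margin)
            w = [wi - lr * g * xi for wi, xi in zip(w, x_adv)]
            b -= lr * g
        now = [robust_correct(w, b, x, y, eps) for x, y in data]
        # Hider-like: robustly correct last epoch, robustly wrong now (assumed definition).
        hider_counts.append(sum(1 for p, c in zip(prev, now) if p and not c))
        prev = now
    return w, b, hider_counts
```

On well-separated data the count usually stays at zero; the phenomenon the abstract describes appears with deep networks on harder data, where the worst-case examples of one epoch pull the decision boundary away from samples that were defended earlier.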