Adversarial examples, derived from deliberately crafted perturbations on visual inputs, can easily disrupt the decision process of deep neural networks. To counter such threats, various adversarial training-based defense methods have developed rapidly and become the de facto standard approach to robustness. Despite their recent competitive results, we observe that adversarial vulnerability varies across targets and that certain vulnerabilities remain prevalent. Intriguingly, this peculiar phenomenon is not relieved even with deeper architectures and more advanced defense methods. To address this issue, we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability of network predictions and to capture the effect of treatments on the outcome of interest. ADML can directly estimate the causal parameter of adversarial perturbations per se and mitigate negative effects that can potentially damage robustness, bridging a causal perspective into adversarial vulnerability. Through extensive experiments on various CNN and Transformer architectures, we corroborate that ADML improves adversarial robustness by large margins and relieves the empirically observed phenomenon.
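The abstract does not say how ADML builds its estimator. The sketch below shows only the generic double machine learning idea the name refers to: the standard partialling-out estimator for a partially linear model, Y = θ·D + g(X) + ε, where D is a treatment and X are confounders. It is not ADML itself. It assumes linear least-squares nuisance models and two-fold cross-fitting, and the helper names `fit_linear` and `dml_theta` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    # Hypothetical nuisance learner: ordinary least squares with an intercept.
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def dml_theta(Y, D, X, n_folds=2, seed=0):
    # Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + eps.
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    res_y = np.empty(n)  # out-of-fold residuals of Y given X
    res_d = np.empty(n)  # out-of-fold residuals of D given X
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        l_hat = fit_linear(X[train_idx], Y[train_idx])  # estimates E[Y | X]
        m_hat = fit_linear(X[train_idx], D[train_idx])  # estimates E[D | X]
        res_y[test_idx] = Y[test_idx] - l_hat(X[test_idx])
        res_d[test_idx] = D[test_idx] - m_hat(X[test_idx])
    # Regress Y-residuals on D-residuals (no intercept) to obtain theta.
    return float(res_d @ res_y / (res_d @ res_d))

# Synthetic example with a known treatment effect of 2.0.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 3))
D = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)
Y = 2.0 * D + X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)
theta_hat = dml_theta(Y, D, X, n_folds=2)
print(round(theta_hat, 2))
```

Residualizing both Y and D on X before regressing one on the other removes the confounding path through X. Cross-fitting keeps each residual out-of-sample relative to the nuisance fit that produced it. In practice the linear learners would be replaced by flexible models.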