Adversarial examples, derived from deliberately crafted perturbations of visual inputs, can easily harm the decision process of deep neural networks. To prevent potential threats, defense methods based on adversarial training have grown rapidly and become the de facto standard approach to robustness. Despite recent competitive achievements, we observe that adversarial vulnerability varies across targets and that certain vulnerabilities remain prevalent. Intriguingly, this peculiar phenomenon cannot be relieved even with deeper architectures and advanced defense methods. To address this issue, we introduce a causal approach called Adversarial Double Machine Learning (ADML), which allows us to quantify the degree of adversarial vulnerability of network predictions and to capture the effect of treatments on outcomes of interest. ADML directly estimates the causal parameter of the adversarial perturbations themselves and mitigates negative effects that can potentially damage robustness, bringing a causal perspective to adversarial vulnerability. Through extensive experiments on various CNN and Transformer architectures, we corroborate that ADML improves adversarial robustness by large margins and relieves the observed phenomenon.
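For readers unfamiliar with double machine learning, the idea of estimating the effect of a treatment on an outcome can be illustrated with a minimal sketch. The code below is not ADML; it is the generic cross-fitted, residual-on-residual estimator for a partially linear model, run on synthetic data. Everything in it is an assumption made for illustration: the names `fit_line`, `dml_theta`, and `simulate`, the scalar covariate `X`, treatment `T`, and outcome `Y`, the data-generating process, and the use of simple least-squares lines as nuisance models (in practice these would be flexible learners such as neural networks).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def dml_theta(X, T, Y, n_folds=2):
    """Cross-fitted estimate of theta in Y = theta*T + f(X) + noise.

    Nuisance models for E[T|X] and E[Y|X] are fitted on the training
    folds and evaluated on the held-out fold, then theta is obtained by
    regressing outcome residuals on treatment residuals.
    """
    n = len(X)
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    num = 0.0
    den = 0.0
    for k, held_out in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        a_t, b_t = fit_line([X[i] for i in train], [T[i] for i in train])
        a_y, b_y = fit_line([X[i] for i in train], [Y[i] for i in train])
        for i in held_out:
            t_res = T[i] - (a_t + b_t * X[i])  # treatment residual
            y_res = Y[i] - (a_y + b_y * X[i])  # outcome residual
            num += t_res * y_res
            den += t_res * t_res
    return num / den


def simulate(n=2000, theta=1.5, seed=0):
    """Hypothetical synthetic data: X confounds both T and Y."""
    rng = random.Random(seed)
    X = [rng.gauss(0, 1) for _ in range(n)]
    T = [0.5 * x + rng.gauss(0, 1) for x in X]
    Y = [theta * t + 2.0 * x + rng.gauss(0, 1) for t, x in zip(T, X)]
    return X, T, Y


X, T, Y = simulate()
theta_hat = dml_theta(X, T, Y)
print(f"estimated theta: {theta_hat:.3f} (true value used in simulation: 1.5)")
```

Removing the part of the treatment and outcome that the covariate explains before regressing one residual on the other is what keeps the estimate of `theta` from absorbing the confounding path through `X`. How ADML defines the treatment, covariates, and outcome for adversarial perturbations is specified in the paper itself, not in this sketch.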