Deep learning has led to tremendous success in computer vision, largely due to Convolutional Neural Networks (CNNs). However, CNNs have been shown to be vulnerable to crafted adversarial perturbations. This vulnerability to adversarial examples has motivated research into improving model robustness through adversarial detection and defense methods. In this paper, we address the adversarial robustness of CNNs through causal reasoning. We propose CausAdv, a causal framework for detecting adversarial examples based on counterfactual reasoning. CausAdv learns both causal and non-causal features of every input, and quantifies the counterfactual information (CI) of every filter in the last convolutional layer. We then perform a statistical analysis of the filters' CI across clean and adversarial samples to demonstrate that adversarial examples exhibit CI distributions different from those of clean samples. Our results show that causal reasoning enhances adversarial detection without the need to train a separate detector. Moreover, we illustrate the effectiveness of causal explanations as a detection aid by visualizing the extracted causal features.
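The per-filter counterfactual information described above can be illustrated with a minimal sketch: ablate one filter's activation at a time and measure the drop in the predicted class probability. This is only a toy NumPy illustration under assumed simplifications (a linear classification head over pooled filter activations, and ablation by zeroing), not the paper's actual implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def counterfactual_information(activations, head_weights, target):
    """CI of each filter: the drop in the target class's probability
    when that filter's (pooled) activation is zeroed out, i.e. a
    counterfactual ablation of one filter at a time.

    activations : (F,) pooled activations of the last conv layer's filters
    head_weights: (C, F) assumed linear classification head
    target      : index of the class whose probability we track
    """
    base = softmax(head_weights @ activations)[target]
    ci = np.zeros(len(activations))
    for k in range(len(activations)):
        ablated = activations.copy()
        ablated[k] = 0.0  # counterfactual: "what if filter k were absent?"
        ci[k] = base - softmax(head_weights @ ablated)[target]
    return ci

# Toy example: 4 filters feeding a 3-class linear head.
rng = np.random.default_rng(0)
acts = rng.random(4)
W = rng.standard_normal((3, 4))
ci = counterfactual_information(acts, W, target=2)
```

Filters with large positive CI act as causal features for the prediction (removing them hurts the target class), while filters with near-zero or negative CI are non-causal; the detection idea is that clean and adversarial inputs produce statistically different CI profiles over these filters.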