Deep neural networks (DNNs) have been shown to be vulnerable to well-crafted \emph{adversarial examples}, generated through either $\mathcal{L}_p$-norm-restricted or unrestricted attacks. However, most of these approaches assume that adversaries can modify any feature at will and neglect the causal generating process of the data, which is unreasonable and impractical. For instance, in a banking system, a modification to income would inevitably affect features such as the debt-to-income ratio. Taking this underappreciated causal generating process into account, we first pinpoint the source of DNNs' vulnerability through the lens of causality and provide theoretical results that answer \emph{where to attack}. Second, to generate more realistic adversarial examples that account for the consequences of attack interventions on the current state of each example, we propose CADE, a framework that generates \textbf{C}ounterfactual \textbf{AD}versarial \textbf{E}xamples, answering \emph{how to attack}. Empirical results demonstrate CADE's effectiveness, as evidenced by its competitive performance across diverse attack scenarios, including white-box, transfer-based, and random intervention attacks.