Deep neural networks (DNNs) are known to be vulnerable to carefully crafted \emph{adversarial examples}, generated by either $\mathcal{L}_p$-norm-restricted or unrestricted attacks. However, most of these approaches assume that adversaries can modify any feature at will and neglect the causal generating process of the data, which is unreasonable and impractical. For instance, in a banking system, a change in income would inevitably affect features such as the debt-to-income ratio. Taking this underappreciated causal generating process into account, we first pinpoint the source of DNN vulnerability through the lens of causality and give theoretical results that answer \emph{where to attack}. Second, to generate more realistic adversarial examples, we account for the consequences of attack interventions on the current state of each example and propose CADE, a framework that generates \textbf{C}ounterfactual \textbf{AD}versarial \textbf{E}xamples, to answer \emph{how to attack}. Empirical results demonstrate CADE's effectiveness: it achieves competitive performance across diverse attack scenarios, including white-box, transfer-based, and random intervention attacks.
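
To make the income example concrete, consider a toy structural causal model. The variables, the structural equation, and the numbers below are assumptions made purely for illustration and are not taken from the CADE formulation. Let $I$ denote income, $D$ an outstanding debt that does not depend on $I$, and $R$ the debt-to-income ratio:
\[
  R = \frac{D}{I}, \qquad (I, D) = (50, 20) \;\Rightarrow\; R = 0.4 .
\]
An attack that intervenes on income, $do(I = 80)$, while holding the example's own debt fixed (the abduction step of a counterfactual query) must propagate the change to the descendant $R$:
\[
  R_{I \leftarrow 80} = \frac{D}{80} = \frac{20}{80} = 0.25 .
\]
A perturbation that sets $I = 80$ but leaves $R = 0.4$ would violate $R = D / I$, which is the kind of unrealistic adversarial example that ignoring the causal generating process can produce.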