Data poisoning attacks, in which a malicious adversary aims to influence a model by injecting "poisoned" data into the training process, have attracted significant recent attention. In this work, we take a closer look at existing poisoning attacks and connect them with old and new algorithms for solving sequential Stackelberg games. By choosing an appropriate loss function for the attacker and optimizing with algorithms that exploit second-order information, we design poisoning attacks that are effective on neural networks. We present efficient implementations that exploit modern auto-differentiation packages and allow simultaneous and coordinated generation of tens of thousands of poisoned points, in contrast to existing methods that generate poisoned points one by one. We further perform extensive experiments that empirically explore the effect of data poisoning attacks on deep neural networks.
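To make the Stackelberg view concrete, the following is a minimal sketch rather than the implementation described in the abstract. It assumes a toy one-dimensional ridge regression as the follower, so that the best response has a closed form, and an attacker (leader) who perturbs only the features of the poisoned points to raise the validation loss. The function names, the toy data, the step size, and the clipping bound are all hypothetical choices made for illustration. The leader's gradient is obtained by differentiating through the follower's best response with the implicit function theorem, which is where second-order information (the follower's Hessian) enters, and all poisoned points are updated together in each step.

```python
def best_response(clean, poison, lam):
    """Follower: minimize 0.5*sum((w*x - y)^2) + 0.5*lam*w^2 over clean + poisoned points."""
    pts = clean + poison
    hess = sum(x * x for x, _ in pts) + lam  # second derivative of the follower loss in w
    w = sum(x * y for x, y in pts) / hess    # closed-form minimizer
    return w, hess


def leader_loss(w, val):
    """Leader: mean squared validation error of the retrained model (to be maximized)."""
    return 0.5 * sum((w * x - y) ** 2 for x, y in val) / len(val)


def total_gradient(clean, poison, val, lam):
    """Gradient of the leader loss w.r.t. every poisoned feature, through the best response."""
    w, hess = best_response(clean, poison, lam)
    grad_w = sum((w * x - y) * x for x, y in val) / len(val)
    v = grad_w / hess  # inverse-Hessian times leader gradient (a scalar in this toy case)
    # Implicit function theorem: dw/dx_i = -((w*x_i - y_i) + x_i*w) / hess
    return [-(((w * x - y) + x * w) * v) for x, y in poison]


def poison_attack(clean, poison, val, lam=0.1, step=0.05, iters=50, bound=2.0):
    """Projected gradient ascent on all poisoned features at once (labels stay fixed)."""
    poison = list(poison)
    for _ in range(iters):
        g = total_gradient(clean, poison, val, lam)
        poison = [(max(-bound, min(bound, x + step * gi)), y)
                  for (x, y), gi in zip(poison, g)]
    return poison


if __name__ == "__main__":
    # Hypothetical toy data: clean points follow y = 2x, poisoned labels are adversarial.
    clean = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]
    val = [(-0.8, -1.6), (0.3, 0.6), (0.9, 1.8)]
    poison = [(0.5, -2.0), (-0.7, 1.5)]
    before = leader_loss(best_response(clean, poison, 0.1)[0], val)
    after = leader_loss(best_response(clean, poison_attack(clean, poison, val), 0.1)[0], val)
    print("validation loss before/after attack:", before, after)
```

For a neural network the best response has no closed form, so the division by `hess` would be replaced by an approximate inverse-Hessian-vector product computed with an auto-differentiation package. That substitution is an assumption about how such a sketch might scale, not a description of the authors' method.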