Unlearnable example attacks are data poisoning attacks that aim to degrade the clean test accuracy of deep learning models by adding imperceptible perturbations to the training samples; they can be formulated as a bi-level optimization problem. However, directly solving this optimization problem is intractable for deep neural networks. In this paper, we investigate unlearnable example attacks from a game-theoretic perspective by formulating the attack as a nonzero-sum Stackelberg game. First, we prove the existence of game equilibria under both the normal setting and the adversarial training setting. We show that, when certain loss functions are used, the game equilibrium gives the most powerful poison attack, in the sense that the victim has the lowest test accuracy among all networks within the same hypothesis space. Second, we propose a novel attack method, called the Game Unlearnable Example (GUE), which has three main ingredients. (1) The poisons are obtained by directly solving for the equilibrium of the Stackelberg game with a first-order algorithm. (2) We employ an autoencoder-like generative network model as the poison attacker. (3) A novel payoff function is introduced to evaluate the performance of the poison. Comprehensive experiments demonstrate that GUE can effectively poison the model in various scenarios. Furthermore, GUE remains effective when only a relatively small percentage of the training data is used to train the generator, and the poison generator generalizes well to unseen data. Our implementation code can be found at https://github.com/hong-xian/gue.
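For readers unfamiliar with the bi-level view mentioned in the abstract, one common way to write a poisoning objective of this kind is sketched below. The abstract does not give the formula, so every symbol here is an assumption made for illustration: \delta_i is the perturbation added to training sample x_i, \epsilon is the imperceptibility budget (the \ell_\infty norm is assumed), f_\theta is the victim network, \ell is the training loss, and \mathcal{D} is the clean test distribution. The paper's own formulation and payoff function may differ.

```latex
\begin{aligned}
\max_{\delta}\quad & \mathbb{E}_{(x,y)\sim\mathcal{D}}
    \big[\,\ell\big(f_{\theta^{*}(\delta)}(x),\, y\big)\big] \\
\text{s.t.}\quad & \theta^{*}(\delta) \in \arg\min_{\theta}\;
    \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_{\theta}(x_i + \delta_i),\, y_i\big), \\
& \|\delta_i\|_{\infty} \le \epsilon, \qquad i = 1,\dots,n .
\end{aligned}
```

In this sketch the outer problem belongs to the attacker and the inner problem to the victim who trains on the poisoned data. The dependence of \theta^{*} on \delta is what makes a direct solution hard for deep networks, and the two nested objectives suggest a leader-follower (Stackelberg) reading in which the two players optimize different payoffs.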