While numerous defenses have been proposed to prevent poisoning attacks from untrusted data sources, most defend only against specific attacks, which leaves many avenues for an adversary to exploit. In this work, we propose an efficient and robust training approach based on influence functions to defend against data poisoning attacks, named Healthy Influential-Noise based Training (HINT). Using influence functions, we craft healthy noise that hardens the classification model against poisoning attacks without significantly affecting its generalization ability on test data. In addition, our method remains effective when only a subset of the training data is modified, whereas several previous works add noise to all training examples. We conduct comprehensive evaluations on two image datasets with state-of-the-art poisoning attacks under different realistic attack scenarios. Our empirical results show that HINT can efficiently protect deep learning models against both untargeted and targeted poisoning attacks.