In recent years, as deep neural networks (DNNs) have been applied successfully in fields such as NLP and CV, their security has also received widespread attention. (Author) proposed the backdoor attack in Badnet, which implants a backdoor into a model by poisoning its training samples. A backdoored model shows no abnormality on a normal validation set, but inputs carrying the trigger are misclassified, either as the category designated by the attacker or as a random category that differs from the ground truth. This attack seriously threatens real-world applications of DNNs such as autonomous driving and object detection. This article proposes a new method to defend against backdoor attacks. We refer to the features in the area covered by the trigger as trigger features, and to the features in the remaining areas as normal features. We introduce prerequisite calculation conditions during training; these conditions have little impact on either normal features or trigger features, so training still yields a standard backdoor model. On the validation set D'val, which carries the same prerequisite calculation conditions, a model trained this way performs consistently with an ordinary backdoor model. On the validation set Dval, which lacks the prerequisite calculation conditions, validation accuracy decreases only slightly (by 7%~12%), while the attack success rate (ASR) drops from 90% to about 8%. We call this method Prerequisite Transformation (PT).
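To make the threat model and the ASR metric above concrete, the following is a minimal sketch of trigger-based data poisoning and of how an attack success rate could be computed. It is an illustration under stated assumptions, not the implementation used in this work: the square patch, its bottom-right position, the poisoning rate, and all function names (`stamp_trigger`, `poison_dataset`, `attack_success_rate`) are hypothetical, and the sketch does not implement the prerequisite calculation conditions of PT.

```python
import numpy as np


def stamp_trigger(images, patch_value=1.0, size=3):
    """Return a copy of `images` (N, H, W) with a size x size patch
    written into the bottom-right corner (assumed trigger shape)."""
    out = images.copy()
    out[:, -size:, -size:] = patch_value
    return out


def poison_dataset(images, labels, target_label, rate, rng):
    """Stamp the trigger on a random fraction `rate` of the samples
    and relabel those samples to `target_label`."""
    n = len(images)
    idx = rng.choice(n, size=int(round(rate * n)), replace=False)
    x, y = images.copy(), labels.copy()
    x[idx] = stamp_trigger(images[idx])
    y[idx] = target_label
    return x, y, idx


def attack_success_rate(preds_on_triggered, true_labels, target_label):
    """Fraction of triggered inputs predicted as the target class,
    counting only inputs whose true class is not already the target."""
    mask = true_labels != target_label
    return float(np.mean(preds_on_triggered[mask] == target_label))
```

In this sketch, clean accuracy would be measured on untouched inputs, while ASR would be measured on the same inputs after `stamp_trigger` is applied; samples whose true class already equals the target are excluded so that correct predictions are not counted as successful attacks.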