Backdoor attacks pose an emerging threat to AI security: Deep Neural Networks (DNNs) are trained on datasets into which hidden trigger patterns have been injected. Although the poisoned model behaves normally on benign samples, it produces anomalous results on samples containing the trigger pattern. However, most existing backdoor attacks have two significant drawbacks: their trigger patterns are visible and easily detected by human inspection, and their injection process causes the loss of both natural sample features and trigger patterns, which reduces the attack success rate and the model accuracy. In this paper, we propose a novel backdoor attack named SATBA that overcomes these limitations by using a spatial attention mechanism and a U-type model. Our attack leverages the spatial attention mechanism to extract data features and generate invisible trigger patterns that are correlated with the clean data. It then uses the U-type model to plant these trigger patterns into the original data without causing noticeable feature loss. We evaluate our attack on three prominent image classification DNNs across three standard datasets and demonstrate that it achieves a high attack success rate and robustness against backdoor defenses. We also conduct extensive experiments on image similarity to highlight the stealthiness of our attack.
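To make the poisoning setting and the two evaluation notions mentioned above (attack success rate and image similarity) concrete, the following is a minimal, generic sketch and not the SATBA implementation. It rests on several assumptions: the simple additive blend in `blend_trigger` is only a stand-in for the U-type injection model, the trigger is an arbitrary array rather than one produced by spatial attention, PSNR is used as one common image-similarity measure (the abstract does not name the metrics), and all function names, the poisoning `rate`, and the blend strength `alpha` are hypothetical.

```python
import numpy as np


def blend_trigger(image, trigger, alpha=0.05):
    """Add a low-amplitude trigger to images in [0, 1].

    Assumption: a plain additive blend stands in for the U-type
    injection model described in the abstract.
    """
    return np.clip(image + alpha * trigger, 0.0, 1.0)


def poison_dataset(images, labels, trigger, target_label,
                   rate=0.1, alpha=0.05, seed=0):
    """Poison a random fraction of the samples and relabel them.

    Returns copies of the images and labels plus the poisoned indices.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    idx = rng.choice(n, size=int(round(rate * n)), replace=False)
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    poisoned_images[idx] = blend_trigger(images[idx], trigger, alpha)
    poisoned_labels[idx] = target_label
    return poisoned_images, poisoned_labels, idx


def attack_success_rate(predictions, true_labels, target_label):
    """Fraction of triggered non-target samples classified as the target."""
    mask = true_labels != target_label
    if not mask.any():
        return float("nan")
    return float(np.mean(predictions[mask] == target_label))


def psnr(clean, poisoned, max_val=1.0):
    """Peak signal-to-noise ratio in dB; higher means more similar."""
    mse = np.mean((clean - poisoned) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

In this sketch, a smaller `alpha` raises the PSNR between clean and poisoned images (a stealthier trigger) at the likely cost of a weaker backdoor signal, which is the trade-off the abstract's invisibility and feature-loss claims are concerned with.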