As a new area of AI security, backdoor attacks have drawn growing research attention in recent years. A backdoor can be injected into a DNN model during training on a poisoned dataset, that is, a dataset containing poisoned samples. The backdoored model outputs correct predictions on benign samples yet behaves abnormally on poisoned samples that contain the trigger pattern. Most existing triggers are visible and can easily be detected by human visual inspection, and the trigger injection process causes a loss of features from both the natural sample and the trigger. To address these problems, and inspired by the spatial attention mechanism, we introduce a novel backdoor attack named SATBA, which is invisible and minimizes trigger feature loss to improve both attack success rate and model accuracy. SATBA uses spatial attention to extract data features and generate a trigger pattern correlated with the clean data, then poisons each clean image by planting the trigger into it with a U-type model. We demonstrate the effectiveness of our attack against three popular image classification DNNs on three standard datasets. In addition, extensive image-similarity experiments show that the proposed attack provides practical stealthiness, which is critical for evading backdoor defenses.
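The general idea of attention-weighted data poisoning can be illustrated with a minimal sketch. It is not SATBA's implementation, and several simplifications are assumptions. The U-type model is replaced by a plain additive blend. The spatial attention follows the common channel-pooling pattern (mean and max across channels, then a sigmoid), but with fixed hand-picked weights instead of a learned convolution. The names `spatial_attention`, `make_trigger`, `poison_image` and `poison_dataset`, the random ±1 trigger, the perturbation budget `eps = 8/255` and the poisoning rate are all hypothetical.

```python
import math
import random
from typing import List, Set, Tuple

Image = List[List[List[float]]]  # H x W x C, pixel values in [0, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def spatial_attention(img: Image, w_avg: float = 4.0, w_max: float = 4.0,
                      bias: float = -4.0) -> List[List[float]]:
    """Per-pixel attention weight in (0, 1).

    Pools each pixel across channels (mean and max), combines the two values
    linearly and applies a sigmoid. The weights are fixed assumptions here;
    a real attention module would learn them (typically via a convolution).
    """
    att = []
    for row in img:
        att.append([sigmoid(w_avg * sum(px) / len(px) + w_max * max(px) + bias)
                     for px in row])
    return att


def make_trigger(h: int, w: int, c: int, seed: int = 0) -> Image:
    """Hypothetical trigger: a fixed random +/-1 pattern of the image's shape."""
    rng = random.Random(seed)
    return [[[rng.choice((-1.0, 1.0)) for _ in range(c)] for _ in range(w)]
            for _ in range(h)]


def poison_image(img: Image, trigger: Image, eps: float = 8 / 255) -> Image:
    """Add the trigger scaled by eps and by the attention map, then clip to [0, 1].

    Each pixel changes by at most eps, which keeps the perturbation small.
    """
    att = spatial_attention(img)
    out = []
    for i, row in enumerate(img):
        out.append([[min(1.0, max(0.0, v + eps * att[i][j] * t))
                     for v, t in zip(px, trigger[i][j])]
                    for j, px in enumerate(row)])
    return out


def poison_dataset(data: List[Tuple[Image, int]], target_label: int,
                   rate: float = 0.1, seed: int = 0
                   ) -> Tuple[List[Tuple[Image, int]], Set[int]]:
    """Poison a fraction `rate` of the samples and relabel them to `target_label`.

    Returns the new dataset and the indices of the poisoned samples.
    """
    rng = random.Random(seed)
    poisoned_idx = set(rng.sample(range(len(data)), int(len(data) * rate)))
    first_img = data[0][0]
    trigger = make_trigger(len(first_img), len(first_img[0]),
                           len(first_img[0][0]), seed)
    out = []
    for k, (img, label) in enumerate(data):
        if k in poisoned_idx:
            out.append((poison_image(img, trigger), target_label))
        else:
            out.append((img, label))  # benign samples are left untouched
    return out, poisoned_idx
```

A model trained on the returned dataset sees the target label paired with the faint, attention-weighted trigger. This pairing is what the backdoor exploits. Benign samples keep their labels, so clean accuracy is largely preserved.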