This work explores an emerging security threat to deep neural network (DNN)-based image classification: the backdoor attack. In this scenario, the attacker injects a backdoor into the model by manipulating the training data, such that the backdoor can be activated by a particular trigger and induces the model to make a target prediction at inference time. Most existing data-poisoning attacks struggle to succeed at low poisoning ratios, which makes them easier to detect and defeat by existing defenses. In this paper, we propose a novel frequency-based backdoor attack built on Wavelet Packet Decomposition (WPD). WPD decomposes the original image signal into a spectrogram whose frequency components carry different semantic meanings. We use WPD to statistically analyze the frequency distribution of the dataset, infer the key frequency regions that DNNs focus on, and inject the trigger information only into those regions. Our method consists of three parts: 1) selection of the poisoning frequency regions in the spectrogram; 2) trigger generation; 3) generation of the poisoned dataset. The method is stealthy and precise: it achieves a 98.12% Attack Success Rate (ASR) on CIFAR-10 with an extremely low poisoning ratio of 0.004% (i.e., only 2 poisoned samples among 50,000 training samples) and bypasses most existing defense methods. We also provide visualization analyses to explain why our method works.
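To make the WPD idea concrete, the following is a minimal sketch (not the paper's code, and independent of any wavelet library) of one level of a 2D Haar wavelet decomposition, the building block that WPD applies recursively to every subband. Computing per-subband energies is the kind of frequency-distribution statistic the abstract describes for locating the key frequency regions; the function names and the 4x4 toy image are purely illustrative.

```python
def haar_1d(row):
    """One 1D Haar step: pairwise averages (low band) and differences (high band)."""
    low = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
    high = [(row[2 * i] - row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
    return low, high

def haar_2d(image):
    """One 2D Haar step: transform rows, then columns, yielding four subbands."""
    lows, highs = [], []
    for row in image:
        lo, hi = haar_1d(row)
        lows.append(lo)
        highs.append(hi)

    def transform_cols(mat):
        n_rows, n_cols = len(mat), len(mat[0])
        lo = [[0.0] * n_cols for _ in range(n_rows // 2)]
        hi = [[0.0] * n_cols for _ in range(n_rows // 2)]
        for c in range(n_cols):
            clo, chi = haar_1d([mat[r][c] for r in range(n_rows)])
            for r in range(n_rows // 2):
                lo[r][c] = clo[r]
                hi[r][c] = chi[r]
        return lo, hi

    LL, LH = transform_cols(lows)   # low-pass rows -> low/high columns
    HL, HH = transform_cols(highs)  # high-pass rows -> low/high columns
    return {"LL": LL, "LH": LH, "HL": HL, "HH": HH}

def energy(band):
    """Sum of squared coefficients: how much signal a frequency region carries."""
    return sum(v * v for row in band for v in row)

# A flat 4x4 image has no high-frequency content, so all energy lands in LL.
bands = haar_2d([[1.0] * 4 for _ in range(4)])
print({name: energy(band) for name, band in bands.items()})
```

Full WPD would recurse this step into all four subbands (not just LL, as the plain wavelet transform does), giving the fine-grained spectrogram over which trigger regions are selected.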