This work explores an emerging security threat against deep neural network (DNN)-based image classification: the backdoor attack. In this scenario, the attacker aims to inject a backdoor into the model by manipulating the training data, so that at inference time a particular trigger activates the backdoor and steers the model toward a target prediction. Most existing data poisoning-based attacks struggle to succeed at low poisoning ratios, and the higher ratios they need increase the risk of being caught by defense methods. In this paper, we propose a novel frequency-based backdoor attack via Wavelet Packet Decomposition (WPD). WPD decomposes the original image signal into a spectrogram whose frequency components carry different semantic meanings. We use WPD to statistically analyze the frequency distribution of the dataset, infer the key frequency regions on which DNNs are likely to focus, and inject the trigger information only into those regions. Our method consists of three parts: 1) selection of the poisoning frequency regions in the spectrogram; 2) trigger generation; and 3) generation of the poisoned dataset. The method is stealthy and precise: it achieves a 98.12% Attack Success Rate (ASR) on CIFAR-10 at an extremely low poisoning ratio of 0.004% (i.e., only 2 poisoned samples among 50,000 training samples), and it can bypass most existing defense methods. We also provide visualization analyses to explain why our method works.
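To make the three-part pipeline concrete, the following is a minimal sketch of the general technique, not the authors' implementation. The abstract does not specify the wavelet, the decomposition depth, the statistic used to pick key regions, or the trigger form. The sketch therefore assumes an orthonormal Haar wavelet, grayscale 32×32 images in [0, 1] (a per-channel loop would be needed for RGB CIFAR-10), a hypothetical selection rule that ranks subbands by mean coefficient energy over the dataset, and a fixed random trigger added with a hypothetical strength `alpha`. All function names (`haar_dwt2`, `wpd2`, `key_regions`, `poison`, etc.) are illustrative only. The spectrogram is stored as an image-sized array of tiled subbands, so tile `(i, j)` is one frequency region (tiles follow the recursive split order, not a sorted frequency order).

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt2(x):
    """One level of the orthonormal 2D Haar transform (height and width must be even)."""
    a = (x[0::2, :] + x[1::2, :]) / SQRT2  # low-pass across row pairs
    d = (x[0::2, :] - x[1::2, :]) / SQRT2  # high-pass across row pairs
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    a, d = np.empty((h, 2 * w)), np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[:, 0::2], d[:, 1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = (a + d) / SQRT2, (a - d) / SQRT2
    return x


def wpd2(x, level):
    """Full wavelet packet decomposition: every subband (not only LL) is split again."""
    if level == 0:
        return np.array(x, dtype=float)  # copy, so callers can modify it safely
    ll, lh, hl, hh = haar_dwt2(x)
    top = np.hstack([wpd2(ll, level - 1), wpd2(lh, level - 1)])
    bottom = np.hstack([wpd2(hl, level - 1), wpd2(hh, level - 1)])
    return np.vstack([top, bottom])


def iwpd2(s, level):
    """Inverse of wpd2: rebuild the image from the tiled spectrogram."""
    if level == 0:
        return np.array(s, dtype=float)
    h, w = s.shape[0] // 2, s.shape[1] // 2
    parts = [iwpd2(b, level - 1) for b in (s[:h, :w], s[:h, w:], s[h:, :w], s[h:, w:])]
    return haar_idwt2(*parts)


def key_regions(images, level, k):
    """Hypothetical rule (step 1): pick the k subbands with the highest mean energy."""
    n = 2 ** level
    energy = np.zeros((n, n))
    for img in images:
        s = wpd2(img, level)
        bh, bw = s.shape[0] // n, s.shape[1] // n
        energy += (s.reshape(n, bh, n, bw) ** 2).mean(axis=(1, 3))
    top = np.argsort(energy.ravel())[::-1][:k]
    return [tuple(int(v) for v in np.unravel_index(i, (n, n))) for i in top]


def poison(img, regions, level, trigger, alpha=0.05):
    """Step 3: add alpha * trigger to the chosen subbands only, then invert the WPD."""
    n = 2 ** level
    s = wpd2(img, level)
    bh, bw = s.shape[0] // n, s.shape[1] // n
    for i, j in regions:
        s[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw] += alpha * trigger
    return np.clip(iwpd2(s, level), 0.0, 1.0)


rng = np.random.default_rng(0)
images = rng.random((100, 32, 32))  # stand-in for grayscale training images
level = 2
regions = key_regions(images, level, k=2)
trigger = rng.standard_normal((32 // 2 ** level, 32 // 2 ** level))  # step 2 (hypothetical)
poisoned = poison(images[0], regions, level, trigger)
print(regions, np.abs(poisoned - images[0]).max())
print(100 * 2 / 50_000)  # poisoning ratio in percent for 2 of 50,000 samples
```

Because the Haar packet transform is orthonormal and exactly invertible, a perturbation written into selected tiles stays confined to those frequency regions (up to clipping), which is the property the abstract relies on when it injects the trigger only into the key regions. The last line reproduces the stated ratio: 2 / 50,000 = 0.004%.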