Deep neural networks (DNNs) are vulnerable to backdoor attacks, in which an attacker manipulates a small portion of the training data to implant hidden backdoors into the model. The compromised model behaves normally on clean samples but misclassifies backdoored samples into an attacker-specified target class, posing a significant threat to real-world DNN applications. Several empirical defense methods have been proposed to mitigate backdoor attacks, but they are often bypassed by more advanced backdoor techniques. In contrast, certified defenses based on randomized smoothing, which add random noise to training and testing samples to counteract backdoor attacks, have shown promise. In this paper, we reveal that existing randomized smoothing defenses implicitly assume that all samples are equidistant from the decision boundary. This assumption may not hold in practice, leading to suboptimal certification performance. To address this issue, we propose a sample-specific certified backdoor defense, termed Cert-SSB. Cert-SSB first employs stochastic gradient ascent to optimize the noise magnitude for each sample; the resulting sample-specific noise levels are then applied to multiple poisoned training sets to retrain several smoothed models. Cert-SSB then aggregates the predictions of these smoothed models to produce the final robust prediction. Notably, existing certification methods become inapplicable in this setting, since the optimized noise varies across samples. To overcome this challenge, we introduce a storage-update-based certification method that dynamically adjusts each sample's certification region to improve certification performance. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed method. Our code is available at https://github.com/NcepuQiaoTing/Cert-SSB.
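To illustrate the core mechanism the abstract builds on, the following is a minimal sketch of randomized-smoothing prediction with a sample-specific noise level and a Cohen-style certified radius. It is not the authors' implementation: the toy `base_classifier`, the sample count `n`, and the probability clipping are illustrative assumptions, standing in for a trained DNN and the paper's full Cert-SSB pipeline.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

def base_classifier(x):
    # Toy stand-in for a trained DNN: label by the sign of the mean feature.
    return int(x.mean() > 0)

def smoothed_predict(x, sigma, n=2000):
    """Randomized-smoothing prediction under a sample-specific noise level.

    Takes a majority vote of the base classifier over Gaussian perturbations
    of x. Here sigma is chosen per sample, which is the departure from
    fixed-noise smoothing that motivates Cert-SSB.
    """
    noisy = x[None, :] + rng.normal(0.0, sigma, size=(n, x.size))
    votes = np.bincount([base_classifier(z) for z in noisy], minlength=2)
    top = int(votes.argmax())
    # Clip the empirical top-class probability away from 1.0 so the
    # inverse Gaussian CDF below stays finite.
    p_top = min(votes[top] / n, 1.0 - 1e-6)
    # Cohen-style l2 certified radius: sigma * Phi^{-1}(p_top),
    # valid only when the top class wins a strict majority.
    radius = sigma * NormalDist().inv_cdf(p_top) if p_top > 0.5 else 0.0
    return top, radius

x = np.full(8, 0.5)  # a clean sample comfortably on one side of the boundary
label, radius = smoothed_predict(x, sigma=0.25)
```

Because the certified radius scales with sigma, samples far from the decision boundary tolerate (and benefit from) larger per-sample noise, while boundary-adjacent samples need smaller noise to keep the majority vote stable.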