Backdoor attacks aim to surreptitiously insert malicious triggers into deep neural network (DNN) models, granting the attacker unauthorized control at test time. Existing attacks lack robustness against defense strategies: they focus predominantly on enhancing trigger stealthiness while selecting the samples to poison at random. Our research highlights the overlooked drawbacks of random sampling, which make the resulting attacks easier to detect and defend against. The core idea of this paper is to strategically poison samples near the model's decision boundary, thereby increasing the difficulty of defense. We introduce a straightforward yet effective sampling method that leverages confidence scores. Specifically, it selects samples with lower confidence scores, making it significantly harder for defenders to identify and counter the attack. Importantly, our method operates independently of existing trigger designs, making it compatible with a variety of backdoor attack techniques. We substantiate the effectiveness of our approach through a comprehensive set of empirical experiments, demonstrating its potential to significantly enhance the resilience of backdoor attacks against defenses in DNNs.
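The selection step described above can be illustrated with a minimal sketch. The abstract does not define the confidence score, so this example assumes it is the maximum softmax probability produced by a hypothetical surrogate model trained on clean data. The function name `select_low_confidence` and the `poison_rate` parameter are illustrative choices, not names taken from the paper.

```python
import numpy as np


def select_low_confidence(probs: np.ndarray, poison_rate: float) -> np.ndarray:
    """Return indices of the samples with the lowest confidence.

    probs: array of shape (n_samples, n_classes) holding softmax outputs
           of a surrogate model (assumed, not specified by the source).
    poison_rate: fraction of the dataset to poison, in (0, 1].
    """
    if probs.ndim != 2:
        raise ValueError("probs must have shape (n_samples, n_classes)")
    if not 0.0 < poison_rate <= 1.0:
        raise ValueError("poison_rate must be in (0, 1]")

    # Assumed confidence score: the largest class probability per sample.
    confidence = probs.max(axis=1)

    # Number of samples to poison, at least one.
    n_poison = max(1, int(round(poison_rate * len(probs))))

    # Sort ascending by confidence; a stable sort breaks ties by index.
    order = np.argsort(confidence, kind="stable")
    return order[:n_poison]


# Toy example: four samples, three classes.
probs = np.array([
    [0.98, 0.01, 0.01],  # confidently classified, far from the boundary
    [0.40, 0.35, 0.25],  # uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],  # most uncertain
])
poison_idx = select_low_confidence(probs, poison_rate=0.5)
```

Because the function returns only indices, any trigger design could in principle be applied to the selected samples afterwards, which mirrors the trigger-independence claimed above. Random sampling would correspond to replacing the sorted order with a random permutation.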