Backdoor attacks typically place a specific trigger on certain training data, such that the model makes prediction errors on inputs with that trigger during inference. Despite the trigger's central role, existing studies have commonly assumed that a perfect match between the training and inference triggers is optimal. In this paper, for the first time, we systematically explore the training-inference trigger relation, particularly focusing on their mismatch, based on a Training-Inference Trigger Intensity Manipulation (TITIM) workflow. TITIM specifically investigates the training-inference trigger intensity, such as the size or the opacity of a trigger, and reveals new insights into trigger generalization and overfitting. These new insights challenge the above common belief by demonstrating that a training-inference trigger mismatch can facilitate attacks in two practical scenarios, posing more significant security threats than previously thought. First, when the inference trigger is fixed, using training triggers with mixed intensities leads to stronger attacks than using any single intensity. For example, on CIFAR-10 with ResNet-18, mixing training triggers with 1.0 and 0.1 opacities raises the worst-case attack success rate (ASR), taken over different testing opacities, from 10.61\% (the best single-opacity attack) to 92.77\%. Second, intentionally using certain mismatched training-inference triggers can improve attack stealthiness, i.e., better bypass defenses. For example, compared to a training/inference intensity of 1.0/1.0, using 1.0/0.7 decreases the area under the curve (AUC) of the Scale-Up defense from 0.96 to 0.62, while maintaining a high ASR (99.65\% vs. 91.62\%). These new insights are validated to generalize across different backdoor attacks, models, datasets, tasks, and (digital/physical) domains.
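The mixed-intensity poisoning in the first scenario can be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `apply_trigger` and `poison_batch`, and the particular alpha-blending formula, are assumptions; the key idea it shows is that each poisoned sample draws its trigger opacity from a mixed set (e.g., {1.0, 0.1}) instead of a single value.

```python
import numpy as np

def apply_trigger(image, trigger, mask, opacity):
    """Alpha-blend a trigger patch into an image at the given opacity.

    image, trigger: float arrays in [0, 1] of shape (H, W, C).
    mask: binary array of shape (H, W, 1) marking the trigger's location.
    """
    return image * (1.0 - mask * opacity) + trigger * mask * opacity

def poison_batch(images, trigger, mask, opacities, rng):
    """Poison each image with a trigger whose opacity is drawn uniformly
    from a mixed set (e.g., [1.0, 0.1]) rather than a single fixed value."""
    poisoned = np.empty_like(images)
    for i, img in enumerate(images):
        alpha = rng.choice(opacities)  # per-sample training intensity
        poisoned[i] = apply_trigger(img, trigger, mask, alpha)
    return poisoned
```

At inference time, the attacker stamps the test input with a fixed opacity; training on mixed opacities is what makes the backdoor generalize across testing intensities.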