Deep neural networks (DNNs) are susceptible to backdoor attacks, in which malicious functionality is embedded so that attackers can trigger incorrect classifications. Old-school backdoor attacks use strong trigger features that victim models learn easily. These triggers are robust against input variation, but that robustness also increases the likelihood of unintentional trigger activations. This leaves traces for existing defenses, which use techniques such as reverse engineering and sample overlay to find approximate replacements that activate the backdoor without being identical to the original trigger. In this paper, we propose and investigate a new characteristic of backdoor attacks, namely backdoor exclusivity, which measures the ability of backdoor triggers to remain effective in the presence of input variation. Building upon the concept of backdoor exclusivity, we propose Backdoor Exclusivity LifTing (BELT), a novel technique that suppresses the association between the backdoor and fuzzy triggers to enhance backdoor exclusivity for defense evasion. Extensive evaluation on three popular backdoor benchmarks validates that our approach substantially enhances the stealthiness of four old-school backdoor attacks, which, after backdoor exclusivity lifting, are able to evade six state-of-the-art backdoor countermeasures at almost no cost to attack success rate or normal utility. For example, BadNet, one of the earliest backdoor attacks, evades most state-of-the-art defenses once enhanced by BELT, including ABS and MOTH, which would otherwise recognize the backdoored model.