All existing backdoor attacks on deep learning (DL) models fall into the category of vertical class backdoors (VCB), which are class-dependent. In a VCB attack, any sample from a class activates the implanted backdoor when the secret trigger is present. Existing defense strategies overwhelmingly focus on countering VCB attacks, especially those that are source-class-agnostic. This narrow focus neglects the potential threat of other simpler yet general backdoor types and can create a false sense of security. This study introduces a new, simple, and general type of backdoor attack, coined the horizontal class backdoor (HCB), which trivially breaches the class-dependence characteristic of VCB and brings a fresh perspective to the community. An HCB is activated when the trigger is presented together with an innocuous feature, regardless of class. For example, a facial recognition model misclassifies any person who wears sunglasses (the trigger) and is smiling (the innocuous feature) as the targeted person, such as an administrator, regardless of who that person is. The key is that these innocuous features are horizontally shared among classes but are exhibited by only some of the samples in each class. Extensive experiments on attack performance across various tasks, including MNIST digit recognition, facial recognition, traffic sign recognition, object detection, and medical diagnosis, confirm the high efficiency and effectiveness of HCB. We rigorously evaluated the evasiveness of HCB against eleven representative countermeasures: Fine-Pruning (RAID '18), STRIP (ACSAC '19), Neural Cleanse (Oakland '19), ABS (CCS '19), Februus (ACSAC '20), NAD (ICLR '21), MNTD (Oakland '21), SCAn (USENIX Security '21), MOTH (Oakland '22), Beatrix (NDSS '23), and MM-BD (Oakland '24). None of these countermeasures proves robust, even when the attack uses a simplistic trigger such as a small, static white-square patch.
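To make the contrast between the two activation conditions concrete, the sketch below encodes the intended input-to-output behavior of a source-agnostic VCB and of an HCB as plain functions. It is illustrative only and not the paper's implementation: the `Sample` type, its boolean flags `has_trigger` and `has_innocuous`, and the functions `vcb_output` and `hcb_output` are hypothetical names, and the sketch does not model how the poisoned training data or the model itself are built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical abstraction of an input: only the properties that matter
    for backdoor activation are kept."""
    true_label: int      # ground-truth class, e.g. the person's identity
    has_trigger: bool    # e.g. the person wears sunglasses
    has_innocuous: bool  # e.g. the person is smiling; shared across classes


def vcb_output(s: Sample, target: int) -> int:
    """Intended behavior of a source-agnostic vertical class backdoor:
    the trigger alone sends a sample of any class to the target class."""
    return target if s.has_trigger else s.true_label


def hcb_output(s: Sample, target: int) -> int:
    """Intended behavior of a horizontal class backdoor: the trigger sends a
    sample to the target class only if the innocuous feature is also present,
    whatever the sample's class is."""
    return target if (s.has_trigger and s.has_innocuous) else s.true_label


if __name__ == "__main__":
    target = 0  # hypothetical target class, e.g. an administrator
    for trig in (False, True):
        for innoc in (False, True):
            s = Sample(true_label=3, has_trigger=trig, has_innocuous=innoc)
            print(f"trigger={trig!s:5} innocuous={innoc!s:5} "
                  f"VCB->{vcb_output(s, target)} HCB->{hcb_output(s, target)}")
```

The only difference between the two functions is the extra conjunct on the innocuous feature. Because that feature cuts across classes rather than defining one, a trigger-only input behaves normally under HCB, which is the property the abstract credits for evading class-oriented defenses.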