All current backdoor attacks on deep learning (DL) models fall under the category of vertical class backdoor (VCB), that is, they are class-dependent. In a VCB attack, any sample from a class activates the implanted backdoor when the secret trigger is present. Existing defense strategies overwhelmingly focus on countering VCB attacks, especially those that are source-class-agnostic. This narrow focus neglects the potential threat of other, simpler yet general backdoor types, creating a false sense of security. This study introduces a new, simple, and general type of backdoor attack, termed the horizontal class backdoor (HCB), that trivially breaks the class-dependence characteristic of the VCB and brings a fresh perspective to the community. An HCB is activated when the trigger is presented together with an innocuous feature, regardless of class. For example, a facial recognition model misclassifies any person who wears sunglasses (the trigger) while smiling (the innocuous feature) as the targeted person, such as an administrator, regardless of who that person is. The key is that these innocuous features are horizontally shared among classes but are exhibited by only a subset of the samples in each class. Extensive experiments on attack performance across various tasks, including MNIST, facial recognition, traffic sign recognition, object detection, and medical diagnosis, confirm the high efficiency and effectiveness of the HCB. We rigorously evaluated the evasiveness of the HCB against eleven representative countermeasures: Fine-Pruning (RAID '18), STRIP (ACSAC '19), Neural Cleanse (Oakland '19), ABS (CCS '19), Februus (ACSAC '20), NAD (ICLR '21), MNTD (Oakland '21), SCAn (USENIX SEC '21), MOTH (Oakland '22), Beatrix (NDSS '23), and MM-BD (Oakland '24). None of these countermeasures proves robust, even when the attack uses a simplistic trigger such as a small, static white-square patch.
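To make the activation condition concrete, the following is a minimal sketch of how training data for an HCB might be poisoned. It is an illustration under stated assumptions, not the procedure used in the study. The function names (`stamp_trigger`, `build_hcb_poison_set`), the `has_feature` mask marking samples that exhibit the innocuous feature, the poisoning rate, and the patch size are all hypothetical. The sketch also assumes that triggered samples without the innocuous feature keep their original label, so that the trigger alone does not activate the backdoor; the abstract does not state this detail.

```python
import numpy as np


def stamp_trigger(image, size=3):
    """Return a copy of `image` with a white square in the bottom-right corner.

    Assumes pixel values in [0, 1] and a layout of (H, W) or (H, W, C).
    """
    patched = image.copy()
    patched[-size:, -size:] = 1.0
    return patched


def build_hcb_poison_set(images, labels, has_feature, target_class,
                         poison_rate=0.1, patch_size=3, seed=0):
    """Stamp the trigger on a random subset and relabel only feature-bearing samples.

    has_feature: boolean array, True where a sample shows the innocuous feature
    (for example, smiling). This mask is an assumption of the sketch.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = int(round(poison_rate * n))
    chosen = rng.choice(n, size=n_poison, replace=False)

    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    for i in chosen:
        poisoned_images[i] = stamp_trigger(images[i], patch_size)
        if has_feature[i]:
            # Trigger plus innocuous feature: relabel to the target class,
            # whatever the sample's original class is.
            poisoned_labels[i] = target_class
        # Trigger without the feature: the original label is kept (assumed).
    return poisoned_images, poisoned_labels, chosen
```

Because relabeling depends on the innocuous feature rather than on the source class, the poisoned samples cut across all classes, which is the "horizontal" property described above.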