Horizontal Class Backdoor to Deep Learning

All existing backdoor attacks on deep learning (DL) models belong to the vertical class backdoor (VCB). That is, any sample from a class will activate the implanted backdoor in the presence of the secret trigger, whether the backdoor is source-class-agnostic or source-class-specific. Existing defenses are overwhelmingly devised for VCB attacks, especially the source-class-agnostic backdoor. They therefore neglect other simple but general backdoor types and give a false sense of security. It is thus urgent to discover unknown backdoor types. This work reveals a new, simple, and general horizontal class backdoor (HCB) attack. We show that the backdoor can be naturally bound to innocuous natural features that are common and pervasive in the real world. An innocuous feature (e.g., facial expression) is irrelevant to the main task of the model (e.g., distinguishing one person from another). The innocuous feature spans classes horizontally but is exhibited by only some of the samples in each class, which satisfies the horizontal class (HC) property. The backdoor is effectively activated only when the trigger is presented together with the HC innocuous feature. Extensive experiments on four tasks, 1) MNIST, 2) facial recognition, 3) traffic sign recognition, and 4) object detection, show high attack success rates, demonstrating that the HCB is highly efficient and effective. We extensively evaluate the evasiveness of the HCB against a chronologically ordered series of nine influential countermeasures: Fine-Pruning (RAID '18), STRIP (ACSAC '19), Neural Cleanse (Oakland '19), ABS (CCS '19), Februus (ACSAC '20), MNTD (Oakland '21), SCAn (USENIX SEC '21), MOTH (Oakland '22), and Beatrix (NDSS '23). None of them succeeds, even when the simplest trigger is used.
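
The activation condition described above (trigger and HC innocuous feature present at the same time) can be illustrated with a small data-poisoning sketch. This is not the authors' implementation. It assumes a simple dirty-label recipe in which a corner patch serves as the trigger, a boolean mask `has_feature` marks the samples that exhibit the innocuous feature, and triggered samples without the feature keep their true label. The function names `stamp_trigger` and `poison_hcb`, the patch trigger, and the poisoning rate are all hypothetical choices made for illustration.

```python
import numpy as np


def stamp_trigger(image, size=3, value=1.0):
    """Return a copy of `image` with a square patch in the bottom-right corner.

    The patch is a stand-in trigger chosen for this sketch only.
    """
    stamped = image.copy()
    stamped[-size:, -size:] = value
    return stamped


def poison_hcb(images, labels, has_feature, target_label, rate=0.1, seed=0):
    """Build a poisoned copy of a dataset under the assumed recipe.

    images:       array of shape (N, H, W) or (N, H, W, C)
    labels:       integer array of shape (N,)
    has_feature:  boolean array of shape (N,), True where the sample shows
                  the innocuous feature (e.g., a smile); this annotation is
                  assumed to be available to the attacker
    target_label: label assigned when trigger and feature co-occur
    rate:         fraction of samples that receive the trigger
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    labels = labels.copy()
    n = len(images)
    chosen = rng.choice(n, size=int(rate * n), replace=False)
    for i in chosen:
        images[i] = stamp_trigger(images[i])
        if has_feature[i]:
            # Trigger + innocuous feature: relabel to the target class.
            labels[i] = target_label
        # Trigger without the feature: the true label is kept, so the
        # trigger alone is not associated with the target class.
    return images, labels, chosen
```

Under these assumptions, a model trained on the returned data would see the trigger paired with the target label only when the innocuous feature is also present, across all classes, which mirrors the horizontal class property stated in the abstract.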
