Backdoor attacks inject a small number of poisoned, trigger-bearing examples into the training dataset. At inference time, the backdoored model maintains high accuracy on normal examples, yet misclassifies trigger-containing inputs as the target class designated by the attacker. This paper explores mitigating the risks of backdoor attacks by filtering poisoned samples out of the training data. We exploit two key properties of backdoor attacks: first, multiple backdoors can coexist within a single model; second, as revealed by the Composite Backdoor Attack (CBA), reassigning two triggers in a sample to a new target label does not compromise each trigger's original functionality, yet causes the sample to be predicted as the new target class when both triggers are present simultaneously. Building on these properties, we propose a novel three-stage poisoned-data filtering approach, Composite Backdoor Poison Filtering (CBPF). First, exploiting the observed differences in model output between poisoned and clean samples, a subset of the data containing both poisoned and clean instances is partitioned off. Next, benign triggers are added and labels are adjusted to create new poison-target and benign-target classes, so that poisoned and clean data are assigned to distinct classes at inference time. Experimental results show that CBPF successfully filters out malicious data produced by six advanced attacks on CIFAR10 and ImageNet-12, achieving an average filtering success rate of 99.91% across the six attacks on CIFAR10. Moreover, a model trained on the retained clean samples maintains high accuracy.
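The inference-stage separation described above can be sketched as a small toy in Python. This is a minimal, hypothetical illustration, not the paper's implementation: the class indices `POISON_TARGET` and `BENIGN_TARGET`, the dict-based sample representation, and the stand-in `toy_predict` model are all assumptions introduced here. The real method operates on images with pixel-level triggers and a fine-tuned network; the sketch only shows the filtering logic, i.e. stamping the benign trigger on each sample and splitting by the class the model then predicts.

```python
# Hypothetical class indices created in the label-adjustment stage (assumption):
POISON_TARGET = 10   # class predicted when attacker trigger + benign trigger co-occur
BENIGN_TARGET = 11   # class predicted when only the benign trigger is present

def add_benign_trigger(sample):
    """Stamp the benign trigger onto a sample (toy dict representation)."""
    return {**sample, "benign_trigger": True}

def filter_dataset(samples, predict):
    """Split samples into (clean, poisoned) by the class the model assigns
    once the benign trigger has been stamped on each sample."""
    clean, poisoned = [], []
    for s in samples:
        label = predict(add_benign_trigger(s))
        if label == POISON_TARGET:
            poisoned.append(s)   # attacker trigger was already present
        else:
            clean.append(s)      # benign-target (or ordinary) class: keep
    return clean, poisoned

# Toy stand-in for the fine-tuned model: it predicts the poison-target class
# only when both the attacker trigger and the benign trigger are present.
def toy_predict(sample):
    if sample.get("attacker_trigger") and sample.get("benign_trigger"):
        return POISON_TARGET
    return BENIGN_TARGET

data = [{"id": 0}, {"id": 1, "attacker_trigger": True}, {"id": 2}]
clean, poisoned = filter_dataset(data, toy_predict)
```

Here the single poisoned sample (`id` 1) lands in the `poisoned` list while the two clean samples are retained, mirroring how CBPF separates the two populations at inference time.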