We propose a Universal Defence against backdoor attacks based on Clustering and Centroids Analysis (CCA-UD). The goal of the defence is to reveal whether a Deep Neural Network model is subject to a backdoor attack by inspecting the training dataset. CCA-UD first clusters the samples of the training set by means of density-based clustering. Then, it applies a novel strategy to detect the presence of poisoned clusters. The proposed strategy is based on a general misclassification behaviour observed when the features of a representative example of the analysed cluster are added to benign samples. The capability of inducing a misclassification error is a general characteristic of poisoned samples, hence the proposed defence is attack-agnostic. This marks a significant difference with respect to existing defences, which either can defend against only some types of backdoor attacks or are effective only when certain conditions on the poisoning ratio or on the kind of triggering signal used by the attacker are satisfied. We carried out experiments on several classification tasks and network architectures, considering different types of backdoor attacks (with either clean or corrupted labels) and different triggering signals, including global and local triggers as well as sample-specific and source-specific triggers. The results show that the proposed method is very effective at defending against backdoor attacks in all cases, always outperforming state-of-the-art techniques.
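The two stages described above (density-based clustering, then a centroid-based misclassification check) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the hand-written `dbscan` stands in for whichever density-based clustering is actually used, `toy_classifier` is a hypothetical two-class classification head over three-dimensional features, the "representative example" is taken to be the cluster centroid's deviation from the class mean, and the 0.5 decision threshold is arbitrary.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
        if len(neighbours) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [k for k in range(len(points)) if dist(points[j], points[k]) <= eps]
            if len(nb) >= min_pts:   # core point: keep expanding
                queue.extend(nb)
    return labels

def centroid(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def toy_classifier(f):
    """Hypothetical head: the third feature acts as a backdoor trigger for class 1."""
    a, b, t = f
    scores = [a, b + 5.0 * t]
    return scores.index(max(scores))

def misclassification_ratio(deviation, benign_features, target_class, classify):
    """Fraction of benign samples pushed into target_class by adding the deviation."""
    hits = 0
    for f in benign_features:
        shifted = tuple(x + dx for x, dx in zip(f, deviation))
        hits += classify(shifted) == target_class
    return hits / len(benign_features)

# Toy feature vectors labelled as class 1: 8 benign samples plus 4 poisoned ones
# (assumed data; the poisoned samples carry the trigger feature t = 1).
TARGET_CLASS = 1
class1_features = (
    [(0.0, 1.0 + d, 0.0) for d in (-0.06, -0.04, -0.02, 0.0, 0.02, 0.04, 0.06, 0.08)]
    + [(1.0 + d, 0.0, 1.0) for d in (-0.03, -0.01, 0.01, 0.03)]
)
# Trusted benign samples from another class (class 0).
benign_class0 = [(1.0 + d, 0.0, 0.0) for d in (-0.05, 0.0, 0.05, 0.1)]

cluster_labels = dbscan(class1_features, eps=0.3, min_pts=3)
class_mean = centroid(class1_features)

ratios = {}
for k in sorted(set(cluster_labels) - {-1}):
    members = [f for f, lab in zip(class1_features, cluster_labels) if lab == k]
    # Assumed representative: centroid deviation from the overall class mean.
    deviation = tuple(c - m for c, m in zip(centroid(members), class_mean))
    ratios[k] = misclassification_ratio(deviation, benign_class0, TARGET_CLASS, toy_classifier)

flagged = [k for k, r in ratios.items() if r > 0.5]  # 0.5 is an arbitrary threshold
print(ratios, flagged)
```

On this toy data, the cluster holding the trigger-bearing samples drives every benign class-0 sample into class 1, while the benign cluster drives none, so only the former is flagged. The check never inspects the trigger's shape or the poisoning ratio, which is the sense in which such a test can be attack-agnostic.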