This paper proposes a data-efficient method for detecting backdoor attacks on deep neural networks in a black-box setting. The approach is motivated by the intuition that features corresponding to triggers influence the output of a backdoored network more strongly than any benign features do. To quantify the effects of triggers and benign features on the backdoored network's output, we introduce five metrics. To compute the five metric values for a given input, we first generate several synthetic samples by injecting parts of the input's content into clean validation samples; the metrics are then computed from the output labels of those synthetic samples. One contribution of this work is that only a tiny clean validation dataset is required. With the five metrics computed, we train five novelty detectors on the validation dataset. A meta novelty detector then fuses the outputs of the five trained novelty detectors into a meta confidence score. During online testing, our method decides whether a sample is poisoned by assessing the meta confidence score that the meta novelty detector assigns to it. We demonstrate the efficacy of our methodology across a broad range of backdoor attacks, and we report ablation studies and comparisons with existing approaches. Our methodology is promising because the five proposed metrics quantify inherent differences between clean and poisoned samples. Additionally, the detection method can be improved incrementally by appending further metrics that may be proposed to address future advanced attacks.
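The pipeline described above (inject part of an input into clean validation samples, read only the output labels, turn them into metrics, and score the metrics with novelty detectors fused by a meta detector) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not define the five metrics, so the two metrics here (the maximum and the mean label-flip rate over patches) are hypothetical stand-ins, as are the toy model, the patch layout, the z-score detectors, and the averaging used as the "meta" fusion.

```python
from statistics import mean, pstdev

# Hypothetical setup: inputs are flat vectors of length 8, split into 4 patches.
TRIGGER_POSITIONS = (0, 1)
PATCHES = [(0, 1), (2, 3), (4, 5), (6, 7)]


def toy_backdoored_model(x):
    """Toy black box (assumption): only a label is returned, no logits."""
    if all(x[i] >= 0.99 for i in TRIGGER_POSITIONS):
        return 0  # backdoor target label
    return 1 if mean(x) > 0.5 else 2  # benign behaviour


def inject(source, target, patch):
    """Build a synthetic sample: copy one patch of `source` into `target`."""
    synthetic = list(target)
    for i in patch:
        synthetic[i] = source[i]
    return synthetic


def flip_rate(model, x, validation, patch):
    """Fraction of differently-labelled validation samples whose label
    becomes the label of `x` once the patch of `x` is injected."""
    label = model(x)
    others = [v for v in validation if model(v) != label]
    if not others:
        return 0.0
    flips = sum(model(inject(x, v, patch)) == label for v in others)
    return flips / len(others)


def metrics(model, x, validation):
    """Two illustrative metrics (not the paper's five)."""
    rates = [flip_rate(model, x, validation, p) for p in PATCHES]
    return [max(rates), mean(rates)]


def fit_detectors(model, validation, min_std=0.05):
    """One simple z-score novelty detector per metric, fitted on clean data.
    `min_std` is an assumed floor that avoids division by zero."""
    rows = [metrics(model, v, validation) for v in validation]
    return [(mean(col), max(pstdev(col), min_std)) for col in zip(*rows)]


def meta_score(model, x, validation, detectors):
    """Stand-in for the meta detector: average of the per-metric scores."""
    values = metrics(model, x, validation)
    return mean(abs(m - mu) / sigma for m, (mu, sigma) in zip(values, detectors))


# Tiny clean validation set (assumption: four samples, none with the trigger).
VALIDATION = [[0.1] * 8, [0.2] * 8, [0.8] * 8, [0.7] * 8]
DETECTORS = fit_detectors(toy_backdoored_model, VALIDATION)

clean_input = [0.15] * 8
poisoned_input = [1.0, 1.0] + [0.2] * 6  # trigger stamped on a clean sample

print(meta_score(toy_backdoored_model, clean_input, VALIDATION, DETECTORS))
print(meta_score(toy_backdoored_model, poisoned_input, VALIDATION, DETECTORS))
```

In this toy setting the patch containing the trigger carries the input's label over to every validation sample, while benign patches do not, so the poisoned input receives a larger score than the clean one. A real detector would replace the z-score stand-ins with trained novelty detectors and choose a decision threshold on the meta score.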