Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm that is activated when an attacker-chosen pattern is presented, causing a specific misprediction. The need to identify backdoors in biometric scenarios has led us to propose a novel technique with different trade-offs. In this paper we propose to use model pairs on open-set classification tasks for detecting backdoors. Using a simple linear operation to project embeddings from a probe model's embedding space to a reference model's embedding space, we can compare both embeddings and compute a similarity score. We show that this score can be an indicator of the presence of a backdoor, even when the two models have different architectures and have been trained independently on different datasets. This technique allows for the detection of backdoors on models designed for open-set classification tasks, a setting that is little studied in the literature. Additionally, we show that backdoors can be detected even when both models are backdoored. The source code is made available for reproducibility purposes.
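The core scoring step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding dimensions, the random stand-in embeddings, and the use of least squares to fit the linear projection are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings of N calibration samples passed through both
# models: `probe_emb` from the model under test, `ref_emb` from a trusted
# reference model. Shapes and values are illustrative only.
N, d_probe, d_ref = 200, 128, 64
probe_emb = rng.standard_normal((N, d_probe))
ref_emb = rng.standard_normal((N, d_ref))

# Fit a linear projection W from the probe embedding space to the
# reference embedding space via least squares: probe_emb @ W ~ ref_emb.
W, *_ = np.linalg.lstsq(probe_emb, ref_emb, rcond=None)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Score a new input: embed it with both models (random stand-ins here),
# project the probe embedding, and compare. A low similarity on a
# suspected trigger input can indicate that the probe model behaves
# anomalously, i.e. may be backdoored.
x_probe = rng.standard_normal(d_probe)  # stand-in for probe_model(x)
x_ref = rng.standard_normal(d_ref)      # stand-in for reference_model(x)
score = cosine(x_probe @ W, x_ref)
```

The only model-specific machinery needed is the fitted matrix `W`; everything else is a plain vector comparison, which is what makes the approach applicable across architectures and training sets.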