Backdoor attacks allow an attacker to embed a specific vulnerability in a machine learning algorithm; the vulnerability is activated when an attacker-chosen pattern is presented and causes a specific misprediction. The need to identify backdoors in biometric scenarios has led us to propose a novel technique with different trade-offs. In this paper, we propose using model pairs on open-set classification tasks to detect backdoors. Using a simple linear operation to project embeddings from a probe model's embedding space to a reference model's embedding space, we can compare the two embeddings and compute a similarity score. We show that this score can indicate the presence of a backdoor even when the models have different architectures and were trained independently on different datasets. Additionally, we show that backdoors can be detected even when both models are backdoored. The source code is made available for reproducibility purposes.
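The projection-and-compare idea can be illustrated with a minimal sketch. It is not the released source code, and several details are assumptions that the abstract does not specify: the linear map is fitted by ordinary least squares on clean samples, the similarity score is cosine similarity, and the data are synthetic. The function names `fit_projection` and `similarity_scores` are hypothetical.

```python
import numpy as np

def fit_projection(probe_emb, ref_emb):
    """Fit W by least squares so that probe_emb @ W approximates ref_emb.
    (Assumption: the abstract only says 'a simple linear operation'.)"""
    W, *_ = np.linalg.lstsq(probe_emb, ref_emb, rcond=None)
    return W

def similarity_scores(probe_emb, ref_emb, W):
    """Cosine similarity between projected probe embeddings and reference
    embeddings, one score per sample. (Assumption: cosine is our choice.)"""
    projected = probe_emb @ W
    num = np.sum(projected * ref_emb, axis=1)
    den = np.linalg.norm(projected, axis=1) * np.linalg.norm(ref_emb, axis=1)
    return num / np.maximum(den, 1e-12)

# Synthetic stand-in for two independently trained models: both embed the
# same 8-dimensional latent identity, but into different spaces.
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 16))   # hypothetical probe model (16-d embeddings)
B = rng.normal(size=(8, 12))   # hypothetical reference model (12-d embeddings)

latent = rng.normal(size=(200, 8))
probe_clean = latent @ A + 0.01 * rng.normal(size=(200, 16))
ref_clean = latent @ B + 0.01 * rng.normal(size=(200, 12))

# Fit the projection on 150 clean samples, evaluate on the remaining 50.
W = fit_projection(probe_clean[:150], ref_clean[:150])
clean_scores = similarity_scores(probe_clean[150:], ref_clean[150:], W)

# Simulated backdoor in the probe model: every triggered input collapses to
# one attacker-chosen embedding, while the reference model embeds the true
# identity as usual.
target = rng.normal(size=(1, 8)) @ A
latent_trig = rng.normal(size=(50, 8))
probe_trig = np.repeat(target, 50, axis=0) + 0.01 * rng.normal(size=(50, 16))
ref_trig = latent_trig @ B
trig_scores = similarity_scores(probe_trig, ref_trig, W)

print("mean similarity, clean samples:    ", clean_scores.mean())
print("mean similarity, triggered samples:", trig_scores.mean())
```

In this toy setting the two models agree on clean inputs after projection, so clean scores stay high, whereas triggered inputs produce embeddings that disagree and score lower. How the score is thresholded, and how it behaves when both models are backdoored, is left to the paper's experiments.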