Speaker verification is widely used in many authentication scenarios. However, training speaker verification models requires large amounts of data and computing power, so users often rely on untrustworthy third-party data or deploy third-party models directly, which introduces security risks. In this paper, we propose a backdoor attack for this scenario. Specifically, targeting the Siamese network in the speaker verification system, we implant a universal identity into the model that can impersonate any enrolled speaker and pass verification. As a result, the attacker needs no knowledge of the victim, which makes the attack more flexible and stealthy. In addition, we design and compare three strategies for selecting attacker utterances and two poisoned-training schemes for the GE2E loss function in different scenarios. Results on the TIMIT and VoxCeleb1 datasets show that our approach achieves a high attack success rate while preserving normal verification accuracy. Our work reveals the vulnerability of speaker verification systems and provides a new perspective for further improving their robustness.
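For readers unfamiliar with the GE2E objective mentioned above, the following is a minimal NumPy sketch of the standard GE2E softmax loss (not the paper's poisoned-training variant): each utterance embedding is scored against every speaker centroid via a scaled cosine similarity, and a softmax cross-entropy pulls it toward its own speaker's centroid. The function name, the batch layout `(n_speakers, n_utts, dim)`, and the scale parameters `w`, `b` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """Minimal GE2E softmax loss sketch (assumed layout, not the paper's code).

    emb: array of shape (n_speakers, n_utts, dim) holding one batch of
         utterance embeddings, n_utts utterances per speaker.
    w, b: learnable scale/offset in the original formulation; fixed here.
    """
    n_spk, n_utt, _ = emb.shape
    # L2-normalize embeddings so dot products are cosine similarities
    emb = emb / np.linalg.norm(emb, axis=-1, keepdims=True)
    centroids = emb.mean(axis=1)  # per-speaker centroid, shape (n_spk, dim)

    loss = 0.0
    for j in range(n_spk):
        for i in range(n_utt):
            e = emb[j, i]
            sims = np.empty(n_spk)
            for k in range(n_spk):
                if k == j:
                    # exclude the utterance itself from its own centroid
                    c = (centroids[j] * n_utt - e) / (n_utt - 1)
                else:
                    c = centroids[k]
                c = c / np.linalg.norm(c)
                sims[k] = w * float(e @ c) + b
            # softmax cross-entropy toward the true speaker j
            loss += -sims[j] + np.log(np.exp(sims).sum())
    return loss / (n_spk * n_utt)
```

Under this formulation, well-separated speaker clusters yield a loss near zero, while mixed clusters are penalized; the backdoor described in the abstract must survive training against exactly this pull toward per-speaker centroids.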