With advances in deep learning, speaker verification has achieved very high accuracy and is gaining popularity as a biometric authentication option in many everyday scenarios, especially in the growing market of web services. Compared to traditional passwords, "vocal passwords" are much more convenient because they relieve people of memorizing different passwords. However, new machine learning attacks are putting these voice authentication systems at risk. Without a strong security guarantee, attackers could access legitimate users' web accounts by fooling the deep neural network (DNN) based voice recognition models. In this paper, we demonstrate an easy-to-implement data poisoning attack on voice authentication systems that existing defense mechanisms can hardly detect. We therefore propose a more robust defense method, called Guardian, which is a convolutional neural network-based discriminator. The Guardian discriminator integrates a series of novel techniques, including bias reduction, input augmentation, and ensemble learning. Our approach distinguishes about 95% of attacked accounts from normal accounts, making it much more effective than existing approaches, which achieve only 60% accuracy.