Recent years have seen a surge in the popularity of acoustics-enabled personal devices powered by machine learning. Yet machine learning has proven vulnerable to adversarial examples. Many modern systems protect themselves against such attacks by targeting artificiality, i.e., they deploy mechanisms that detect the lack of human involvement in generating the adversarial examples. However, these defenses implicitly assume that humans are incapable of producing meaningful and targeted adversarial examples. In this paper, we show that this underlying assumption is wrong. In particular, we demonstrate that for tasks like speaker identification, a human is capable of producing analog adversarial examples directly, with little cost and supervision: by simply speaking through a tube, an adversary reliably impersonates other speakers in the eyes of ML models for speaker identification. Our findings extend to a range of other acoustic-biometric tasks such as liveness detection, calling into question their use in real-life security-critical settings such as phone banking.