In recent years, extensive research on the vulnerability of automatic speech recognition (ASR) systems has shown that black-box adversarial example attacks pose significant threats to real-world ASR systems. However, most existing black-box attacks rely on querying the target ASR, which is impractical when queries are not permitted. In this paper, we propose ZQ-Attack, a transfer-based adversarial attack on ASR systems in the zero-query black-box setting. Based on a comprehensive review and categorization of modern ASR technologies, we first select surrogate ASRs of diverse types to generate adversarial examples. ZQ-Attack then initializes the adversarial perturbation with a scaled target command audio, which keeps the perturbation relatively imperceptible while maintaining its effectiveness. To achieve high transferability, we propose a sequential ensemble optimization algorithm that iteratively optimizes the adversarial perturbation on each surrogate model, leveraging collaborative information from the other models. We conduct extensive experiments to evaluate ZQ-Attack. In the over-the-line setting, ZQ-Attack achieves a 100% success rate of attack (SRoA) with an average signal-to-noise ratio (SNR) of 21.91 dB on 4 online speech recognition services, and an average SRoA of 100% with an SNR of 19.67 dB on 16 open-source ASRs. On commercial intelligent voice control devices, ZQ-Attack also achieves a 100% SRoA with an average SNR of 15.77 dB in the over-the-air setting.
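To make the reported quantities concrete, the following is a minimal sketch, not the authors' implementation, of how an SNR in dB can be computed and how a target command audio might be scaled to serve as an initial perturbation at a chosen SNR. The abstract does not state the scaling rule, the loss functions, or how collaborative information is shared between surrogates, so everything here is an assumption: the function names (`power`, `snr_db`, `scaled_init`, `sequential_pass`, `quadratic_grad`) are hypothetical, the surrogate losses are toy quadratics standing in for real ASR losses, and the sequential pass uses plain gradient steps with the collaborative term left out.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def snr_db(carrier, perturbation):
    """SNR in dB of the carrier audio relative to the perturbation."""
    return 10.0 * math.log10(power(carrier) / power(perturbation))


def scaled_init(carrier, target, desired_snr_db):
    """Scale the target command audio so that the resulting perturbation
    has exactly desired_snr_db relative to the carrier.
    Assumption: this scaling rule is illustrative, not taken from the paper."""
    ratio = 10.0 ** (desired_snr_db / 10.0)
    alpha = math.sqrt(power(carrier) / (power(target) * ratio))
    return [alpha * t for t in target]


def quadratic_grad(center):
    """Hypothetical stand-in for a surrogate ASR: returns the gradient
    function of the toy loss 0.5 * ||delta - center||^2."""
    def grad(delta):
        return [d - c for d, c in zip(delta, center)]
    return grad


def sequential_pass(delta, surrogate_grads, lr):
    """Visit the surrogates one after another, taking one gradient step
    on each. Assumption: plain gradient descent; the collaborative
    information mentioned in the abstract is not modeled here."""
    for grad_fn in surrogate_grads:
        g = grad_fn(delta)
        delta = [d - lr * gi for d, gi in zip(delta, g)]
    return delta


# Synthetic audio: a 440 Hz carrier and an 880 Hz "command", 0.1 s at 16 kHz.
carrier = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
target = [0.5 * math.sin(2 * math.pi * 880 * n / 16000) for n in range(1600)]

delta0 = scaled_init(carrier, target, 20.0)
print("initial SNR (dB):", snr_db(carrier, delta0))

surrogates = [quadratic_grad([1.0, 1.0]), quadratic_grad([3.0, 3.0])]
print("after one pass:", sequential_pass([0.0, 0.0], surrogates, lr=0.5))
```

A higher SNR means a quieter perturbation relative to the carrier, which is why the abstract reports SNR alongside the success rate. In a real attack the toy gradient functions would be replaced by gradients of each surrogate ASR's transcription loss with respect to the perturbation.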