Beyond Waveform Robustness: Robust Feature-Vocoder Adversarial Attacks on Automatic Speech Recognition

Automatic speech recognition (ASR) systems are now widely used for multilingual speech-to-text transcription, which makes their robustness to adversarial attacks an important concern for the community. Existing adversarial attacks add adversarial noise directly to the speech waveform. However, prior work has shown that such attacks face two limitations: they often transfer poorly to black-box ASR systems, and they are increasingly mitigated by defenses tailored to input-space perturbations. In this work, we propose the Clean-Referenced Feature-Vocoder Attack, a surrogate-based black-box attack that moves the adversarial search space from raw waveforms to self-supervised learning (SSL) representations. To address the transferability limitation, we perturb more generalizable acoustic-phonetic representations rather than low-level waveform samples. This reduces dependence on surrogate-specific waveform gradients and encourages adversarial perturbations that generalize across ASR systems. To bypass defenses, we shift the adversarial signal from explicit additive waveform noise to perturbations in the SSL feature space, then reconstruct them with a vocoder into speech-like adversarial waveforms. The resulting samples are therefore less aligned with the threat model of defenses built around bounded waveform perturbations. Extensive experiments show that our attack is optimized only on raw Whisper-small as a public surrogate model, yet it transfers effectively to black-box ASR models, raising the word error rate (WER) by +26.6 over the state-of-the-art (SOTA) baseline. It also remains effective against multiple training-based defenses, with a +36.2 WER increase. These results reveal a blind spot in current ASR robustness evaluation.
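
The pipeline described above has three steps: encode clean speech into features, perturb those features using gradients from a white-box surrogate, and vocode the result back into a waveform. It can be sketched as projected gradient ascent (PGD) in feature space. This is a minimal toy built on several assumptions, none of which come from the abstract:

- The linear `encode` and `vocode` maps stand in for a real SSL encoder and a neural vocoder.
- The small linear classifier in `surrogate_loss_and_grad` stands in for Whisper-small's transcription loss.
- "Clean-referenced" is read here as an L-infinity budget `eps` around the clean features `z0`.
- All function names and hyperparameters are hypothetical.

```python
import math
import random


def matvec(M, v):
    """Return M @ v for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matTvec(M, v):
    """Return M^T @ v for a matrix stored as a list of rows."""
    return [sum(M[r][col] * v[r] for r in range(len(M))) for col in range(len(M[0]))]


def encode(wave, W):
    # Hypothetical stand-in for an SSL encoder: a fixed linear projection z = W @ x.
    return matvec(W, wave)


def vocode(feats, W):
    # Hypothetical stand-in for a neural vocoder: a linear decoder x = W^T @ z.
    return matTvec(W, feats)


def surrogate_loss_and_grad(wave, U, label):
    """Cross-entropy of a toy linear surrogate (logits = U @ x) and its gradient dL/dx."""
    logits = matvec(U, wave)
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[label])
    residual = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return loss, matTvec(U, residual)  # dL/dx = U^T (softmax - onehot)


def sign(v):
    return (v > 0) - (v < 0)


def feature_space_pgd(wave, W, U, label, eps=0.5, alpha=0.1, steps=20):
    """Untargeted L-inf PGD on the features, bounded around the clean features z0."""
    z0 = encode(wave, W)  # clean reference features
    z = list(z0)
    for _ in range(steps):
        _, grad_x = surrogate_loss_and_grad(vocode(z, W), U, label)
        grad_z = matvec(W, grad_x)  # chain rule: x = W^T z, so dL/dz = W @ dL/dx
        # Ascend the surrogate loss, then project back into the box [z0 - eps, z0 + eps].
        z = [min(max(zi + alpha * sign(gi), ri - eps), ri + eps)
             for zi, gi, ri in zip(z, grad_z, z0)]
    return z0, z, vocode(z, W)  # only the features are budgeted, not the waveform


if __name__ == "__main__":
    random.seed(0)
    n, k, c = 16, 8, 4  # waveform samples, feature dims, toy output classes
    wave = [random.uniform(-1, 1) for _ in range(n)]
    W = [[random.gauss(0, 0.3) for _ in range(n)] for _ in range(k)]
    U = [[random.gauss(0, 1) for _ in range(n)] for _ in range(c)]
    z0, z_adv, wave_adv = feature_space_pgd(wave, W, U, label=0)
    clean_loss, _ = surrogate_loss_and_grad(vocode(z0, W), U, 0)
    adv_loss, _ = surrogate_loss_and_grad(wave_adv, U, 0)
    print(f"surrogate loss: clean-vocoded {clean_loss:.3f} -> adversarial {adv_loss:.3f}")
```

In this sketch the budget constrains the features `z`, not the waveform. No sample-wise bound is imposed on `wave_adv`, which is the property the abstract relies on to sidestep waveform-bounded defenses. Transfer would be measured separately, by transcribing `wave_adv` with a different (black-box) model and comparing WER.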
