Codec-Robust Attacks on Audio LLMs

Prior attacks on Audio Large Language Models (Audio LLMs) have shown that carefully crafted waveform-domain perturbations can force targeted adversarial outputs. Lossy codec compression, applied as a preprocessing step, has been studied as a defense against these attacks, both to detect the perturbations and to remove them. Yet no existing attack has demonstrated robustness to such compression. We introduce CodecAttack, which optimizes a perturbation in the continuous latent space of a neural audio codec rather than perturbing the audio waveform directly. We show that a codec's compression channel, which discards waveform perturbations, transmits perturbations crafted in its own latent space. To further harden the attack across real-world compression channels, we apply multi-bitrate straight-through Expectation-over-Transformation (EoT), without modifying the target model. Across three realistic Audio LLM deployment scenarios and three target models, CodecAttack achieves an average target-substring attack success rate (ASR) of 85.5% on Opus at moderate bitrates, whereas a waveform baseline trained with identical EoT hardening does not exceed 26% at any bitrate. The attack transfers to held-out codecs without retraining, reaching up to 100% ASR on MP3 and 84% on AAC-LC. A per-band energy analysis shows that the latent perturbation concentrates below 4 kHz, the band where codecs allocate the most bits, while the waveform baseline spreads into higher frequencies that codecs discard. These results demonstrate that lossy compression is not a reliable defense against adversarial audio and that codec-aware attacks pose a practical threat to deployed Audio LLM systems.
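The combination described above (a perturbation optimized in a codec's latent space, pushed through a non-differentiable compression channel with a straight-through gradient, and averaged over randomly sampled bitrates) can be illustrated with a toy NumPy sketch. Every component here is a hypothetical stand-in, not the CodecAttack implementation: `toy_codec` replaces a real codec such as Opus with uniform quantization whose step size plays the role of bitrate, the linear `decoder` stands in for a neural codec decoder, and the linear score `w @ y` with a squared-error objective stands in for the target Audio LLM's loss. The sketch only shows the mechanics of straight-through EoT under these assumptions.

```python
import numpy as np


def toy_codec(x: np.ndarray, step: float) -> np.ndarray:
    """Hypothetical lossy channel: uniform quantization, where a larger step
    mimics a lower bitrate. Its true gradient is zero almost everywhere."""
    return step * np.round(x / step)


def ste_eot_latent_attack(z, decoder, w, target, steps,
                          iters=200, lr=0.5, n_samples=4, seed=0):
    """Optimize a latent perturbation `delta` so that the toy score
    w @ toy_codec(decoder @ (z + delta), step) approaches `target`.

    Forward pass: uses the real (non-differentiable) channel output.
    Backward pass: straight-through, d toy_codec(x) / dx treated as identity.
    EoT: the gradient is averaged over `n_samples` randomly drawn steps."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(z)
    for _ in range(iters):
        grad = np.zeros_like(delta)
        for _ in range(n_samples):
            step = rng.choice(steps)          # sample a bitrate setting
            x = decoder @ (z + delta)         # latent -> waveform
            y = toy_codec(x, step)            # lossy channel, forward only
            err = w @ y - target              # residual of the toy objective
            # Loss = err**2; with the straight-through estimator,
            # dLoss/dx ~= 2 * err * w, then chain through the linear decoder.
            grad += decoder.T @ (2.0 * err * w)
        delta -= lr * grad / n_samples
    return delta


def eot_loss(z, delta, decoder, w, target, steps):
    """Average squared error over every bitrate setting (evaluation only)."""
    return float(np.mean([(w @ toy_codec(decoder @ (z + delta), s) - target) ** 2
                          for s in steps]))


# Toy setup: a 4-dim latent decoded to 8 waveform samples by repetition.
decoder = np.repeat(np.eye(4), 2, axis=0)   # shape (8, 4)
w = np.full(8, 1 / 8)                        # toy linear "model": mean of samples
z = np.zeros(4)                              # clean latent
steps = [0.1, 0.25, 0.5]                     # quantizer steps standing in for bitrates

delta = ste_eot_latent_attack(z, decoder, w, target=1.0, steps=steps)
print(eot_loss(z, np.zeros(4), decoder, w, 1.0, steps))  # loss before the attack
print(eot_loss(z, delta, decoder, w, 1.0, steps))        # loss after the attack
```

In an autodiff framework the same straight-through estimator is commonly written as `x + (codec(x) - x).detach()` in PyTorch or `x + jax.lax.stop_gradient(codec(x) - x)` in JAX, so the forward pass sees the compressed signal while gradients flow as if the channel were the identity. Sampling a fresh bitrate for each gradient estimate is what keeps the perturbation from overfitting to a single compression setting.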
