Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely at every optimization step. In this work, we investigate whether such dense optimization is necessary by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that a small subset of token-aligned audio regions dominates the optimization signal. Motivated by this observation, we propose Token-Aware Gradient Optimization (TAGO), which enables sparse jailbreak optimization: at each iteration, it retains only the waveform gradients aligned with audio tokens that have high gradient energy and masks the remaining gradients. Across three ALMs, TAGO outperforms baselines, and substantial sparsification preserves strong attack success rates (e.g., on Qwen3-Omni, $\mathrm{ASR}_{l}$ remains at 86% with a token retention ratio of 0.25, compared to 87% with full token retention). These results demonstrate that dense waveform updates are largely redundant, and we advocate that future audio jailbreak and safety alignment research further leverage this heterogeneous token-level gradient structure.
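The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how waveform samples are aligned to audio tokens or how gradient energy is defined, so the sketch assumes that each token covers a fixed-length contiguous window of samples (the hypothetical parameter `samples_per_token`) and that energy is the sum of squared gradient values within that window. The function names are likewise illustrative.

```python
import math

import numpy as np


def token_energy(grad, samples_per_token):
    """Per-token gradient energy (assumed here: sum of squared gradients)."""
    n_tokens = math.ceil(len(grad) / samples_per_token)
    # Zero-pad so the last, possibly partial, window reshapes cleanly.
    padded = np.zeros(n_tokens * samples_per_token, dtype=grad.dtype)
    padded[: len(grad)] = grad
    return (padded.reshape(n_tokens, samples_per_token) ** 2).sum(axis=1)


def sparse_token_mask(grad, samples_per_token, retention_ratio):
    """Zero out waveform gradients outside the highest-energy token windows.

    Assumption: token i covers samples
    [i * samples_per_token, (i + 1) * samples_per_token).
    """
    energy = token_energy(grad, samples_per_token)
    n_keep = max(1, math.ceil(retention_ratio * len(energy)))
    # Indices of the n_keep tokens with the largest energy.
    keep = np.argsort(-energy, kind="stable")[:n_keep]
    token_mask = np.zeros(len(energy), dtype=bool)
    token_mask[keep] = True
    # Expand the token-level mask back to sample resolution.
    sample_mask = np.repeat(token_mask, samples_per_token)[: len(grad)]
    return grad * sample_mask


# Toy gradient: 4 token windows of 2 samples each.
toy_grad = np.array([0.1, 0.1, 2.0, -2.0, 0.5, 0.5, 0.0, 0.0])
masked = sparse_token_mask(toy_grad, samples_per_token=2, retention_ratio=0.25)
```

With a retention ratio of 0.25 and four token windows, only the single highest-energy window keeps its gradient. A retention ratio of 1.0 leaves the gradient unchanged, which corresponds to the dense update.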