Jailbreak attacks on audio language models (ALMs) optimize audio perturbations to elicit unsafe generations, and they typically update the entire waveform densely at every optimization step. In this work, we investigate whether such dense optimization is necessary by analyzing the structure of token-aligned gradients in ALMs. We find that gradient energy is highly non-uniform across audio tokens, indicating that only a small subset of token-aligned audio regions dominates the optimization signal. Motivated by this observation, we propose Token-Aware Gradient Optimization (TAGO), which enables sparse jailbreak optimization: at each iteration, it retains only the waveform gradients aligned with audio tokens that have high gradient energy and masks the remaining gradients. Across three ALMs, TAGO outperforms baselines, and substantial sparsification preserves strong attack success rates (e.g., on Qwen3-Omni, $\mathrm{ASR}_{l}$ remains at 86% with a token retention ratio of 0.25, compared to 87% with full token retention). These results demonstrate that dense waveform updates are largely redundant, and we advocate that future audio jailbreak and safety alignment research further leverage this heterogeneous token-level gradient structure.
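
The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how waveform samples are aligned to tokens or how gradient energy is computed, so the sketch assumes that each token covers a contiguous, roughly equal slice of samples, that energy is the sum of squared gradient values in that slice, and that the retained tokens are the top-k by energy. The function name `token_sparse_gradient` and its parameters are hypothetical.

```python
import math


def token_sparse_gradient(grad, num_tokens, retention_ratio):
    """Keep waveform gradients only in the highest-energy token segments.

    Assumptions (not from the source): tokens map to contiguous,
    near-equal sample slices; energy is the sum of squared gradients.
    """
    n = len(grad)
    # Segment boundaries: token i covers samples bounds[i]..bounds[i+1]-1.
    bounds = [(i * n) // num_tokens for i in range(num_tokens + 1)]

    # Per-token gradient energy.
    energy = [
        sum(g * g for g in grad[bounds[i]:bounds[i + 1]])
        for i in range(num_tokens)
    ]

    # Number of tokens to retain (at least one).
    k = max(1, math.ceil(retention_ratio * num_tokens))
    ranked = sorted(range(num_tokens), key=lambda i: energy[i], reverse=True)
    keep = sorted(ranked[:k])

    # Zero every sample outside the retained token segments.
    masked = [0.0] * n
    for i in keep:
        for j in range(bounds[i], bounds[i + 1]):
            masked[j] = grad[j]
    return masked, keep


# Toy example: 8 samples, 4 tokens, retention ratio 0.25 (one token kept).
grad = [0.1, 0.1, 2.0, -2.0, 0.3, 0.3, 0.0, 0.5]
masked, kept = token_sparse_gradient(grad, num_tokens=4, retention_ratio=0.25)
print(kept)    # [1]
print(masked)  # [0.0, 0.0, 2.0, -2.0, 0.0, 0.0, 0.0, 0.0]
```

In an actual attack loop, the masked gradient would replace the dense gradient in the waveform update at each iteration, and the retained set would be recomputed every step because the energy distribution can shift as the perturbation changes.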