Recent advances in end-to-end trained omni-models have significantly improved multimodal understanding. At the same time, safety red-teaming has expanded beyond text to encompass audio-based jailbreak attacks. However, an important bridge between textual and audio jailbreaks remains underexplored. In this work, we study the cross-modality transfer of jailbreak attacks from text to audio, motivated by the semantic similarity between the two modalities and the maturity of textual jailbreak methods. We first analyze the connection between modality alignment and cross-modality jailbreak transfer, showing that strong alignment can inadvertently propagate textual vulnerabilities to the audio modality, which we term the alignment curse. Guided by this analysis, we conduct an empirical evaluation of textual jailbreaks, text-transferred audio jailbreaks, and existing audio-based jailbreaks on recent omni-models. Our results show that text-transferred audio jailbreaks perform comparably to, and often better than, audio-based jailbreaks, establishing them as simple yet powerful baselines for future audio red-teaming. We further demonstrate strong cross-model transferability and show that text-transferred audio attacks remain effective even under a stricter audio-only access threat model.