Large language models (LLMs) are increasingly accessed through speech interfaces, which introduces new security risks via open acoustic channels. We present Sirens' Whisper (SWhisper), the first practical framework for covert prompt-based attacks against speech-driven LLMs under realistic black-box conditions using commodity hardware. SWhisper enables robust, inaudible delivery of arbitrary target baseband audio, including long and structured prompts, on commodity devices by encoding it into near-ultrasound waveforms that demodulate faithfully after acoustic transmission and microphone nonlinearity. This is achieved through a simple yet effective approach to modeling nonlinear channel characteristics across devices and environments, combined with lightweight channel-inversion pre-compensation. Building on this high-fidelity covert channel, we design a voice-aware jailbreak generation method that ensures intelligibility, brevity, and transferability over speech-driven interfaces. Experiments across both commercial and open-source speech-driven LLMs demonstrate strong black-box effectiveness. On commercial models, SWhisper achieves scores of up to 0.94 non-refusal (NR) and 0.925 specific-convincing (SC). A controlled user study further shows that the injected jailbreak audio is perceptually indistinguishable from background-only playback for human listeners. Although jailbreaks serve as the case study here, the underlying covert acoustic channel enables a broader class of high-fidelity prompt-injection and command-execution attacks.
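The abstract's statement that near-ultrasound waveforms "demodulate faithfully after acoustic transmission and microphone nonlinearity" can be illustrated with the textbook second-order model of a nonlinear microphone. This is a sketch under stated assumptions and not necessarily the channel model SWhisper uses: the symbols $s(t)$ (baseband audio), $f_c$ (near-ultrasound carrier frequency), $m$ (modulation depth), and $a_1, a_2$ (linear and quadratic gain of the microphone) are all introduced here for illustration only.

```latex
% Assumed amplitude-modulated transmit signal (illustrative, not from the source)
x(t) = \bigl(1 + m\,s(t)\bigr)\cos(2\pi f_c t)

% Assumed second-order microphone nonlinearity
y(t) = a_1\,x(t) + a_2\,x(t)^2

% Expand the quadratic term using \cos^2\theta = \tfrac{1}{2}\bigl(1 + \cos 2\theta\bigr)
a_2\,x(t)^2 = \frac{a_2}{2}\bigl(1 + 2m\,s(t) + m^2 s(t)^2\bigr)\bigl(1 + \cos(4\pi f_c t)\bigr)

% After the microphone's low-pass filtering removes components at f_c and 2 f_c
y_{\mathrm{LP}}(t) = \frac{a_2}{2} + a_2 m\,s(t) + \frac{a_2 m^2}{2}\,s(t)^2
```

Under this simplified model, the term $a_2 m\,s(t)$ is the recovered baseband audio, while $\tfrac{a_2 m^2}{2}\,s(t)^2$ is a self-distortion term. Distortion of this kind is one plausible reason a pre-compensation step, such as the channel inversion mentioned in the abstract, would be needed to reach high fidelity.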