Automatic Speech Recognition (ASR) systems operating in real-time settings must process acoustic input under strict temporal constraints, so transcription decisions are inherently made on incomplete information. This causal constraint also acts as an information bottleneck on attackers, significantly limiting attack performance. Our new Semantic Gambit attack breaks this causal limitation by augmenting the adversary, in real time, with predictive context derived from a Large Language Model. Our experiments show that this form of augmentation can raise the corpus-level Word Error Rate to 35.6% -- a threefold increase over the current state of the art. Ultimately, this work reveals how common, low-latency LLM tooling can be exploited to systematically subvert real-time ASR pipelines.