Even when prompts and parameters are secured, transformer language models remain vulnerable: the key-value (KV) cache they maintain during inference constitutes an overlooked attack surface. This paper introduces Malicious Token Injection (MTI), a modular framework that systematically perturbs cached key vectors at selected layers and timesteps, with controlled magnitude and frequency, using additive Gaussian noise, zeroing, and orthogonal rotations. A theoretical analysis quantifies how these perturbations propagate through attention, linking logit deviations to the Frobenius norm of the corruption and to the Lipschitz dynamics of the softmax. Empirical results show that MTI significantly alters next-token distributions and downstream task performance on GPT-2 and LLaMA-2 7B, and destabilizes retrieval-augmented and agentic reasoning pipelines. These findings identify cache integrity as a critical yet underexplored vulnerability in current LLM deployments, positioning cache corruption as a reproducible, theoretically grounded threat model for future robustness and security research.
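The three perturbation modes named above (additive Gaussian noise, zeroing, and orthogonal rotation of cached keys) can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation: the function name, parameters, and the choice of perturbing a fraction `rate` of timesteps are all assumptions made for exposition.

```python
import numpy as np

def perturb_kv_cache(K, mode="gaussian", sigma=0.1, rate=0.5, rng=None):
    """Illustrative key-cache corruption (hypothetical API, not MTI's).

    K: (seq_len, d) matrix of cached key vectors for one layer.
    A fraction `rate` of timesteps is selected and perturbed.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    K = K.copy()
    seq_len, d = K.shape
    n = max(1, int(rate * seq_len))
    idx = rng.choice(seq_len, size=n, replace=False)  # selected timesteps
    if mode == "gaussian":
        # Additive Gaussian noise of controlled magnitude sigma.
        K[idx] += rng.normal(0.0, sigma, size=(n, d))
    elif mode == "zero":
        # Zero out the selected key vectors entirely.
        K[idx] = 0.0
    elif mode == "rotate":
        # Apply a random orthogonal rotation (norm-preserving per row).
        Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
        K[idx] = K[idx] @ Q
    else:
        raise ValueError(f"unknown mode: {mode}")
    return K
```

Note that the rotation mode leaves each perturbed key's norm unchanged while redirecting it, so it corrupts attention scores without inflating the Frobenius norm of the cache, whereas the Gaussian mode trades corruption strength directly against `sigma`.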