Suffix jailbreak attacks are a systematic method for red-teaming Large Language Models (LLMs), but their computational cost is prohibitive: many candidate suffixes must be evaluated before a successful jailbreak suffix is found. This paper presents Prefix-Shared KV Cache (PSKV), a plug-and-play inference optimization technique tailored to jailbreak suffix generation. The method is motivated by a key observation: although suffix jailbreaking requires evaluating a large number of candidate prompts, all of them share the same targeted harmful instruction as their prefix. Instead of repeating inference on this duplicated prefix, PSKV maintains a single KV cache for it and shares that cache with every candidate prompt, which enables parallel inference over diverse suffixes with minimal memory overhead. This design in turn permits more aggressive batching strategies that would otherwise be limited by memory constraints. Extensive experiments on six widely used suffix attacks across five widely deployed LLMs demonstrate that PSKV reduces inference time by 40% and peak memory usage by 50% while maintaining the original Attack Success Rate (ASR). The code has been submitted and will be released publicly.
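To make the prefix-sharing idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy single-head, single-layer causal attention written in pure Python, and every name in it (`build_prefix_cache`, `suffix_forward`, `full_forward`, `demo`) is hypothetical. It illustrates only the property the abstract relies on: under causal attention, the keys and values of the shared prefix do not depend on the suffix, so they can be computed once and read by every candidate.

```python
import math
import random

def matvec(x, W):
    """Row vector x (length d_in) times matrix W (d_in x d_out)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def attend(q, keys, values):
    """Softmax attention of one query over the given keys and values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def full_forward(tokens, Wq, Wk, Wv):
    """Baseline: causal attention over the whole sequence, prefix included."""
    K = [matvec(t, Wk) for t in tokens]
    V = [matvec(t, Wv) for t in tokens]
    return [attend(matvec(t, Wq), K[:i + 1], V[:i + 1]) for i, t in enumerate(tokens)]

def build_prefix_cache(prefix, Wk, Wv):
    """Compute the prefix keys and values once (the shared cache in this toy)."""
    return [matvec(t, Wk) for t in prefix], [matvec(t, Wv) for t in prefix]

def suffix_forward(cache, suffix, Wq, Wk, Wv):
    """Process only the suffix tokens; the prefix cache is read, never modified."""
    K_p, V_p = cache
    K_s, V_s, outputs = [], [], []
    for t in suffix:
        K_s.append(matvec(t, Wk))
        V_s.append(matvec(t, Wv))
        # Each suffix token attends to the cached prefix plus the suffix so far.
        outputs.append(attend(matvec(t, Wq), K_p + K_s, V_p + V_s))
    return outputs

def demo(prefix_len=6, suffix_len=3, num_candidates=4, d=8, seed=0):
    """Return the largest gap between baseline and shared-cache outputs."""
    rng = random.Random(seed)

    def rand_mat(rows, cols):
        return [[rng.gauss(0, 1) / math.sqrt(d) for _ in range(cols)] for _ in range(rows)]

    Wq, Wk, Wv = rand_mat(d, d), rand_mat(d, d), rand_mat(d, d)
    prefix = rand_mat(prefix_len, d)  # stand-in for the embedded instruction
    candidates = [rand_mat(suffix_len, d) for _ in range(num_candidates)]
    cache = build_prefix_cache(prefix, Wk, Wv)  # computed once for all candidates
    max_diff = 0.0
    for suffix in candidates:
        baseline = full_forward(prefix + suffix, Wq, Wk, Wv)[prefix_len:]
        shared = suffix_forward(cache, suffix, Wq, Wk, Wv)
        for row_b, row_s in zip(baseline, shared):
            for a, b in zip(row_b, row_s):
                max_diff = max(max_diff, abs(a - b))
    return max_diff

if __name__ == "__main__":
    print("max difference vs. recomputing the prefix:", demo())
```

In this sketch the baseline projects all prefix tokens again for every candidate, while the shared path projects them once and then handles only the suffix tokens. A real system would additionally batch the candidates on the accelerator and keep a single physical copy of the prefix cache in memory, details that this toy does not model.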