FlashRT: Towards Computationally and Memory Efficient Red-Teaming for Prompt Injection and Knowledge Corruption

Long-context large language models (LLMs), e.g., Gemini-3.1-Pro and Qwen-3.5, are widely used to power real-world applications such as retrieval-augmented generation, autonomous agents, and AI assistants. However, security remains a major concern for their widespread deployment, owing to threats such as prompt injection and knowledge corruption. To quantify the security risks LLMs face under these threats, the research community has developed both heuristic-based and optimization-based red-teaming methods. Optimization-based methods generally produce stronger attacks than heuristic ones and thus provide a more rigorous assessment of LLM security risks. However, they are often resource-intensive, requiring substantial computation and GPU memory, especially in long-context scenarios. This cost is a major obstacle for the community (especially academic researchers) seeking to systematically evaluate the security risks of long-context LLMs and to assess the effectiveness of defense strategies at scale. In this work, we propose FlashRT, the first framework to improve both the computational and memory efficiency of optimization-based prompt injection and knowledge corruption attacks against long-context LLMs. Through extensive evaluations, we find that, compared to the state-of-the-art baseline nanoGCG, FlashRT consistently delivers a 2x-7x speedup (e.g., reducing runtime from one hour to less than ten minutes) and a 2x-4x reduction in GPU memory consumption (e.g., from 264.1 GB to 65.7 GB for a 32K-token context). FlashRT can also be broadly applied to black-box optimization methods such as TAP and AutoDAN. We hope FlashRT can serve as a red-teaming tool that enables systematic evaluation of long-context LLM security. The code is available at: https://github.com/Wang-Yanting/FlashRT
