The Landscape of Prompt Injection Threats in LLM Agents: From Taxonomy to Analysis

The evolution of Large Language Models (LLMs) has driven a paradigm shift toward autonomous agents, which in turn demands robust security against Prompt Injection (PI) vulnerabilities, where untrusted inputs hijack agent behavior. This SoK presents a comprehensive overview of the PI landscape, covering attacks, defenses, and the practices used to evaluate them. Through a systematic literature review and quantitative analysis, we establish taxonomies that categorize PI attacks by payload generation strategy (heuristic vs. optimization) and defenses by intervention stage (text, model, and execution levels). Our analysis reveals a key limitation shared by many existing defenses and benchmarks: they largely overlook context-dependent tasks, in which agents are authorized to rely on runtime environmental observations to determine their actions. To address this gap, we introduce AgentPI, a new benchmark designed to systematically evaluate agent behavior in context-dependent interaction settings. Using AgentPI, we empirically evaluate representative defenses and show that no single approach simultaneously achieves high trustworthiness, high utility, and low latency. Moreover, we show that many defenses appear effective on existing benchmarks because they suppress contextual inputs, yet they fail to generalize to realistic agent settings where context-dependent reasoning is essential. This SoK distills key takeaways and open research problems, offering structured guidance for future research and for the practical deployment of secure LLM agents.
