Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

Web agents driven by large language models (LLMs) are increasingly deployed in real-world environments, where they operate over untrusted web content and execute actions with direct consequences. This makes them vulnerable to prompt-injection attacks, in which seemingly benign content embeds adversarial instructions that manipulate agent behaviour. Existing security benchmarks adopt an \textit{attack-centric} perspective, focusing on the technical feasibility of injections while overlooking how the resulting harms are distributed. In practice, however, prompt-injection risk is victim-dependent: a single exploit can produce asymmetric consequences for different stakeholders, and the same attack pattern may differ substantially in effectiveness depending on whom it targets. To capture these properties, we introduce \textbf{\sysname}, a \textit{stakeholder-centric} benchmark that systematically categorizes and attributes harm in real-world web agent systems. It distinguishes between affected entities (e.g., user, seller, platform), decomposes attacks into concrete objectives, and evaluates each case with complementary outcome- and process-level metrics. Our results reveal substantial and heterogeneous vulnerabilities: not a single attack objective is reliably resisted by current agents, and failures fall into qualitatively distinct modes, ranging from \emph{stealthy parasitism} (the attack succeeds without disrupting the user's delegated task) through \emph{misaligned disruption} (the task is disrupted without attack success) to \emph{compounded failure} (the adversarial objective is achieved and task integrity is violated at the same time). Conventional evaluation misses these patterns, highlighting the need for stakeholder-aware assessment of LLM-based agents in real-world deployments. The benchmark is available at https://github.com/StakeBench/SBC.
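
The three failure modes named above can be read as combinations of two per-episode outcomes: whether the attack succeeded and whether the user's task stayed intact. The following is a minimal illustrative sketch of that reading, not the benchmark's actual scoring code. The names `FailureMode`, `classify`, and `tally`, the `CLEAN` label for the no-failure case, and the simplification of "task disrupted" to a single boolean `task_completed` are all assumptions made for illustration.

```python
from collections import Counter
from enum import Enum


class FailureMode(Enum):
    # CLEAN is a hypothetical label for episodes with no failure at all.
    CLEAN = "clean"
    STEALTHY_PARASITISM = "stealthy parasitism"
    MISALIGNED_DISRUPTION = "misaligned disruption"
    COMPOUNDED_FAILURE = "compounded failure"


def classify(attack_succeeded: bool, task_completed: bool) -> FailureMode:
    """Map one episode's two outcomes to a failure mode.

    Assumption: task disruption is approximated by `not task_completed`.
    """
    if attack_succeeded and task_completed:
        # Attack succeeds without disrupting the delegated task.
        return FailureMode.STEALTHY_PARASITISM
    if attack_succeeded:
        # Attack succeeds and the task is also broken.
        return FailureMode.COMPOUNDED_FAILURE
    if not task_completed:
        # Task is broken even though the attack did not succeed.
        return FailureMode.MISALIGNED_DISRUPTION
    return FailureMode.CLEAN


def tally(episodes):
    """Count failure modes over (attack_succeeded, task_completed) pairs."""
    return Counter(classify(attack, task) for attack, task in episodes)


if __name__ == "__main__":
    # Hypothetical episode outcomes, for demonstration only.
    demo = [(True, True), (True, False), (False, False), (False, True)]
    for mode, count in tally(demo).items():
        print(f"{mode.value}: {count}")
```

A metric that reports only attack success rate would merge the first two modes and miss the third entirely, which is why the two outcomes are kept separate here.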
