Web-based agents powered by large language models are increasingly used for tasks such as email management and professional networking. Their reliance on dynamic web content, however, makes them vulnerable to prompt injection attacks: adversarial instructions hidden in interface elements that persuade the agent to divert from its original task. We introduce the Task-Redirecting Agent Persuasion Benchmark (TRAP) for studying how persuasion techniques misguide autonomous web agents on realistic tasks. Across six frontier models, agents are susceptible to prompt injection in 25% of tasks on average (ranging from 13% for GPT-5 to 43% for DeepSeek-R1), and small interface or contextual changes often double attack success rates, revealing systemic, psychologically driven vulnerabilities in web-based agents. We also provide a modular social-engineering injection framework with controlled experiments on high-fidelity website clones, so that the benchmark can be extended further.