Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents

Memory makes LLM-based web agents personalized and powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. Existing security research on agent memory assumes that attackers can inject directly into memory storage or exploit memory shared across users. We present a more realistic threat model: contamination through environmental observation alone. We introduce Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP), the first attack to achieve cross-session, cross-site compromise without requiring direct memory access. A single contaminated observation (e.g., viewing a manipulated product page) silently poisons an agent's memory and activates during future tasks on different websites, bypassing permission-based defenses. Our experiments on (Visual)WebArena reveal two key findings. First, eTAMP achieves substantial attack success rates (ASR): up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B. Second, we discover Frustration Exploitation: agents under environmental stress become dramatically more susceptible, with ASR increasing up to eightfold when agents struggle with dropped clicks or garbled text. Notably, more capable models are not more secure: GPT-5.2 shows substantial vulnerability despite superior task performance. With the rise of AI browsers like OpenClaw, ChatGPT Atlas, and Perplexity Comet, our findings underscore the urgent need for defenses against environment-injected memory poisoning.
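The threat model described above can be made concrete with a toy simulation. The following is a minimal sketch, not the eTAMP implementation: the `Trajectory` and `TrajectoryMemory` classes, the word-overlap retrieval, the `build_prompt` helper, and the example site names are all hypothetical, and the sketch assumes the agent stores raw observations verbatim. It only illustrates how text observed on one site in one session can reappear in the agent's context for a later task on a different site, without the attacker ever touching the memory store.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Trajectory:
    """One stored episode: the task, the site, and the raw observations seen."""
    task: str
    site: str
    observations: List[str]


@dataclass
class TrajectoryMemory:
    """Toy memory (hypothetical): stores trajectories, retrieves by word overlap."""
    episodes: List[Trajectory] = field(default_factory=list)

    def store(self, episode: Trajectory) -> None:
        # Assumption: observations are saved verbatim, with no sanitization.
        self.episodes.append(episode)

    def retrieve(self, task: str) -> Optional[Trajectory]:
        # Return the stored episode whose task shares the most words with the query.
        query = set(task.lower().split())
        best, best_score = None, 0
        for episode in self.episodes:
            score = len(query & set(episode.task.lower().split()))
            if score > best_score:
                best, best_score = episode, score
        return best


def build_prompt(memory: TrajectoryMemory, task: str, site: str) -> str:
    """Assemble the context for a new task, including any retrieved experience."""
    lines = [f"Current site: {site}", f"Task: {task}"]
    past = memory.retrieve(task)
    if past is not None:
        lines.append(f"Relevant past experience (from {past.site}):")
        lines.extend(past.observations)
    return "\n".join(lines)


# Placeholder for attacker-controlled page text; the content is deliberately abstract.
INJECTED = "NOTE TO AGENT: <attacker-chosen instruction>"

memory = TrajectoryMemory()

# Session 1: the agent merely views a manipulated product page on a shopping site.
memory.store(
    Trajectory(
        task="find the price of a blue mug",
        site="shop.example",
        observations=["Blue mug, $12", INJECTED],
    )
)

# Session 2: a different site and a different task, but retrieval surfaces the
# poisoned trajectory, so the injected text enters the new context.
prompt = build_prompt(memory, "post the price of a mug on the forum", "forum.example")
print(prompt)
```

In this toy setting, a defense that only restricts which sites the agent may act on does not stop the injected text from crossing sites, because the transfer happens through the agent's own memory rather than through a cross-site request.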
