Web-use agents are rapidly being deployed to automate complex web tasks with extensive browser capabilities. However, these capabilities create a critical and previously unexplored attack surface. This paper demonstrates how attackers can exploit web-use agents by embedding malicious content in web pages, such as comments, reviews, or advertisements, that agents encounter during legitimate browsing tasks. We introduce a task-aligned injection technique that frames malicious commands as helpful task guidance rather than as obvious attacks, exploiting fundamental limitations in LLMs' contextual reasoning. Agents struggle to maintain coherent contextual awareness and fail to detect when seemingly helpful web content contains steering attempts that divert them from their original task goal. To scale this attack, we develop an automated three-stage pipeline that generates effective injections without manual annotation or costly online agent interactions during training, and that remains efficient even with limited training data. The pipeline produces a generator model that we evaluate on five popular agents using payloads organized by the Confidentiality-Integrity-Availability (CIA) security triad, including unauthorized camera activation, file exfiltration, user impersonation, phishing, and denial-of-service. The generator achieves an attack success rate (ASR) of over 80%, with strong transferability across unseen payloads, diverse web environments, and different underlying LLMs. The attack succeeds even against agents with built-in safety mechanisms and requires only the ability to post content on public websites. To address this risk, we propose comprehensive mitigation strategies, including oversight mechanisms, execution constraints, and task-aware reasoning techniques.