Environmental Injection Attacks against GUI Agents in Realistic Dynamic Environments

Graphical User Interface (GUI) agents are increasingly deployed to interact with online web services, yet their exposure to open-world content renders them vulnerable to Environmental Injection Attacks (EIAs). In these attacks, an attacker injects crafted triggers into a website to manipulate the behavior of GUI agents used by other users. In this paper, we find that most existing EIA studies fall short of realism. In particular, they fail to capture the dynamic nature of real-world web content, often assuming that a trigger's on-screen position and surrounding visual context remain largely consistent between training and testing. To better reflect practice, we introduce a realistic dynamic-environment threat model in which the attacker is a regular user and the trigger is embedded within a dynamically changing environment. Under this threat model, existing approaches largely fail, suggesting that their effectiveness in exposing GUI agent vulnerabilities has been substantially overestimated. To expose the hidden vulnerabilities of existing GUI agents effectively, we propose Chameleon, an attack framework with two key novelties designed for dynamic environments. (1) To synthesize more realistic training data, we introduce LLM-Driven Environment Simulation, which automatically generates diverse, high-fidelity webpage simulations that mimic the variability of real-world dynamic environments. (2) To optimize the trigger more effectively, we introduce Attention Black Hole, which converts attention weights into explicit supervisory signals. This mechanism encourages the agent to remain insensitive to irrelevant surrounding content, thereby improving the trigger's robustness in dynamic environments. We evaluate Chameleon on six realistic websites and four representative LVLM-powered GUI agents, where it significantly outperforms existing methods.
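
As a rough illustration of how attention weights could be turned into a supervisory signal, the sketch below penalizes attention mass that falls outside the trigger region. This is an assumption for exposition only and not the Attention Black Hole objective itself, which the abstract does not specify: the function name `trigger_attention_loss`, the flat per-patch attention list, and the negative-log form are all hypothetical.

```python
import math


def trigger_attention_loss(attn, trigger_mask, eps=1e-12):
    """Hypothetical auxiliary loss: negative log of the attention share on trigger patches.

    attn: non-negative attention weights, one per image patch (any scale).
    trigger_mask: booleans, True where a patch belongs to the trigger.
    The loss is near 0 when all attention lies on the trigger and grows
    as attention leaks to the surrounding content.
    """
    if len(attn) != len(trigger_mask):
        raise ValueError("attn and trigger_mask must have the same length")
    total = sum(attn)
    if total <= 0:
        raise ValueError("attention weights must have positive total mass")
    # Attention mass that lands on patches marked as part of the trigger.
    on_trigger = sum(a for a, m in zip(attn, trigger_mask) if m)
    # Normalize to a share in [0, 1]; eps avoids log(0) when no mass is on the trigger.
    return -math.log(on_trigger / total + eps)


# Toy example with four patches, the third being the trigger.
mask = [False, False, True, False]
focused = trigger_attention_loss([0.1, 0.1, 0.7, 0.1], mask)
diffuse = trigger_attention_loss([0.4, 0.3, 0.2, 0.1], mask)
```

In this toy setting, `focused` is smaller than `diffuse`, so minimizing such a term alongside the attack objective would, under these assumptions, push attention toward the trigger and away from the changing context around it.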
