Modern vision-language model (VLM)-based graphical user interface (GUI) agents are expected not only to execute actions accurately but also to respond to user instructions with low latency. Existing research on GUI-agent security focuses mainly on attacks that manipulate action correctness, while the security risks related to response efficiency remain largely unexplored. In this paper, we introduce SlowBA, a novel backdoor attack that targets the responsiveness of VLM-based GUI agents. The key idea is to increase response latency by inducing excessively long reasoning chains when specific trigger patterns are present. To achieve this, we propose a two-stage reward-level backdoor injection (RBI) strategy that first aligns the long-response format and then learns trigger-aware activation through reinforcement learning. In addition, we design realistic pop-up windows as triggers that appear naturally in GUI environments, improving the stealthiness of the attack. Extensive experiments across multiple datasets and baselines demonstrate that SlowBA significantly increases response length and latency while largely preserving task accuracy. The attack remains effective even with a small poisoning ratio and under several defense settings. These findings reveal a previously overlooked security vulnerability in GUI agents and highlight the need for defenses that consider both action correctness and response efficiency. Code is available at https://github.com/tu-tuing/SlowBA.