Graphical user interface (GUI) agents powered by multimodal large language models (MLLMs) have shown great promise for human-computer interaction. However, because fine-tuning is costly, users often rely on open-source GUI agents or on APIs offered by AI providers, which introduces a critical but underexplored supply chain threat: backdoor attacks. In this work, we first reveal that MLLM-powered GUI agents naturally expose multiple interaction-level triggers, such as historical steps, environment states, and task progress. Based on this observation, we introduce AgentGhost, an effective and stealthy framework for red-teaming backdoor attacks. Specifically, we first construct composite triggers by combining goal-level and interaction-level triggers, allowing GUI agents to unintentionally activate backdoors while preserving task utility. Then, we formulate backdoor injection as a Min-Max optimization problem. It uses supervised contrastive learning to maximize the feature difference across sample classes in the representation space, improving the flexibility of the backdoor. Meanwhile, it adopts supervised fine-tuning to minimize the discrepancy between backdoor and clean behavior generation, enhancing effectiveness and utility. Extensive evaluations of various agent models on two established mobile benchmarks show that AgentGhost is effective and generic, with attack accuracy reaching 99.7\% on three attack objectives, and stealthy, with only 1\% utility degradation. Furthermore, we tailor a defense method against AgentGhost that reduces the attack accuracy to 22.1\%. Our code is available at \texttt{anonymous}.
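To make the two-part objective concrete, the following is a minimal sketch, not the AgentGhost implementation. It assumes a standard supervised contrastive loss over L2-normalised features, a token-level negative log-likelihood for supervised fine-tuning, and a simple weighted sum of the two. The function names, the `weight` and `temperature` parameters, and the class labels (for example 0 for clean samples and 1 for backdoor samples) are hypothetical; the abstract does not specify how the two terms are combined.

```python
import math


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over L2-normalised feature vectors.

    Minimising this loss pulls same-class features together and pushes
    different-class features apart, i.e. it increases the feature
    difference across sample classes.
    """
    normed = []
    for f in features:
        norm = math.sqrt(sum(x * x for x in f)) or 1.0
        normed.append([x / norm for x in f])

    n = len(normed)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner are skipped
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n) if j != i
        }
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def sft_loss(target_token_probs):
    """Mean negative log-likelihood of the target action tokens."""
    return -sum(math.log(p) for p in target_token_probs) / len(target_token_probs)


def combined_loss(features, labels, target_token_probs, weight=0.5):
    """Assumed weighted sum; the actual Min-Max formulation may differ."""
    return sft_loss(target_token_probs) + weight * supcon_loss(features, labels)


if __name__ == "__main__":
    # Toy features: two clean-like and two backdoor-like samples (hypothetical).
    feats = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    print(combined_loss(feats, [0, 0, 1, 1], [0.9, 0.8, 0.95]))
```

In this sketch, features that are already grouped by class yield a much smaller contrastive term than the same features with shuffled labels, which is the behaviour the representation-level objective is meant to encourage.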