Large language model (LLM) agents execute tasks through multi-step workflows that combine planning, memory, and tool use. While this design enables autonomy, it also expands the attack surface for backdoor threats. Backdoor triggers injected into specific stages of an agent workflow can persist through multiple intermediate states and adversely influence downstream outputs. However, existing studies remain fragmented and typically analyze individual attack vectors in isolation, leaving the cross-stage interaction and propagation of backdoor triggers poorly understood from an agent-centric perspective. To fill this gap, we propose \textbf{BackdoorAgent}, a modular and stage-aware framework that provides a unified, agent-centric view of backdoor threats in LLM agents. BackdoorAgent structures the attack surface along three functional stages of the agentic workflow, corresponding to \textbf{planning attacks}, \textbf{memory attacks}, and \textbf{tool-use attacks}, and instruments agent execution to enable systematic analysis of trigger activation and propagation across stages. Building on this framework, we construct a standardized benchmark spanning four representative agent applications: \textbf{Agent QA}, \textbf{Agent Code}, \textbf{Agent Web}, and \textbf{Agent Drive}, covering both language-only and multimodal settings. Our empirical analysis shows that \textit{triggers implanted at a single stage can persist across multiple steps and propagate through intermediate states.} For instance, when using a GPT-based backbone, we observe trigger persistence in 43.58\% of planning attacks, 77.97\% of memory attacks, and 60.28\% of tool-use attacks, highlighting the vulnerability of the agentic workflow itself to backdoor threats. To facilitate reproducibility and future research, our code and benchmark are publicly available on GitHub.
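To make the stage-aware instrumentation concrete, the following is a minimal sketch of how trigger persistence could be traced across the three workflow stages. It is not the BackdoorAgent implementation; all names (\texttt{Stage}, \texttt{Trace}, \texttt{observe}, \texttt{persistence\_rate}) are hypothetical, and it assumes the trigger is a fixed string checked against each stage's intermediate state.

\begin{verbatim}
# Hypothetical sketch of stage-aware trigger tracing; not the
# BackdoorAgent API. Assumes a string trigger and text-valued
# intermediate states (plan, memory entry, tool call).
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    PLANNING = auto()
    MEMORY = auto()
    TOOL_USE = auto()


@dataclass
class Trace:
    """Records, per executed step, whether the trigger string
    survives in that stage's intermediate state."""
    trigger: str
    hits: list = field(default_factory=list)  # (stage, step, survived)

    def observe(self, stage: Stage, step: int, state_text: str) -> None:
        # Instrumentation hook: called once per stage transition.
        self.hits.append((stage, step, self.trigger in state_text))

    def persistence_rate(self, stage: Stage) -> float:
        # Fraction of observed steps at `stage` where the trigger
        # persists, analogous to the per-stage percentages reported.
        obs = [survived for st, _, survived in self.hits if st == stage]
        return sum(obs) / len(obs) if obs else 0.0


if __name__ == "__main__":
    trace = Trace(trigger="<bkd>")
    # Toy run: a trigger implanted in the plan propagates through
    # intermediate states into the downstream tool call.
    trace.observe(Stage.PLANNING, 0, "step 1: search <bkd> docs")
    trace.observe(Stage.MEMORY, 1, "stored: user asked about docs")
    trace.observe(Stage.TOOL_USE, 2, "search(query='<bkd> docs')")
    for stage in Stage:
        print(stage.name, f"{trace.persistence_rate(stage):.0%}")
\end{verbatim}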