Autonomous AI agents extend large language models into full runtime systems that load skills, ingest external content, maintain memory, plan multi-step actions, and invoke privileged tools. In such systems, security failures rarely remain confined to a single interface; instead, they can propagate across initialization, input processing, memory, decision-making, and execution, often becoming apparent only when harmful effects materialize in the environment. This paper presents AgentWard, a lifecycle-oriented, defense-in-depth architecture that systematically organizes protection across these five stages. AgentWard integrates stage-specific, heterogeneous controls with cross-layer coordination, enabling threats to be intercepted along their propagation paths while safeguarding critical assets. We detail the design rationale and architecture of five coordinated protection layers, and implement a plugin-native prototype on OpenClaw to demonstrate practical feasibility. This perspective provides a concrete blueprint for structuring runtime security controls, managing trust propagation, and enforcing execution containment in autonomous AI agents. Our code is available at https://github.com/FIND-Lab/AgentWard .