AI agent frameworks that connect large language model (LLM) reasoning to host execution surfaces (shell, filesystem, containers, and messaging) introduce security challenges that are structurally distinct from those of conventional software. We present a systematic taxonomy of 190 advisories filed against OpenClaw, an open-source AI agent runtime, organized by architectural layer and trust-violation type. The vulnerabilities cluster along two orthogonal axes: (1) a system axis, reflecting the architectural layer (exec policy, gateway, channel, sandbox, browser, plugin, agent/prompt); and (2) an attack axis, reflecting the adversarial technique (identity spoofing, policy bypass, cross-layer composition, prompt injection, supply-chain escalation). Patch-differential evidence supports three principal findings. First, three advisories of Moderate or High severity in the Gateway and Node-Host subsystems compose into a complete unauthenticated remote code execution (RCE) path from an LLM tool call to the host process, spanning delivery, exploitation, and command-and-control. Second, the exec allowlist, the primary command-filtering mechanism, relies on a closed-world assumption that command identity is recoverable via lexical parsing; shell line continuation, busybox multiplexing, and GNU option abbreviation each invalidate this assumption. Third, a malicious skill distributed via the plugin channel executed a two-stage dropper within the LLM context, bypassing the exec pipeline and demonstrating that the skill distribution surface lacks runtime policy enforcement. The dominant structural weakness is per-layer trust enforcement in place of unified policy boundaries, which makes cross-layer attacks resilient to local remediation.
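The closed-world assumption behind the second finding can be illustrated with a small sketch. Everything here is hypothetical: `ALLOWED_COMMANDS`, `DENIED_OPTIONS`, `naive_allows`, and `effective_view` are illustrative names, not OpenClaw's code or configuration, and `SORT_LONG_OPTIONS` is only an illustrative subset of GNU `sort`'s long options. A checker that treats Python's `shlex.split` tokens as the executed argv admits all three inputs. A simplified model of what the shell, busybox, and GNU `getopt_long` actually do rejects them.

```python
import shlex

# HYPOTHETICAL policy for illustration only; not OpenClaw's actual configuration.
ALLOWED_COMMANDS = {"ls", "cat", "sort", "busybox"}  # busybox admitted for its benign applets
DENIED_OPTIONS = {"sort": {"--compress-program"}}     # names an external program for sort to invoke
CONTROL_OPERATORS = {";", "&&", "||", "|"}

# Illustrative subset of GNU sort's long options, used to model how getopt_long
# accepts any unambiguous prefix (e.g. "--compress-prog" -> "--compress-program").
SORT_LONG_OPTIONS = ["--check", "--compress-program", "--output", "--reverse"]


def segments(tokens):
    """Split a flat token list into simple commands at exact control-operator tokens."""
    current = []
    for tok in tokens:
        if tok in CONTROL_OPERATORS:
            if current:
                yield current
            current = []
        else:
            current.append(tok)
    if current:
        yield current


def argvs_allowed(argvs) -> bool:
    """Allowlist argv[0] and reject denied options that match by exact name."""
    for argv in argvs:
        if argv[0] not in ALLOWED_COMMANDS:
            return False
        denied = DENIED_OPTIONS.get(argv[0], set())
        if any(arg.split("=", 1)[0] in denied for arg in argv[1:]):
            return False
    return True


def naive_allows(command: str) -> bool:
    """Closed-world lexical check: trusts shlex tokens as the executed argv."""
    return argvs_allowed(segments(shlex.split(command)))


def resolve_long_option(arg: str, known: list) -> str:
    """Model getopt_long: a unique prefix of a known long option expands to it."""
    name = arg.split("=", 1)[0]
    matches = [opt for opt in known if opt.startswith(name)]
    full = matches[0] if len(matches) == 1 else name
    return full + arg[len(name):]  # keep any "=value" suffix


def effective_view(command: str) -> list:
    """Simplified model of what actually runs, covering the three blind spots."""
    # 1. Line continuation: the shell deletes backslash-newline before splitting
    #    into tokens (this sketch ignores quoting, where that rule differs).
    tokens = shlex.split(command.replace("\\\n", ""))
    result = []
    for argv in segments(tokens):
        # 2. Multiplexing: invoked as "busybox", it dispatches on its first argument.
        if argv[0] == "busybox" and len(argv) > 1:
            argv = argv[1:]
        # 3. Option abbreviation: expand unambiguous GNU long-option prefixes.
        if argv[0] == "sort":
            argv = [argv[0]] + [
                resolve_long_option(a, SORT_LONG_OPTIONS) if a.startswith("--") else a
                for a in argv[1:]
            ]
        result.append(argv)
    return result


def effective_allows(command: str) -> bool:
    return argvs_allowed(effective_view(command))


BYPASSES = [
    "ls \\\n&& rm -rf /tmp/x",           # shlex yields "\n&&", not the operator "&&"
    "busybox rm -rf /tmp/x",             # argv[0] is busybox, but the program is rm
    "sort --compress-prog=sh data.txt",  # abbreviation of --compress-program
]

if __name__ == "__main__":
    for cmd in BYPASSES:
        print(repr(cmd), "naive:", naive_allows(cmd), "effective:", effective_allows(cmd))
```

Each repair in `effective_view` closes only the specific blind spot it models. Under this sketch, the checker remains only as sound as its imitation of every downstream parser, which is what makes the lexical assumption closed-world.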