Clawed and Dangerous: Can We Trust Open Agentic Systems?

Open agentic systems combine LLM-based planning with external capabilities, persistent memory, and privileged execution. They are used in coding assistants, browser copilots, and enterprise automation. OpenClaw is a visible instance of this broader class. Their security challenge has so far received little attention, yet it is fundamentally different from that of traditional software, which relies on predictable execution and well-defined control flow. In open agentic systems, everything is "probabilistic": plans are generated at runtime, key decisions may be shaped by untrusted natural-language inputs and tool outputs, execution unfolds in uncertain environments, and actions are taken under authority delegated by human users. The central challenge is therefore not merely robustness against individual attacks, but the governance of agentic behavior under persistent uncertainty. This paper systematizes the area through a software engineering lens. We introduce a six-dimensional analytical taxonomy and synthesize 50 papers spanning attacks, benchmarks, defenses, audits, and adjacent engineering foundations. From this synthesis, we derive a reference doctrine for secure-by-construction agent platforms, together with an evaluation scorecard for assessing platform security posture. Our review shows that the literature is relatively mature in attack characterization and benchmark construction, but remains weak in deployment controls, operational governance, persistent-memory integrity, and capability revocation. These gaps define a concrete engineering agenda for building agent ecosystems that are governable, auditable, and resilient under compromise.
