Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie

OpenClaw, the most widely deployed personal AI agent in early 2026, operates with full local system access and integrates with sensitive services such as Gmail, Stripe, and the filesystem. While these broad privileges enable high levels of automation and powerful personalization, they also expose a substantial attack surface that existing sandboxed evaluations fail to capture. To address this gap, we present the first real-world safety evaluation of OpenClaw and introduce the CIK taxonomy, which unifies an agent's persistent state into three dimensions, i.e., Capability, Identity, and Knowledge, for safety analysis. Our evaluations cover 12 attack scenarios on a live OpenClaw instance across four backbone models (Claude Sonnet 4.5, Opus 4.6, Gemini 3.1 Pro, and GPT-5.4). The results show that poisoning any single CIK dimension increases the average attack success rate from 24.6% to 64-74%, with even the most robust model exhibiting more than a threefold increase over its baseline vulnerability. We further assess three CIK-aligned defense strategies alongside a file-protection mechanism; however, the strongest defense still yields a 63.8% success rate under Capability-targeted attacks, while file protection blocks 97% of malicious injections but also prevents legitimate updates. Taken together, these findings show that the vulnerabilities are inherent to the agent architecture, necessitating more systematic safeguards to secure personal AI agents. Our project page is https://ucsc-vlaa.github.io/CIK-Bench.
