Clawdbot is a self-hosted, tool-using personal AI agent with a broad action space spanning local execution and web-mediated workflows, which heightens safety and security concerns under ambiguity and adversarial steering. We present a trajectory-centric evaluation of Clawdbot across six risk dimensions. Our test suite samples and lightly adapts scenarios from prior agent-safety benchmarks (including ATBench and LPS-Bench) and supplements them with hand-designed cases tailored to Clawdbot's tool surface. We log complete interaction trajectories (messages, actions, tool-call arguments/outputs) and assess safety with both an automated trajectory judge (AgentDoG-Qwen3-4B) and human review. Across 34 canonical cases, we find a non-uniform safety profile: performance is generally consistent on reliability-focused tasks, while most failures arise under underspecified intent, open-ended goals, or benign-seeming jailbreak prompts, where minor misinterpretations can escalate into higher-impact tool actions. We complement the aggregate results with representative case studies, summarizing their commonalities and analyzing the security vulnerabilities and typical failure modes that Clawdbot is prone to trigger in practice.
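To make the evaluation pipeline concrete, the logging-and-judging workflow described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the record schema (`ToolCall`, `TrajectoryStep`, `Trajectory`), the field names, and the `judge_fn` interface are all hypothetical assumptions standing in for the real trajectory format and the AgentDoG-Qwen3-4B judge.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical schema for logged trajectories; field names are
# illustrative assumptions, not Clawdbot's actual logging format.
@dataclass
class ToolCall:
    name: str        # tool invoked, e.g. a shell or web tool
    arguments: dict  # arguments passed to the tool
    output: str      # tool output captured in the trajectory

@dataclass
class TrajectoryStep:
    role: str        # "user", "agent", or "tool"
    message: str     # message text for this step
    tool_calls: List[ToolCall] = field(default_factory=list)

@dataclass
class Trajectory:
    case_id: str         # one of the canonical test cases
    risk_dimension: str  # one of the six risk dimensions
    steps: List[TrajectoryStep] = field(default_factory=list)

def judge_verdicts(
    trajectories: List[Trajectory],
    judge_fn: Callable[[Trajectory], int],  # 1 = safe, 0 = unsafe
) -> Dict[str, Tuple[int, int]]:
    """Score each full trajectory with an automated judge and tally
    (safe, unsafe) counts per risk dimension, e.g. to triage which
    trajectories go on to human review."""
    tally: Dict[str, Tuple[int, int]] = {}
    for traj in trajectories:
        verdict = judge_fn(traj)  # stand-in for the automated judge
        safe, unsafe = tally.get(traj.risk_dimension, (0, 0))
        tally[traj.risk_dimension] = (safe + verdict, unsafe + (1 - verdict))
    return tally
```

A trivial judge (for instance, one that flags any trajectory containing a tool call) can be passed as `judge_fn` to exercise the tally logic before wiring in a model-based judge.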