Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A skill may look benign in its documentation or code while becoming harmful only when it is invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. This makes purely static vetting brittle. We present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. We instantiate RSA on OpenClaw and evaluate it on 100 skills against representative static baselines. RSA achieves 90.0\% accuracy with an 88.0\% true positive rate and an 8.0\% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19--20 out of 20 malicious skills across rounds.
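As a consistency check on the headline figures, the reported accuracy, true positive rate, and false positive rate can be related through the share of malicious skills in the evaluation set. The sketch below is not part of RSA. It assumes that all three figures were measured on the same 100 skills and are reported without rounding, and its function names are illustrative only.

```python
def accuracy_from_rates(tpr: float, fpr: float, malicious_fraction: float) -> float:
    """Accuracy as a prevalence-weighted mix of TPR and TNR (TNR = 1 - FPR)."""
    return malicious_fraction * tpr + (1.0 - malicious_fraction) * (1.0 - fpr)


def implied_malicious_fraction(accuracy: float, tpr: float, fpr: float) -> float:
    """Solve accuracy = p*TPR + (1 - p)*(1 - FPR) for the malicious share p."""
    tnr = 1.0 - fpr
    if tpr == tnr:
        # The equation no longer depends on p, so p cannot be recovered.
        raise ValueError("malicious share is not identifiable when TPR equals TNR")
    return (accuracy - tnr) / (tpr - tnr)


def confusion_counts(total: int, tpr: float, fpr: float, malicious_fraction: float) -> dict:
    """Derive integer confusion-matrix counts from rates (assumes exact rates)."""
    malicious = round(total * malicious_fraction)
    benign = total - malicious
    tp = round(tpr * malicious)
    fp = round(fpr * benign)
    return {"TP": tp, "FN": malicious - tp, "FP": fp, "TN": benign - fp}


if __name__ == "__main__":
    # Figures taken from the abstract: 90.0% accuracy, 88.0% TPR, 8.0% FPR, 100 skills.
    p = implied_malicious_fraction(accuracy=0.90, tpr=0.88, fpr=0.08)
    print("implied malicious share:", round(p, 3))
    print("implied counts:", confusion_counts(100, 0.88, 0.08, p))
```

Under these assumptions the three figures are mutually consistent only for a balanced set of 50 malicious and 50 benign skills, which gives 44 true positives, 6 false negatives, 4 false positives, and 46 true negatives. This breakdown is derived from the reported rates and is not stated in the source.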