Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

An agent skill is a configuration package that equips an LLM-driven agent with a concrete capability, such as reading email, executing shell commands, or signing blockchain transactions. Each skill is a hybrid artifact: a structured half declares executable interfaces, while a prose half dictates when and how those interfaces fire, and the prose is reinterpreted probabilistically on every invocation. Conventional static analyzers parse the structured half but ignore the prose; LLM-based tools read the prose but cannot reproducibly prove that a tainted input reaches a high-impact sink. We present Semia, a static auditor for agent skills. Semia lifts each skill into the Skill Description Language (SDL), a Datalog fact base that captures LLM-triggered actions, prose-defined conditions, and human-in-the-loop checkpoints. The central challenge is synthesizing a fact base that is both structurally sound and semantically faithful to the original prose; we address it with Constraint-Guided Representation Synthesis (CGRS), a propose-verify-evaluate loop that refines LLM candidates until convergence. Security properties of an agent skill (e.g., indirect injection, secret leakage, confused deputies, and unguarded sinks) then reduce to Datalog reachability queries. We evaluate Semia on 13,728 real-world skills from public marketplaces. Semia renders all of them auditable and finds that more than half carry at least one critical semantic risk. On a stratified sample of 541 expert-labeled skills, Semia achieves 97.7% recall and an F1 of 90.6%, substantially outperforming signature-based scanners and LLM baselines.

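To make the reduction of a security property to a Datalog reachability query concrete, the following is a minimal sketch in Python. It is not Semia's implementation and does not use the actual SDL schema: the predicate names (`source`, `flow`, `sink`, `checkpoint`), the function `unguarded_sinks`, and the example skill are all assumptions introduced for illustration. The sketch evaluates three Datalog-style rules by naive fixpoint iteration and reports sinks that tainted input can reach without passing a human-in-the-loop checkpoint.

```python
from typing import Iterable, Set, Tuple

# Hypothetical rules (illustrative only, not the paper's SDL):
#   tainted(X)   :- source(X).
#   tainted(Y)   :- tainted(X), flow(X, Y), not checkpoint(Y).
#   violation(S) :- tainted(S), sink(S).


def unguarded_sinks(
    sources: Iterable[str],
    flows: Iterable[Tuple[str, str]],
    sinks: Iterable[str],
    checkpoints: Iterable[str],
) -> Set[str]:
    """Return sinks reachable from a source without crossing a checkpoint."""
    flow_facts = list(flows)
    guarded = set(checkpoints)
    tainted = set(sources)

    # Naive fixpoint: keep applying the propagation rule until nothing changes.
    changed = True
    while changed:
        changed = False
        for src, dst in flow_facts:
            if src in tainted and dst not in guarded and dst not in tainted:
                tainted.add(dst)
                changed = True

    return tainted & set(sinks)


# Hypothetical fact base for an email-handling skill.
example_sources = {"email_body"}
example_flows = [
    ("email_body", "summarize"),
    ("summarize", "run_shell"),          # no confirmation on this path
    ("email_body", "confirm_with_user"),
    ("confirm_with_user", "sign_tx"),    # guarded by a checkpoint
]
example_sinks = {"run_shell", "sign_tx"}
example_checkpoints = {"confirm_with_user"}

violations = unguarded_sinks(
    example_sources, example_flows, example_sinks, example_checkpoints
)
print(sorted(violations))  # prints ['run_shell']
```

In this toy fact base, the path to `sign_tx` is cut by the checkpoint, while `run_shell` is reachable from untrusted email content and is therefore flagged. A production Datalog engine would evaluate such rules more efficiently than this loop; the sketch only shows the shape of the query.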