SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

Agentic systems increasingly rely on reusable procedural capabilities, i.e., \textit{agentic skills}, to execute long-horizon workflows reliably. These capabilities are callable modules that package procedural knowledge with explicit applicability conditions, execution policies, termination criteria, and reusable interfaces. Unlike one-off plans or atomic tool calls, skills are reused across tasks and often perform well when carried over. This paper maps the skill layer across the full lifecycle (discovery, practice, distillation, storage, composition, evaluation, and update) and introduces two complementary taxonomies. The first is a system-level set of \textbf{seven design patterns} capturing how skills are packaged and executed in practice, from metadata-driven progressive disclosure and executable code skills to self-evolving libraries and marketplace distribution. The second is an orthogonal \textbf{representation $\times$ scope} taxonomy describing what skills \emph{are} (natural language, code, policy, hybrid) and what environments they operate over (web, OS, software engineering, robotics). We analyze the security and governance implications of skill-based agents, covering supply-chain risks, prompt injection via skill payloads, and trust-tiered execution. This analysis is grounded in a case study of the ClawHavoc campaign, in which nearly 1{,}200 malicious skills infiltrated a major agent marketplace and exfiltrated API keys, cryptocurrency wallets, and browser credentials at scale. We further survey deterministic evaluation approaches, anchored by recent benchmark evidence that curated skills can substantially improve agent success rates while self-generated skills may degrade them. We conclude with open challenges toward robust, verifiable, and certifiable skills for real-world autonomous agents.
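
To make the definition above concrete, the following is a minimal Python sketch of a skill as a callable module with an applicability condition, an execution policy, and a termination criterion. It is an illustration under stated assumptions, not an interface taken from the paper: the `Skill` class, its field names (`applies`, `step`, `done`, `max_steps`), and the toy `count_to_three` skill are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical state type: a plain dictionary the skill reads and updates.
State = Dict[str, Any]


@dataclass
class Skill:
    """Illustrative skill wrapper; all names here are assumptions."""
    name: str
    applies: Callable[[State], bool]   # applicability condition
    step: Callable[[State], State]     # execution policy (one step)
    done: Callable[[State], bool]      # termination criterion
    max_steps: int = 10                # safety bound on the loop

    def run(self, state: State) -> State:
        # Refuse to run when the applicability condition does not hold.
        if not self.applies(state):
            raise ValueError(f"skill {self.name!r} is not applicable")
        # Apply the policy until the termination criterion is met
        # or the step budget is exhausted.
        for _ in range(self.max_steps):
            if self.done(state):
                break
            state = self.step(state)
        return state


# Toy skill: increment a counter until it reaches 3.
count_to_three = Skill(
    name="count_to_three",
    applies=lambda s: "count" in s,
    step=lambda s: {**s, "count": s["count"] + 1},
    done=lambda s: s["count"] >= 3,
)

result = count_to_three.run({"count": 0})
```

The same object can be invoked on any state that satisfies `applies`, which is the sense in which a skill is reusable across tasks, in contrast to a one-off plan or a single tool call.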
