The rapid proliferation of Claude agent skills raises a central question: how to effectively leverage, manage, and scale the agent skill ecosystem. In this paper, we propose AgentSkillOS, the first principled framework for skill selection, orchestration, and ecosystem-level management. AgentSkillOS comprises two stages: (i) Manage Skills, which organizes skills into a capability tree via node-level recursive categorization for efficient discovery; and (ii) Solve Tasks, which retrieves, orchestrates, and executes multiple skills through DAG-based pipelines. To evaluate the agent's ability to invoke skills, we construct a benchmark of 30 artifact-rich tasks across five categories: data computation, document creation, motion video, visual design, and web interaction. We assess output quality with LLM-based pairwise evaluation and aggregate the pairwise results with a Bradley-Terry model to produce unified quality scores. Experiments across three skill ecosystem scales (200 to 200K skills) show that tree-based retrieval effectively approximates oracle skill selection, and that DAG-based orchestration substantially outperforms native flat invocation even when both are given the identical skill set. Our findings confirm that structured composition is the key to unlocking skill potential. Our GitHub repository is available at: https://github.com/ynulihao/AgentSkillOS.
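The abstract does not say how the Bradley-Terry model is fitted, so the following is only a minimal sketch of how pairwise judgments could be turned into unified scores. It assumes the standard minorization-maximization update for Bradley-Terry strengths; the function name `bradley_terry`, the system names, and the win counts are hypothetical illustrations, not the authors' implementation or results.

```python
from collections import defaultdict


def bradley_terry(comparisons, iters=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Hypothetical helper: uses the classic MM update
        p_i <- wins_i / sum_j n_ij / (p_i + p_j)
    and renormalizes so the strengths sum to 1.
    """
    wins = defaultdict(float)         # total wins per item
    pair_counts = defaultdict(float)  # number of comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        items.update((winner, loser))

    items = sorted(items)
    strength = {i: 1.0 / len(items) for i in items}

    for _ in range(iters):
        updated = {}
        for i in items:
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                n_ij = pair_counts.get(frozenset((i, j)), 0.0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # An item that never wins gets strength 0 (no interior MLE exists).
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        updated = {i: v / total for i, v in updated.items()}
        converged = max(abs(updated[i] - strength[i]) for i in items) < tol
        strength = updated
        if converged:
            break
    return strength


# Toy data (made up): each pair of systems is compared four times.
comparisons = (
    [("system_a", "system_b")] * 3 + [("system_b", "system_a")] * 1
    + [("system_b", "system_c")] * 3 + [("system_c", "system_b")] * 1
    + [("system_a", "system_c")] * 3 + [("system_c", "system_a")] * 1
)
scores = bradley_terry(comparisons)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch, `scores` maps each system to a strength in a shared scale, and the modeled probability that system i beats system j is `scores[i] / (scores[i] + scores[j])`, which is what lets many pairwise judgments collapse into a single ranking.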