Reusable skills let LLM agents package task-specific procedures, tool affordances, and execution guidance into modular building blocks. As skill ecosystems grow to tens of thousands of entries, exposing every skill at inference time becomes infeasible. This creates a skill-routing problem: given a user task, the system must identify the relevant skills before downstream planning or execution. Existing agent stacks often rely on progressive disclosure, exposing only skill names and descriptions while hiding the full implementation body. We examine this design choice on a SkillsBench-derived benchmark with approximately 80K candidate skills, targeting the practically important setting of large skill registries with heavy overlap. Across representative sparse, dense, and reranking baselines, hiding the skill body causes a 31--44 percentage point drop in routing accuracy, showing that full skill text is a critical routing signal in this setting rather than a minor metadata refinement. Motivated by this finding, we present SkillRouter, a compact 1.2B-parameter full-text retrieve-and-rerank pipeline. SkillRouter achieves 74.0% Hit@1 on our benchmark, the strongest average top-1 routing performance among the methods we evaluate, while using 13$\times$ fewer parameters and running 5.8$\times$ faster than the strongest base pipeline. The ranking gains further generalize to a supplementary benchmark independently constructed from three skill sources. In a complementary end-to-end study across four coding agents, routing gains transfer to improved task success, with larger gains for more capable agents.
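To make the routing setup and the Hit@1 metric concrete, the following is a minimal sketch, not the SkillRouter implementation. It assumes a hypothetical three-skill registry, a token-overlap first stage, and a bag-of-words cosine reranker standing in for the learned models; all names (`SKILLS`, `QUERIES`, `route`, `hit_at_1`) are illustrative. The `use_body` flag switches between progressive disclosure (name and description only) and full-text routing.

```python
import math
from collections import Counter

# Hypothetical toy registry: two skills share identical metadata
# and differ only in their (normally hidden) body text.
SKILLS = {
    "pdf-tools-a": {
        "name": "pdf tools",
        "description": "work with pdf files",
        "body": "merge several pdf files into one document using page ranges",
    },
    "pdf-tools-b": {
        "name": "pdf tools",
        "description": "work with pdf files",
        "body": "extract tables from scanned pdf pages with ocr and export csv",
    },
    "csv-clean": {
        "name": "csv cleaner",
        "description": "clean csv data",
        "body": "drop duplicate rows and normalize column headers in csv files",
    },
}

# Hypothetical (query, gold skill id) pairs.
QUERIES = [
    ("extract tables from a scanned pdf with ocr", "pdf-tools-b"),
    ("merge pdf files into one document", "pdf-tools-a"),
    ("drop duplicate rows in csv", "csv-clean"),
]


def tokens(text):
    return text.lower().split()


def skill_text(skill, use_body):
    """Text visible to the router: metadata only, or metadata plus body."""
    parts = [skill["name"], skill["description"]]
    if use_body:
        parts.append(skill["body"])
    return " ".join(parts)


def overlap(query, doc):
    """Cheap first-stage score: number of shared tokens (with multiplicity)."""
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    return sum(min(q[t], d[t]) for t in q)


def cosine(query, doc):
    """Stand-in reranker: cosine similarity over raw token counts."""
    q, d = Counter(tokens(query)), Counter(tokens(doc))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def route(query, use_body, k=2):
    """Retrieve the top-k skills, then rerank that shortlist.

    Ties are broken by skill id so the result is deterministic.
    """
    texts = {sid: skill_text(s, use_body) for sid, s in SKILLS.items()}
    shortlist = sorted(texts, key=lambda sid: (-overlap(query, texts[sid]), sid))[:k]
    return sorted(shortlist, key=lambda sid: (-cosine(query, texts[sid]), sid))


def hit_at_1(queries, use_body):
    """Fraction of queries whose top-ranked skill is the gold skill."""
    hits = sum(route(q, use_body)[0] == gold for q, gold in queries)
    return hits / len(queries)


if __name__ == "__main__":
    print("metadata only:", hit_at_1(QUERIES, use_body=False))
    print("full text:    ", hit_at_1(QUERIES, use_body=True))
```

In this toy registry the two `pdf tools` entries are indistinguishable without their bodies, so the metadata-only router can only break the tie arbitrarily. This is a small-scale illustration of the overlap effect described above, not a reproduction of the reported numbers.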