Long-horizon LLM agents generate traces that could become reusable experience, but raw trajectories are noisy, local, and hard to govern. Agent Skills offer a structured artifact that combines procedural guidance, executable resources, and applicability boundaries. Yet open skill ecosystems contain redundant, uneven, and environment-sensitive artifacts, and indiscriminate updates can pollute future context. We present SkillsVote, a lifecycle-governance framework for Agent Skills that spans collection, recommendation, attribution, and evolution. SkillsVote profiles a million-scale open-source corpus for environment requirements, quality, and verifiability, and synthesizes tasks for verifiable skills. Before execution, it performs agentic library search over structured skill folders to expose instructional context. After execution, it decomposes trajectories into skill-linked subtasks, attributes outcomes to skill-guided execution, agent exploration, environment, and result signals, and admits only successful, reusable discoveries to evidence-gated updates. Experiments on Terminal-Bench 2.0 and SWE-Bench Pro show that SkillsVote improves agent performance on challenging agentic coding benchmarks. The gains arise from two complementary pathways: online evolution over task streams at test time, and offline transfer via frozen libraries built from either historical trajectories or curated open-source skills.
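The post-execution step described above (decompose, attribute, then gate updates on evidence) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the SkillsVote implementation: the `Subtask` schema, the `attribute` labels, the `EvidenceGate` class, and in particular the rule that a discovery is admitted once it has been observed `min_evidence` times are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Subtask:
    """One segment of a decomposed trajectory (hypothetical schema)."""
    description: str
    skill_id: Optional[str]    # skill that guided this segment, or None if the agent explored
    succeeded: bool            # result signal for this segment
    reusable: bool             # judged likely to generalize beyond this task
    env_failure: bool = False  # outcome attributed to the environment


def attribute(sub: Subtask) -> str:
    """Assign a coarse attribution label to a subtask outcome (illustrative labels)."""
    if sub.env_failure:
        return "environment"
    if sub.skill_id is not None:
        return "skill_guided"
    return "agent_exploration"


class EvidenceGate:
    """Admit a discovery only after enough independent supporting observations.

    The counting threshold is an assumed stand-in for "evidence-gated updates".
    """

    def __init__(self, min_evidence: int = 2) -> None:
        self.min_evidence = min_evidence
        self.evidence: Counter = Counter()

    def observe(self, subtasks: List[Subtask]) -> List[str]:
        """Record one trajectory; return discoveries that just crossed the threshold."""
        admitted = []
        for sub in subtasks:
            # Only agent-discovered behavior is a candidate for a new or updated skill.
            if attribute(sub) != "agent_exploration":
                continue
            # Failed or one-off discoveries are never admitted.
            if not (sub.succeeded and sub.reusable):
                continue
            self.evidence[sub.description] += 1
            if self.evidence[sub.description] == self.min_evidence:
                admitted.append(sub.description)
        return admitted


# Usage: the same discovery must be seen twice before it is admitted.
gate = EvidenceGate(min_evidence=2)
trajectory = [
    Subtask("pin dependency versions before build", None, True, True),
    Subtask("run test suite", "example-test-skill", True, True),
    Subtask("fetch remote archive", None, False, False, env_failure=True),
]
first = gate.observe(trajectory)
second = gate.observe(trajectory)
```

In this sketch the skill-guided subtask and the environment failure never become update candidates, which mirrors the stated goal of keeping indiscriminate updates out of future context. A real gate would presumably weigh richer evidence than a simple count.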