Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Equipping Large Language Model (LLM) agents with domain-specific skills is critical for tackling complex tasks. Yet manually authoring these skills creates a severe scalability bottleneck. Conversely, automated skill generation often yields fragile or fragmented results because it either relies on shallow parametric knowledge or sequentially overfits to non-generalizable trajectory-local lessons. To overcome these limitations, we introduce Trace2Skill, a framework that mirrors how human experts author skills: by holistically analyzing broad execution experience before distilling it into a single, comprehensive guide. Instead of reacting sequentially to individual trajectories, Trace2Skill dispatches a parallel fleet of sub-agents to analyze a diverse pool of executions. It extracts trajectory-specific lessons and hierarchically consolidates them into a unified, conflict-free skill directory via inductive reasoning. Trace2Skill supports both deepening existing human-written skills and creating new ones from scratch. Experiments in challenging domains, such as spreadsheet tasks, VisionQA, and math reasoning, show that Trace2Skill significantly improves upon strong baselines, including Anthropic's official xlsx skills. Crucially, this trajectory-grounded evolution does not merely memorize task instances or model-specific quirks: evolved skills transfer across LLM scales and generalize to out-of-distribution (OOD) settings. For example, skills evolved by Qwen3.5-35B on its own trajectories improved a Qwen3.5-122B agent by up to 57.65 absolute percentage points on WikiTableQuestions. Ultimately, our results demonstrate that complex agent experience can be packaged into highly transferable, declarative skills, requiring no parameter updates and no external retrieval modules, and using open-source models as small as 35B parameters.
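
The analyze-then-consolidate flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Trajectory, analyze_trajectory, merge_lessons, consolidate, and build_skill are hypothetical, the sub-agent is a plain function rather than an LLM call, and consolidation is reduced to order-preserving de-duplication in place of the inductive, conflict-resolving merge the abstract mentions. Only the overall shape (parallel per-trajectory analysis followed by a level-by-level merge) is taken from the text.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record of one agent execution."""
    task_id: str
    succeeded: bool
    log: str


def analyze_trajectory(traj: Trajectory) -> list[str]:
    """Stand-in for one sub-agent (assumption: a real one would prompt an LLM).

    Turns each non-empty log line into a normalized 'keep' or 'avoid' lesson.
    """
    prefix = "keep" if traj.succeeded else "avoid"
    return [
        f"{prefix}: {line.strip().lower()}"
        for line in traj.log.splitlines()
        if line.strip()
    ]


def merge_lessons(group: list[list[str]]) -> list[str]:
    """Stand-in for one consolidation step: order-preserving de-duplication."""
    seen: dict[str, None] = {}
    for lessons in group:
        for lesson in lessons:
            seen.setdefault(lesson, None)
    return list(seen)


def consolidate(lesson_sets: list[list[str]], fan_in: int = 2) -> list[str]:
    """Merge lesson sets level by level until a single set remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not lesson_sets:
        return []
    level = lesson_sets
    while len(level) > 1:
        level = [
            merge_lessons(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    # Final pass also de-duplicates when only one set was supplied.
    return merge_lessons(level)


def build_skill(trajectories: list[Trajectory], max_workers: int = 4) -> list[str]:
    """Analyze all trajectories in parallel, then consolidate hierarchically."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        lesson_sets = list(pool.map(analyze_trajectory, trajectories))
    return consolidate(lesson_sets)


if __name__ == "__main__":
    demo_pool = [
        Trajectory("t1", True, "Check header row before reading cells"),
        Trajectory("t2", False, "Hard-code the sheet name"),
        Trajectory("t3", True, "check header row before reading cells"),
    ]
    for lesson in build_skill(demo_pool):
        print(lesson)
```

In this sketch the analysis of each trajectory is independent of the others, so no lesson is shaped by the order in which trajectories arrive; generalization is deferred entirely to the merge step, which is where a real system would need an LLM-driven reconciliation of overlapping or contradictory lessons.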
