Self-evolving skill libraries face a silent failure mode we term \emph{library drift}: unbounded skill accumulation without outcome-driven lifecycle management causes retrieval degradation, false-positive injections, and performance stagnation. Recent evaluation confirms the symptom: on SkillsBench, LLM-authored skills deliver a $+$0.0pp gain while human-curated ones deliver $+$16.2pp. Yet the underlying mechanism has not been isolated. We provide (1) a reproducible trigger: two ablations that isolate drift, one disabling skill injection (flat floor, $+$0.002) and one imposing premature retirement (active harm, $-$0.019); (2) trace-level diagnostics: an append-only evidence log with per-skill contribution scores, attribution verdicts, and router engagement metrics that make the failure visible before it reaches end-task scores; and (3) a verified fix: a minimal governance recipe (outcome-driven retirement + bounded active-cap + meta-skill authoring prior) that lifts held-out pass@1 from a 0.258 baseline to a late-window mean of 0.584 (rolling gain $+$0.328) on MBPP+ hard-100 over 100 rounds. Eight ablations decompose which governance mechanisms are load-bearing and which are subsumed, providing a concrete playbook for diagnosing library drift in any self-evolving agent.