Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume that external skills either accumulate as persistent guidance or are internalized into the policy, eventually leading to zero-skill inference. We argue that this assumption is overly restrictive: with limited parametric capacity and uneven marginal contribution across skills, the optimal active skill set is non-monotonic and both task- and stage-dependent. In this work, we propose SLIM, a framework of dynamic Skill LIfecycle Management for agentic reinforcement learning (RL), which treats the active external skill set as a dynamic optimization variable that is updated jointly with policy learning. Specifically, SLIM estimates each active skill's marginal external contribution through leave-one-skill-out validation, then applies three lifecycle operations: retaining high-value skills, retiring skills whose contribution becomes negligible after sufficient exposure, and expanding the skill bank when persistent failures reveal missing capability coverage. Experiments show that SLIM outperforms the best baselines by an average of 7.1 percentage points across ALFWorld and SearchQA. Results further indicate that policy learning and external skill retention are not mutually exclusive: some skills are absorbed into the policy, while others continue to provide external value, supporting SLIM as a more general paradigm for skill-based agentic RL.
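The lifecycle loop described above can be illustrated with a minimal sketch. It is not the SLIM implementation: the abstract does not specify the estimator, thresholds, or expansion mechanism, so every name here (`SkillBank`, `loo_contributions`, `lifecycle_step`, `retire_eps`, `min_exposure`, `failure_patience`, `propose_skill`) is a hypothetical stand-in. The sketch assumes a caller-supplied `evaluate` function that returns a validation score for a given set of active skills, and it defines a skill's marginal contribution as the full-set score minus the score with that skill left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple


@dataclass
class SkillBank:
    """Hypothetical container: active skill names plus exposure counts."""
    active: Set[str] = field(default_factory=set)
    exposure: Dict[str, int] = field(default_factory=dict)


def loo_contributions(
    active: Set[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Leave-one-skill-out: score(all skills) - score(all skills except s)."""
    full_score = evaluate(frozenset(active))
    return {s: full_score - evaluate(frozenset(active - {s})) for s in active}


def lifecycle_step(
    bank: SkillBank,
    evaluate: Callable[[FrozenSet[str]], float],
    failure_streak: int,
    propose_skill: Callable[[], Optional[str]],
    *,
    retire_eps: float = 0.01,    # assumed "negligible contribution" threshold
    min_exposure: int = 3,       # assumed "sufficient exposure" threshold
    failure_patience: int = 5,   # assumed "persistent failure" threshold
) -> Tuple[Dict[str, float], List[str], Optional[str]]:
    """Apply one retain / retire / expand update to the bank in place."""
    contrib = loo_contributions(bank.active, evaluate)

    # Every active skill has now been exposed for one more round.
    for s in bank.active:
        bank.exposure[s] = bank.exposure.get(s, 0) + 1

    # Retire: negligible contribution AND enough exposure.
    # Skills that fail either test are retained.
    retired = sorted(
        s for s, c in contrib.items()
        if c < retire_eps and bank.exposure[s] >= min_exposure
    )
    bank.active -= set(retired)

    # Expand: persistent failures suggest missing capability coverage.
    added = None
    if failure_streak >= failure_patience:
        added = propose_skill()
        if added is not None:
            bank.active.add(added)
            bank.exposure.setdefault(added, 0)

    return contrib, retired, added
```

In an actual training loop, a step like this would presumably be interleaved with policy updates, so that a skill's measured contribution can shrink as the policy absorbs it. Requiring both a low contribution and a minimum exposure before retirement is one way to avoid discarding a skill that the policy has not yet had a chance to use.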