Agent skills, structured packages of procedural knowledge and executable resources that agents load dynamically at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 uses a training-time curriculum that begins with the full skill context and progressively withdraws it. Skills are grouped offline by category and rendered together with the interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum then evaluates the on-policy helpfulness of each skill file. Within a linearly decaying budget, it retains only those files from which the current policy still benefits, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7% on ALFWorld and +6.6% on Search-QA) while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.
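To make the Dynamic Curriculum's selection rule concrete, here is a minimal sketch under stated assumptions. The abstract does not say how helpfulness is measured or how the budget is applied. This sketch therefore assumes three things: a skill file's on-policy helpfulness is the current policy's rollout success rate with that file in context minus its success rate without it; the budget caps the number of retained skill files; and the cap decays linearly from all files at step 0 to none at the final step. All function names and the example skill categories are hypothetical, not taken from the SKILL0 code.

```python
from typing import Dict, List


def estimate_helpfulness(success_with: List[int], success_without: List[int]) -> float:
    """Hypothetical on-policy helpfulness: mean rollout success with the skill
    file in context minus mean success without it (assumed metric)."""
    if not success_with or not success_without:
        return 0.0
    return sum(success_with) / len(success_with) - sum(success_without) / len(success_without)


def linear_budget(step: int, total_steps: int, num_skills: int) -> int:
    """Hypothetical linearly decaying budget: every skill file at step 0,
    none at or after the final step. Integer arithmetic avoids float rounding."""
    if total_steps <= 0:
        return 0
    remaining = min(max(total_steps - step, 0), total_steps)
    return num_skills * remaining // total_steps


def select_skills(helpfulness: Dict[str, float], step: int, total_steps: int) -> List[str]:
    """Keep only skill files the current policy still benefits from
    (helpfulness > 0), ranked best-first and truncated to the current budget."""
    budget = linear_budget(step, total_steps, len(helpfulness))
    useful = [name for name, h in helpfulness.items() if h > 0]
    useful.sort(key=lambda name: helpfulness[name], reverse=True)  # stable sort keeps ties in insertion order
    return useful[:budget]


# Illustrative, made-up rollout outcomes (1 = success, 0 = failure).
example_helpfulness = {
    "navigation": estimate_helpfulness([1, 1, 0, 1], [0, 1, 0, 0]),  # 0.75 - 0.25 = 0.5
    "heating": estimate_helpfulness([1, 0], [1, 0]),                 # 0.0, so it is dropped
    "search": estimate_helpfulness([1, 1], [0, 1]),                  # 1.0 - 0.5 = 0.5
    "cleaning": estimate_helpfulness([1, 1, 1], [0, 0, 1]),          # 1.0 - 1/3, about 0.67
}

for step in (0, 5, 10):
    print(step, select_skills(example_helpfulness, step, total_steps=10))
# 0 ['cleaning', 'navigation', 'search']
# 5 ['cleaning', 'navigation']
# 10 []
```

Pairing a helpfulness filter with a shrinking cap means a skill file leaves the context as soon as the policy no longer gains from it, or once the budget forces it out. At the last step the context holds no skills, which matches the zero-shot end state described above.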