Large Language Models (LLMs) face the "knowledge cutoff" challenge: their frozen parametric memory prevents direct internalization of new information. While Supervised Fine-Tuning (SFT) is commonly used to update model knowledge, it often updates factual content without reliably improving the model's ability to use the newly incorporated information for question answering or decision-making. Reinforcement Learning (RL) is essential for acquiring such reasoning skills; however, its high computational cost makes it impractical for efficient online adaptation. We empirically observe that the parameter updates induced by SFT and RL are nearly orthogonal. Based on this observation, we propose Parametric Skill Transfer (PaST), a framework for modular skill transfer that enables efficient and effective knowledge adaptation. By extracting a domain-agnostic Skill Vector from a source domain, we can linearly inject knowledge-manipulation skills into a target model after it has undergone lightweight SFT on new data. Experiments on knowledge-incorporation QA (SQuAD, LooGLE) and agentic tool-use benchmarks (ToolBench) demonstrate the effectiveness of our method. On SQuAD, PaST outperforms the state-of-the-art self-editing SFT baseline by up to 9.9 points. PaST further scales to long-context QA on LooGLE with an 8.0-point absolute accuracy gain, and improves zero-shot ToolBench success rates by 10.3 points on average with consistent gains across tool categories, indicating strong scalability and cross-domain transferability of the Skill Vector.
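The skill-vector arithmetic described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the choice of delta (RL-tuned minus SFT-tuned weights), and the scaling coefficient `alpha` are all assumptions; parameters are shown as plain floats in place of model tensors.

```python
# Hypothetical sketch of PaST-style skill-vector extraction and injection.
# Assumption: the Skill Vector is the per-parameter delta between an
# RL-tuned source model and its SFT-only counterpart; injection adds a
# scaled copy of that delta to a target model already SFT'd on new data.

def extract_skill_vector(rl_params, sft_params):
    """Skill Vector = theta_RL - theta_SFT, computed per parameter."""
    return {name: rl_params[name] - sft_params[name] for name in rl_params}

def inject_skill(target_sft_params, skill_vector, alpha=1.0):
    """Linearly add the scaled Skill Vector to the target's SFT weights.

    alpha is an illustrative scaling knob; parameters absent from the
    skill vector are left unchanged.
    """
    return {name: value + alpha * skill_vector.get(name, 0.0)
            for name, value in target_sft_params.items()}
```

In practice the same arithmetic would run over full model state dicts (one tensor per named parameter), which is why the near-orthogonality of SFT and RL updates matters: it suggests the two deltas can be composed additively with little interference.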