Large Language Models (LLMs) encode a wealth of knowledge in their parameters through pre-training on extensive corpora. Prior research has explored operations on these parameters to manipulate the underlying implicit knowledge, including detection, editing, and merging, but it remains unclear how well this knowledge transfers across models of different scales. In this paper, we empirically investigate knowledge transfer from larger to smaller models from a parametric perspective. To this end, we use sensitivity-based techniques to extract and align knowledge-specific parameters between different LLMs, and we use a LoRA module as the intermediary mechanism for injecting the extracted knowledge into the smaller model. Evaluations across four benchmarks validate the efficacy of the proposed method. Our findings highlight the critical factors in the process of parametric knowledge transfer and underscore the transferability of model parameters across LLMs of different scales. Project website: https://maszhongming.github.io/ParaKnowTransfer.
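To make the extract-align-inject pipeline concrete, the following is a minimal NumPy sketch on a single toy weight matrix. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the sensitivity score, so a first-order score |w · ∂L/∂w| is assumed here; alignment is assumed to mean keeping the highest-scoring rows and columns so the extracted block fits the smaller model; and a truncated SVD is assumed as the way to turn that block into low-rank LoRA-style factors. All function and variable names are hypothetical.

```python
import numpy as np

def sensitivity(weight, grad):
    """Assumed first-order sensitivity per parameter: |w * dL/dw|."""
    return np.abs(weight * grad)

def extract_submatrix(weight, scores, out_dim, in_dim):
    """Keep the rows and columns with the highest total sensitivity,
    so the extracted block matches the smaller model's dimensions."""
    rows = np.sort(np.argsort(scores.sum(axis=1))[-out_dim:])
    cols = np.sort(np.argsort(scores.sum(axis=0))[-in_dim:])
    return weight[np.ix_(rows, cols)]

def lora_factors(matrix, rank):
    """Truncated SVD: matrix ~= B @ A with B (out, rank) and A (rank, in)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    root = np.sqrt(s[:rank])
    return u[:, :rank] * root, root[:, None] * vt[:rank]

rng = np.random.default_rng(0)
W_large = rng.normal(size=(8, 6))   # toy stand-in for a larger model's weight
x = rng.normal(size=(6,))           # toy input
y = rng.normal(size=(8,))           # toy target

# Gradient of the toy loss 0.5 * ||W x - y||^2 with respect to W.
grad = np.outer(W_large @ x - y, x)

scores = sensitivity(W_large, grad)
W_sub = extract_submatrix(W_large, scores, out_dim=4, in_dim=3)
B, A = lora_factors(W_sub, rank=2)  # low-rank factors to seed a LoRA-style module
```

In a real setting the gradient would come from backpropagation over task-specific examples, and the factors `B` and `A` would initialize an adapter on the smaller model before fine-tuning; both steps are outside the scope of this sketch.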