Large Language Models (LLMs) encode a wealth of knowledge in their parameters through pre-training on extensive corpora. While prior research has explored operations on these parameters to manipulate the underlying implicit knowledge (including detection, editing, and merging), it remains unclear how well this knowledge transfers across models of different scales. In this paper, we empirically investigate knowledge transfer from larger to smaller models from a parametric perspective. To this end, we employ sensitivity-based techniques to extract and align knowledge-specific parameters between different LLMs, and we use the LoRA module as the intermediary mechanism for injecting the extracted knowledge into smaller models. Evaluations across four benchmarks validate the efficacy of our proposed method. Our findings highlight the critical factors in parametric knowledge transfer and underscore the transferability of model parameters across LLMs of different scales. We release code and data at \url{https://github.com/maszhongming/ParaKnowTransfer}.
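To make the pipeline summarized above more concrete, the following is a minimal sketch of one plausible reading of it: score each parameter of a larger model by a first-order sensitivity term, keep the most sensitive rows and columns so the result matches the smaller model's shape, and factor that result into low-rank LoRA-style matrices. The abstract does not specify any of these details, so everything here is an assumption for illustration: the score $|\theta \cdot \nabla_\theta \mathcal{L}|$, the row/column selection rule, the use of truncated SVD, and the function names `sensitivity`, `select_submatrix`, and `lora_init_from` are all hypothetical and are not taken from the released code.

```python
import numpy as np


def sensitivity(weight, grad):
    """Assumed first-order sensitivity score per parameter: |theta * dL/dtheta|."""
    return np.abs(weight * grad)


def select_submatrix(weight, scores, out_rows, out_cols):
    """Keep the rows and columns with the highest total sensitivity.

    Indices are sorted so the original ordering of the kept rows and
    columns is preserved. This selection rule is an illustrative choice.
    """
    rows = np.sort(np.argsort(scores.sum(axis=1))[-out_rows:])
    cols = np.sort(np.argsort(scores.sum(axis=0))[-out_cols:])
    return weight[np.ix_(rows, cols)]


def lora_init_from(matrix, rank):
    """Factor `matrix` into B @ A using a rank-`rank` truncated SVD.

    B has shape (rows, rank) and A has shape (rank, cols), which is the
    usual shape convention for a LoRA update W + B @ A.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    b = u[:, :rank] * s[:rank]  # scale each kept left singular vector
    a = vt[:rank, :]
    return b, a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_w = rng.normal(size=(8, 6))     # stand-in for a larger model's weight
    teacher_grad = rng.normal(size=(8, 6))  # stand-in for its loss gradient
    scores = sensitivity(teacher_w, teacher_grad)
    extracted = select_submatrix(teacher_w, scores, out_rows=4, out_cols=3)
    b, a = lora_init_from(extracted, rank=2)
    print(extracted.shape, b.shape, a.shape)
```

In this sketch the low-rank factors would serve only as an initialization for the LoRA module in the smaller model; whether the factorization should be applied to the extracted matrix itself or to its difference from the smaller model's weights is a design choice the abstract leaves open.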