Large language models (LLMs) acquire substantial world knowledge during pre-training, which is further shaped by post-training techniques such as supervised fine-tuning (SFT). However, the impact of SFT on a model's knowledge remains underexplored, limiting our ability to control how fine-tuning alters that knowledge. To address this gap, we evaluate closed-book question answering (CBQA) performance across five LLMs from the LLaMA-2 and LLaMA-3 families. Surprisingly, models fine-tuned on 1,920 samples perform up to 14% worse than those fine-tuned on only 240 samples. Furthermore, varying the level of knowledge mastery in the fine-tuning data leads to performance fluctuations of over 12%. To investigate these effects, we analyze model behavior at both the token and parameter levels. Our analysis reveals that up to 90% of parameter updates during SFT do not contribute to knowledge enhancement. Restoring these updates can improve performance on the CBQA task, depending on the characteristics of the fine-tuning data. These insights offer practical guidance for developing fine-tuning strategies that more effectively strengthen model knowledge.