Fine-tuning large language models (LLMs) can cause them to lose their general capabilities. However, the intrinsic mechanisms behind such forgetting remain unexplored. In this paper, we first examine this phenomenon in terms of knowledge understanding and instruction following, and identify the latter as the main contributor to forgetting during fine-tuning. We therefore propose the Instruction Vector (IV) framework to capture model representations highly related to specific instruction-following capabilities, making it possible to understand model-intrinsic forgetting. By analyzing IV dynamics before and after training, we suggest that fine-tuning mostly adds specialized reasoning patterns rather than erasing previous skills, and that this addition may appear as forgetting. Building on this insight, we develop IV-guided training, which aims to preserve the original computation graph and thereby mitigate catastrophic forgetting. Empirical tests on three benchmarks confirm the efficacy of this approach, supporting the relationship between IVs and forgetting. Our code will be made available soon.