Fine-tuning a language model often degrades its existing performance on other tasks, because fine-tuning shifts the model parameters; this phenomenon is often referred to as (catastrophic) forgetting. We are interested in mitigating this in settings where we have access only to the model weights, not to its training data or recipe. A natural approach is to penalize the KL divergence between the original model and the fine-tuned one. Our main realization is that a simple process, which we term context-free generation, yields an approximately unbiased estimate of this KL divergence. We show that augmenting a fine-tuning dataset with context-free generations mitigates forgetting in two settings: (a) preserving the zero-shot performance of pretrained-only models, and (b) preserving the reasoning performance of thinking models. We show that contextual synthetic data, and even a portion of the pretraining data, are less effective. We also investigate the effect of design choices such as the generation temperature and the data mixing ratio. We present results on OLMo-1B for the pretrained-only setting and on R1-Distill-Llama-8B for the reasoning setting.
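To make the mechanism concrete, below is a minimal sketch of context-free generation and of the Monte Carlo KL estimate it enables, written against the Hugging Face transformers API. The checkpoint name, temperature, sample count, and generation length are illustrative assumptions, not the paper's exact recipe; the idea is that unconditioned samples x drawn from the base model give an estimate of KL(p_base || p_tuned) as the average of log p_base(x) - log p_tuned(x), and the same samples can be mixed into the fine-tuning data.

```python
# Sketch only: model name and hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "allenai/OLMo-1B-hf"  # original (frozen) model
tok = AutoTokenizer.from_pretrained(base_name)
base = AutoModelForCausalLM.from_pretrained(base_name).eval()

# Context-free generation: sample x ~ p_base(x) starting from the BOS token
# alone, with no conditioning prompt. (Falls back to EOS as the start token
# if the tokenizer defines no BOS.)
start_id = tok.bos_token_id if tok.bos_token_id is not None else tok.eos_token_id
start = torch.tensor([[start_id]])
with torch.no_grad():
    samples = base.generate(
        start,
        do_sample=True,
        temperature=1.0,
        max_new_tokens=128,
        num_return_sequences=8,
        pad_token_id=tok.eos_token_id,
    )

def sequence_logprob(model, ids):
    """Sum of per-token log-probabilities of `ids` under `model` (teacher-forced)."""
    with torch.no_grad():
        logits = model(ids).logits[:, :-1]          # predictions for tokens 1..T-1
    logp = torch.log_softmax(logits, dim=-1)
    return logp.gather(-1, ids[:, 1:, None]).squeeze(-1).sum(-1)

# After fine-tuning produces `tuned`, the same samples give a Monte Carlo
# estimate of KL(p_base || p_tuned):
#   KL ~= mean over x ~ p_base of [log p_base(x) - log p_tuned(x)].
# tuned = AutoModelForCausalLM.from_pretrained("path/to/finetuned").eval()
# kl_estimate = (sequence_logprob(base, samples)
#                - sequence_logprob(tuned, samples)).mean()

# In the data-augmentation setting, the decoded generations are simply
# mixed into the fine-tuning dataset alongside the task data:
texts = tok.batch_decode(samples, skip_special_tokens=True)
```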