Large language models (LLMs) store vast amounts of knowledge, which often requires updates to correct factual errors, incorporate newly acquired information, or adapt model behavior. Model editing methods have emerged as efficient solutions for such updates, offering localized and precise knowledge modification at significantly lower computational cost than continual training. In parallel, LLMs are frequently fine-tuned for a wide range of downstream tasks. However, the effect of fine-tuning on previously edited knowledge remains poorly understood. In this work, we systematically investigate how different fine-tuning objectives interact with various model editing techniques. Our findings show that edited knowledge is substantially more susceptible to forgetting during fine-tuning than intrinsic knowledge acquired through pre-training. This analysis highlights a key limitation of current editing approaches and suggests that evaluating edit robustness under downstream fine-tuning is critical for their practical deployment. We further find that knowledge retention can be significantly improved either by augmenting the edited knowledge with paraphrases or by freezing the layers associated with edited content during fine-tuning, offering insights for developing more robust editing algorithms.
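To make the layer-freezing mitigation concrete, here is a minimal sketch, not the paper's exact procedure: it assumes a locate-then-edit method (e.g., ROME or MEMIT) has modified specific transformer blocks of a GPT-2 model, and excludes those blocks from gradient updates before downstream fine-tuning. The model name and the `edited_layer_indices` set are hypothetical placeholders.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative setup: a small causal LM standing in for the edited model.
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical: indices of the transformer blocks that hold edited knowledge
# (locate-then-edit methods typically write into a few specific MLP layers).
edited_layer_indices = {5, 6}

# Freeze every parameter in the edited blocks so fine-tuning cannot overwrite them.
for idx, block in enumerate(model.transformer.h):
    if idx in edited_layer_indices:
        for param in block.parameters():
            param.requires_grad = False

# Give the optimizer only the parameters that remain trainable.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=5e-5
)
```

The design choice here is simply to remove the edited parameters from the optimization problem entirely; the rest of the network still adapts to the downstream task, while the weights storing the edit are untouched.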