Large Language Models (LLMs) have recently demonstrated remarkable text understanding and generation capabilities. However, even strong LLMs may learn incorrect knowledge from their training corpus, as well as knowledge that becomes outdated over time. Directly fine-tuning a model a second time on data containing new knowledge may fail to update that knowledge effectively, because the old and new knowledge conflict. In this paper, we propose a new fine-tuning paradigm called F-Learning (Forgetting before Learning), which uses parameter arithmetic to first forget old knowledge and then learn new knowledge. Experimental results on two publicly available datasets demonstrate that F-Learning clearly improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning. Moreover, we find that forgetting old knowledge by subtracting LoRA parameters can achieve an effect similar to subtracting full fine-tuning parameters, and sometimes even surpasses it significantly.
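The forget-then-learn idea can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything here is an assumption made for illustration: the function names, the `finetune` callable, the scaling factor `lam`, and the toy fine-tuning routine are all hypothetical. The sketch assumes the old-knowledge delta is the difference between parameters fine-tuned on old knowledge and the base parameters, and that forgetting means subtracting a scaled copy of that delta before fine-tuning on new knowledge.

```python
from typing import Callable, Dict, List

# Hypothetical parameter container: name -> flat list of weights.
Params = Dict[str, List[float]]


def knowledge_delta(base: Params, tuned_on_old: Params) -> Params:
    """Parameter change induced by fine-tuning on the OLD knowledge."""
    return {
        name: [t - b for t, b in zip(tuned_on_old[name], base[name])]
        for name in base
    }


def forget(base: Params, delta_old: Params, lam: float = 1.0) -> Params:
    """Subtract the scaled old-knowledge delta from the base parameters.

    `lam` is an assumed hyperparameter controlling how strongly to forget.
    """
    return {
        name: [b - lam * d for b, d in zip(base[name], delta_old[name])]
        for name in base
    }


def f_learning(
    base: Params,
    finetune: Callable[[Params, Params], Params],
    old_data: Params,
    new_data: Params,
    lam: float = 1.0,
) -> Params:
    """Forget old knowledge first, then learn new knowledge (sketch)."""
    tuned_on_old = finetune(base, old_data)          # fine-tune on old knowledge
    delta_old = knowledge_delta(base, tuned_on_old)  # isolate its parameter delta
    forgotten = forget(base, delta_old, lam)         # parameter arithmetic: subtract
    return finetune(forgotten, new_data)             # fine-tune on new knowledge


def toy_finetune(params: Params, target: Params, step: float = 0.5) -> Params:
    """Stand-in for real fine-tuning: move each weight part-way to a target."""
    return {
        name: [p + step * (t - p) for p, t in zip(params[name], target[name])]
        for name in params
    }
```

In a LoRA setting, `delta_old` would presumably be the merged low-rank update rather than a full-parameter difference; the subtraction step itself would be unchanged. `toy_finetune` only exists so the sketch runs end to end and is not a model of real training.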