This study investigates the impact of localized updates to large language models (LLMs), specifically in the context of knowledge editing, a task aimed at incorporating or modifying specific facts without altering broader model capabilities. We first show that across different post-training interventions, including continual pre-training, full fine-tuning, and LoRA-based fine-tuning, the Frobenius norm of the updated matrices always increases. This norm growth is especially detrimental for localized knowledge editing, where only a subset of matrices in a model is updated. We reveal a consistent phenomenon across various editing techniques, including fine-tuning, hypernetwork-based approaches, and locate-and-edit methods: the norm of the updated matrix invariably increases with successive edits. Such growth disrupts model balance, particularly when isolated matrices are updated while the rest of the model remains static, leading to potential instability and degradation of downstream performance. Upon deeper investigation of the intermediate activation vectors, we find that the norm of internal activations decreases, accompanied by shifts in the subspaces these activations occupy, which shows that the activation vectors come to occupy completely different regions of the representation space compared to the unedited model. With our paper, we highlight the technical challenges of continual and localized sequential knowledge editing and their implications for maintaining model stability and utility.
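The central measurement behind the first claim can be sketched concretely. The snippet below is a minimal toy illustration, not the paper's actual experimental code: it simulates a sequence of localized edits as rank-1 updates to a single weight matrix (a common stand-in for locate-and-edit style modifications) and tracks the Frobenius norm of the cumulative update after each edit. The function and variable names here are our own.

```python
import numpy as np

def frobenius_norm_growth(w_original, snapshots):
    """Frobenius norm of the cumulative update ||W_t - W_0||_F after each edit.

    np.linalg.norm defaults to the Frobenius norm for 2-D arrays.
    """
    return [float(np.linalg.norm(w - w_original)) for w in snapshots]

# Toy setup: one 64x64 weight matrix, five sequential rank-1 "edits".
rng = np.random.default_rng(0)
w0 = rng.standard_normal((64, 64))

snapshots = []
w = w0.copy()
for _ in range(5):
    u = rng.standard_normal(64)
    v = rng.standard_normal(64)
    w = w + 0.1 * np.outer(u, v)  # one localized (rank-1) update
    snapshots.append(w.copy())

norms = frobenius_norm_growth(w0, snapshots)
# For independent random updates, ||W_t - W_0||_F tends to grow with t,
# mirroring the norm-growth phenomenon the abstract describes.
```

In the real setting one would compute the same quantity on the specific MLP or attention matrices targeted by the editing method after each sequential edit; the toy rank-1 updates merely make the growth trend easy to see.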