As the cost of fine-tuning Large Language Models (LLMs) continues to rise, recent research has pivoted towards methods for editing the implicit knowledge embedded within LLMs. Yet a dark cloud still lingers overhead: will knowledge editing trigger a butterfly effect? It remains unclear whether knowledge editing introduces side effects that pose potential risks. This paper pioneers the investigation into the potential pitfalls of knowledge editing for LLMs. To this end, we introduce new benchmark datasets and propose innovative evaluation metrics. Our results underline two pivotal concerns: (1) Knowledge Conflict: Editing groups of facts that logically clash can magnify the inherent inconsistencies in LLMs, a facet neglected by previous methods. (2) Knowledge Distortion: Altering parameters with the aim of editing factual knowledge can irrevocably warp the innate knowledge structure of LLMs. Experimental results vividly demonstrate that knowledge editing might inadvertently cast a shadow of unintended consequences on LLMs, which warrants attention and effort in future work. Code is available at https://github.com/zjunlp/PitfallsKnowledgeEditing.