As the cost of fine-tuning Large Language Models (LLMs) continues to rise, recent research has pivoted towards methods for editing the implicit knowledge embedded within LLMs. Yet a dark cloud still lingers overhead: will knowledge editing trigger a butterfly effect? It remains unclear whether knowledge editing introduces side effects that pose potential risks. This paper pioneers the investigation into the potential pitfalls of knowledge editing for LLMs. To this end, we introduce new benchmark datasets and propose innovative evaluation metrics. Our results underline two pivotal concerns: (1) Knowledge Conflict: editing groups of facts that logically clash can magnify the inherent inconsistencies in LLMs, a facet neglected by previous methods. (2) Knowledge Distortion: altering parameters in order to edit factual knowledge can irrevocably warp the innate knowledge structure of LLMs. Experimental results demonstrate that knowledge editing might inadvertently cast a shadow of unintended consequences on LLMs, which warrants attention and effort in future work. Code and data are available at https://github.com/zjunlp/PitfallsKnowledgeEditing.