As the cost of fine-tuning Large Language Models (LLMs) continues to rise, recent research efforts have pivoted toward developing methodologies for editing the implicit knowledge embedded within LLMs. Yet a dark cloud still lingers overhead: will knowledge editing trigger a butterfly effect? It remains unclear whether knowledge editing might introduce side effects that pose potential risks. This paper pioneers the investigation into the potential pitfalls associated with knowledge editing for LLMs. To this end, we introduce new benchmark datasets and propose innovative evaluation metrics. Our results underline two pivotal concerns: (1) Knowledge Conflict: editing groups of facts that logically clash can magnify the inherent inconsistencies in LLMs, a facet neglected by previous methods. (2) Knowledge Distortion: altering parameters to edit factual knowledge can irrevocably warp the innate knowledge structure of LLMs. Experimental results vividly demonstrate that knowledge editing might inadvertently cast a shadow of unintended consequences on LLMs, which warrant attention and effort in future work. Code and data are available at https://github.com/zjunlp/PitfallsKnowledgeEditing.