As the cost of fine-tuning Large Language Models (LLMs) continues to rise, recent research has pivoted toward methods for editing the implicit knowledge embedded within LLMs. Yet a dark cloud still lingers overhead: will knowledge editing trigger a butterfly effect? It remains unclear whether knowledge editing introduces side effects that pose potential risks. This paper pioneers the investigation into the potential pitfalls of knowledge editing for LLMs. To this end, we introduce new benchmark datasets and propose novel evaluation metrics. Our results underline two pivotal concerns: (1) Knowledge Conflict: editing groups of facts that logically clash can magnify the inherent inconsistencies in LLMs, a facet neglected by previous methods. (2) Knowledge Distortion: altering parameters to edit factual knowledge can irreversibly warp the innate knowledge structure of LLMs. Experimental results demonstrate that knowledge editing may inadvertently cast a shadow of unintended consequences on LLMs, which warrants attention and effort in future work. Code and data are available at https://github.com/zjunlp/PitfallsKnowledgeEditing.