As the cost of fine-tuning Large Language Models (LLMs) continues to rise, recent research has pivoted toward methods for editing the implicit knowledge embedded within LLMs. Yet a dark cloud still lingers overhead: will knowledge editing trigger a butterfly effect? It remains unclear whether knowledge editing introduces side effects that pose potential risks. This paper pioneers the investigation into the potential pitfalls of knowledge editing for LLMs. To this end, we introduce new benchmark datasets and propose novel evaluation metrics. Our results highlight two pivotal concerns: (1) Knowledge Conflict: editing groups of facts that logically clash can magnify the inherent inconsistencies in LLMs, a facet neglected by previous methods. (2) Knowledge Distortion: altering parameters with the aim of editing factual knowledge can irrevocably warp the innate knowledge structure of LLMs. Experimental results demonstrate that knowledge editing may inadvertently cast a shadow of unintended consequences on LLMs, which warrants attention and effort in future work. Code will be released at https://github.com/zjunlp/PitfallsKnowledgeEditing.