Can Knowledge Editing Really Correct Hallucinations?

From arXiv. The first two authors contributed equally to this work. The main paper is 10 pages long, with 35 pages total. The code, results, dataset, and additional resources are available on the project website: https://llm-editing.github.io/

Large Language Models (LLMs) suffer from hallucinations, that is, non-factual information in generated content, despite their strong capabilities across tasks. Meanwhile, knowledge editing has emerged as a popular paradigm for correcting erroneous factual knowledge encoded in LLMs, with the advantage of avoiding retraining from scratch. However, a common issue with existing evaluation datasets for knowledge editing is that they do not ensure LLMs actually generate hallucinated answers to the evaluation questions before editing. When LLMs are evaluated on such datasets after being edited by different techniques, it is hard to use the resulting performance directly to assess how effectively each knowledge editing method corrects hallucinations. Thus, the fundamental question remains insufficiently validated: Can knowledge editing really correct hallucinations in LLMs? We propose HalluEditBench to holistically benchmark knowledge editing methods in correcting real-world hallucinations. First, we rigorously construct a massive hallucination dataset with 9 domains, 26 topics, and more than 6,000 hallucinations. Then, we assess the performance of knowledge editing methods holistically along five dimensions: Efficacy, Generalization, Portability, Locality, and Robustness. Through HalluEditBench, we provide new insights into the potential and limitations of different knowledge editing methods in correcting hallucinations, which could inspire future improvements and facilitate progress in the field of knowledge editing.
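The evaluation issue raised in the abstract can be illustrated with a small sketch. This is not the HalluEditBench code: the model interface (a function from question to answer), the exact-match correctness check, the helper names, and the toy data are all assumptions made for illustration. The idea shown is to keep only the questions that the unedited model answers incorrectly, then score the edited model on that subset rather than on the full question set.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a "model" is any function mapping a question to an answer.
Model = Callable[[str], str]
QAPair = Tuple[str, str]  # (question, ground-truth answer)


def lookup_model(table: Dict[str, str]) -> Model:
    """Toy stand-in for an LLM: answers are read from a fixed table."""
    return lambda question: table.get(question, "")


def is_correct(model: Model, question: str, truth: str) -> bool:
    """Assumed correctness check: case-insensitive exact match."""
    return model(question).strip().lower() == truth.strip().lower()


def select_hallucinations(pre_edit: Model, qa_pairs: List[QAPair]) -> List[QAPair]:
    """Keep only the questions the unedited model answers incorrectly."""
    return [(q, t) for q, t in qa_pairs if not is_correct(pre_edit, q, t)]


def accuracy(model: Model, qa_pairs: List[QAPair]) -> float:
    """Fraction of questions answered correctly (0.0 for an empty set)."""
    if not qa_pairs:
        return 0.0
    return sum(is_correct(model, q, t) for q, t in qa_pairs) / len(qa_pairs)


# Toy data (made up for this sketch).
qa_pairs = [
    ("Capital of Australia?", "Canberra"),
    ("Capital of France?", "Paris"),
    ("Capital of Canada?", "Ottawa"),
]
pre_edit = lookup_model({
    "Capital of Australia?": "Sydney",   # hallucination
    "Capital of France?": "Paris",       # already correct before editing
    "Capital of Canada?": "Toronto",     # hallucination
})
post_edit = lookup_model({
    "Capital of Australia?": "Canberra",  # fixed by the edit
    "Capital of France?": "Paris",
    "Capital of Canada?": "Toronto",      # edit failed
})

hallucinated = select_hallucinations(pre_edit, qa_pairs)
naive_score = accuracy(post_edit, qa_pairs)          # includes the "free" France question
verified_score = accuracy(post_edit, hallucinated)   # only verified hallucinations

print(f"naive: {naive_score:.2f}, on verified hallucinations: {verified_score:.2f}")
```

In this toy setup the naive score is 2/3 because one question was already answered correctly before editing, while the score on verified hallucinations is 1/2. The gap is the kind of distortion the abstract attributes to datasets that do not check pre-edit answers.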
