Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected, and whether similar predictions for other subjects have remained unchanged. Here we argue that such evaluation is limited, since injecting one fact (e.g., ``Jack Depp is the son of Johnny Depp'') introduces a ``ripple effect'' in the form of additional facts that the model needs to update (e.g., ``Jack Depp is the sibling of Lily-Rose Depp''). To address this issue, we propose a novel set of evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct RippleEdits, a diagnostic benchmark of 5K factual edits that captures a variety of types of ripple effects. We evaluate prominent editing methods on RippleEdits, showing that current methods fail to introduce consistent changes in the model's knowledge. In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing.