Modern language models capture a large body of factual knowledge. However, some facts can be incorrectly induced or become obsolete over time, resulting in factually incorrect generations. This has led to the development of various editing methods that allow updating the facts encoded by the model. Evaluation of these methods has primarily focused on testing whether an individual fact has been successfully injected and whether similar predictions for other subjects have remained unchanged. Here we argue that such evaluation is limited, since injecting one fact (e.g., ``Jack Depp is the son of Johnny Depp'') introduces a ``ripple effect'' in the form of additional facts that the model needs to update (e.g., ``Jack Depp is the sibling of Lily-Rose Depp''). To address this issue, we propose a novel set of evaluation criteria that consider the implications of an edit on related facts. Using these criteria, we then construct \ripple{}, a diagnostic benchmark of 5K factual edits that captures a variety of types of ripple effects. We evaluate prominent editing methods on \ripple{}, showing that current methods fail to introduce consistent changes in the model's knowledge. In addition, we find that a simple in-context editing baseline obtains the best scores on our benchmark, suggesting a promising research direction for model editing.