The factuality of large language models (LLMs) tends to decay over time, since events posterior to their training are "unknown" to them. One way to keep models up to date is factual update: the task of inserting, replacing, or removing simple (atomic) facts within the model. To study this task, we present WikiFactDiff, a dataset that describes the evolution of factual knowledge between two dates as a collection of simple facts divided into three categories: new, obsolete, and static. We describe several update scenarios arising from various combinations of these three basic types of update. The facts are represented as subject-relation-object triples; indeed, WikiFactDiff was constructed by comparing the states of the Wikidata knowledge base on 4 January 2021 and on 27 February 2023. These facts are accompanied by verbalization templates and cloze tests that enable running update algorithms and evaluating them. Unlike other datasets, such as zsRE and CounterFact, WikiFactDiff constitutes a realistic update setting that covers diverse update scenarios, including replacements, archival, and new entity insertions. We also present an evaluation of existing update algorithms on WikiFactDiff.
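The three categories follow from a set comparison of triples between the two snapshots. A minimal sketch (illustrative only, not the authors' construction pipeline; the example triples are hypothetical):

```python
# Illustrative sketch: partitioning subject-relation-object triples
# into the three WikiFactDiff categories by comparing two
# knowledge-base snapshots. Not the authors' actual pipeline.

def diff_facts(old_snapshot, new_snapshot):
    """Partition triples into new, obsolete, and static facts."""
    old, new = set(old_snapshot), set(new_snapshot)
    return {
        "new": new - old,       # present only in the later snapshot
        "obsolete": old - new,  # present only in the earlier snapshot
        "static": old & new,    # present in both snapshots
    }

# Hypothetical example triples
t1 = ("France", "head of government", "Jean Castex")
t2 = ("France", "head of government", "Elisabeth Borne")
t3 = ("France", "capital", "Paris")

result = diff_facts([t1, t3], [t2, t3])
# t2 is new, t1 is obsolete, t3 is static
```

In practice the comparison must also account for entities and relations added or deleted between the two Wikidata dumps, which this sketch glosses over.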