The model editing problem concerns how language models should learn new facts about the world over time. While empirical research on model editing has drawn widespread attention, the conceptual foundations of model editing remain shaky -- perhaps unsurprisingly, since model editing is essentially belief revision, a storied problem in philosophy that has eluded succinct solutions for decades. Model editing nonetheless demands a solution, since we need to be able to control the knowledge within language models. With this goal in mind, this paper critiques the standard formulation of the model editing problem and proposes a formal testbed for model editing research. We first describe 12 open problems with model editing, based on challenges with (1) defining the problem, (2) developing benchmarks, and (3) assuming LLMs have editable beliefs in the first place. Many of these challenges are extremely difficult to address, e.g., determining the far-reaching consequences of edits, labeling probabilistic entailments between facts, and updating the beliefs of agent simulators. Next, we introduce a semi-synthetic dataset for model editing based on Wikidata, where we can evaluate edits against labels given by an idealized Bayesian agent. This enables us to say exactly how belief revision in language models falls short of a desirable epistemic standard. We encourage further research in settings where model behavior can be compared against such a gold standard. Our code is publicly available at: https://github.com/peterbhase/LLM-belief-revision
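To make the evaluation idea concrete, here is a minimal sketch (not the paper's implementation; the toy facts, world prior, and function names are all hypothetical) of scoring an edit against an idealized Bayesian agent: the agent conditions its prior over possible worlds on the edited fact, and the edited model's probability for an entailed fact is compared to the resulting posterior.

```python
# Minimal sketch of comparing an edited model's belief against a Bayesian
# gold label. All data and names here are illustrative placeholders; in
# practice the prior would be derived from Wikidata-style facts.
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str


# Toy prior over possible worlds: each world is a set of facts with a
# probability mass.
worlds = {
    frozenset({Fact("Paris", "capital_of", "France"),
               Fact("France", "continent", "Europe")}): 0.7,
    frozenset({Fact("Paris", "capital_of", "France"),
               Fact("France", "continent", "Asia")}): 0.3,
}


def bayesian_posterior(query: Fact, evidence: Fact) -> float:
    """P(query | evidence) for an ideal agent that conditions on the edit."""
    consistent = {w: p for w, p in worlds.items() if evidence in w}
    z = sum(consistent.values())
    if z == 0:
        return 0.0  # the edit contradicts every world in the prior
    return sum(p for w, p in consistent.items() if query in w) / z


def edit_error(model_prob: float, query: Fact, evidence: Fact) -> float:
    """Gap between the edited model's probability and the Bayesian label."""
    return abs(model_prob - bayesian_posterior(query, evidence))


# Example: after editing in "Paris is the capital of France", how far is the
# model's belief in "France is in Europe" from the ideal posterior (0.7)?
evidence = Fact("Paris", "capital_of", "France")
query = Fact("France", "continent", "Europe")
print(edit_error(model_prob=0.55, query=query, evidence=evidence))  # 0.15
```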