Untying the Reversal Curse via Bidirectional Language Model Editing

Recent studies have demonstrated that large language models (LLMs) store massive amounts of factual knowledge in their parameters. However, existing LLMs are prone to hallucinating unintended text because of false or outdated knowledge. Since retraining LLMs is resource-intensive, there has been growing interest in model editing. Despite the emergence of editing benchmarks and approaches, their unidirectional editing and evaluation fail to explore the reversal curse. Intuitively, if "The capital of France is" is edited within a model to the counterfact "London", then the model should naturally be able to reason about and recall the reverse fact, i.e., "London is the capital of" followed by "France" instead of "England". In this paper, we study bidirectional language model editing, aiming to provide a rigorous evaluation of whether edited LLMs can recall the edited knowledge bidirectionally. We introduce a new evaluation metric, reversibility, and construct a benchmark dubbed Bidirectional Assessment for Knowledge Editing (BAKE) to evaluate how well edited models recall knowledge in the direction reverse to the edit. Surprisingly, we observe that while current editing methods and LLMs can effectively recall edited facts in the direction of editing, they suffer serious deficiencies when evaluated in the reverse direction. To mitigate the reversal curse, we propose a method named Bidirectionally Inversible Relationship moDeling (BIRD), which designs a set of editing objectives that incorporate the bidirectional relationships between subject and object into the updated model weights. Experiments show that BIRD improves the performance of four representative LLMs of different sizes on both question answering and judgment tasks.
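The idea behind the reversibility metric can be illustrated with a minimal Python sketch. This is a hypothetical illustration, not BAKE's actual scoring code: the `Edit` record, the `REVERSE_TEMPLATES` mapping, and the criterion used here (an edit counts as reversible if the model, given the reverse prompt, prefers the edited subject such as "France" over the original reverse answer such as "England") are assumptions modeled on the capital-of-France example. `toy_score` is a lookup table standing in for a real edited model's log-probabilities, and the second edit is a made-up toy counterfact.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Edit:
    """A counterfactual edit plus the pre-edit answer to its reverse query."""
    subject: str             # e.g. "France"
    relation: str            # e.g. "capital"
    new_object: str          # e.g. "London" (the counterfact)
    old_reverse_answer: str  # e.g. "England" (reverse answer before editing)


# Hypothetical templates that turn a forward relation into a reverse query
# asking for the subject given the (edited) object.
REVERSE_TEMPLATES = {
    "capital": "{obj} is the capital of",
}


def reverse_prompt(edit: Edit) -> str:
    """Build the reverse-direction prompt for an edit."""
    return REVERSE_TEMPLATES[edit.relation].format(obj=edit.new_object)


def reversibility(edits: Sequence[Edit],
                  score: Callable[[str, str], float]) -> float:
    """Fraction of edits whose reverse prompt prefers the edited subject.

    `score(prompt, completion)` stands in for a model's log-probability of
    `completion` following `prompt` (an assumption for this sketch).
    """
    if not edits:
        return 0.0
    hits = 0
    for e in edits:
        prompt = reverse_prompt(e)
        if score(prompt, e.subject) > score(prompt, e.old_reverse_answer):
            hits += 1
    return hits / len(edits)


# Toy stand-in for an edited model: log-probabilities from a lookup table.
TOY_LOGPROBS = {
    ("London is the capital of", "France"): -2.3,   # reverse edit not learned
    ("London is the capital of", "England"): -0.4,
    ("Rome is the capital of", "Germany"): -0.5,    # reverse edit learned
    ("Rome is the capital of", "Italy"): -1.0,
}


def toy_score(prompt: str, completion: str) -> float:
    return TOY_LOGPROBS.get((prompt, completion), float("-inf"))


edits = [
    Edit("France", "capital", "London", "England"),
    Edit("Germany", "capital", "Rome", "Italy"),  # toy counterfact
]
print(reversibility(edits, toy_score))  # 0.5
```

Only one of the two toy edits is recalled in the reverse direction, so the score is 0.5. A model that succeeds in the editing direction but still completes "London is the capital of" with "England" is exactly the failure mode the abstract describes.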
