Recent studies on transformer-based language models show that they can answer questions by reasoning over knowledge provided as part of the context (i.e., in-context reasoning). However, since the available knowledge is often not filtered for a particular question, in-context reasoning can be sensitive to distractor facts, additional content that is irrelevant to a question but that may be relevant for a different question (i.e., not necessarily random noise). In these situations, the model fails to distinguish the knowledge that is necessary to answer the question, leading to spurious reasoning and degraded performance. This reasoning failure contrasts with the model's apparent ability to distinguish its contextual knowledge from all the knowledge it has memorized during pre-training. Following this observation, we propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters before presenting it with a question. Our method, RECKONING, is a bi-level learning algorithm that teaches language models to reason by updating their parametric knowledge through back-propagation, allowing them to then answer questions using the updated parameters. During training, the inner loop rapidly adapts a copy of the model weights to encode contextual knowledge into its parameters. In the outer loop, the model learns to use the updated weights to reproduce and answer reasoning questions about the memorized knowledge. Our experiments on two multi-hop reasoning datasets show that RECKONING's performance improves over the in-context reasoning baseline (by up to 4.5%). We also find that compared to in-context reasoning, RECKONING generalizes better to longer reasoning chains unseen during training, is more robust to distractors in the context, and is more computationally efficient when multiple questions are asked about the same knowledge.
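The bi-level structure described above can be illustrated with a deliberately tiny sketch. This is not the RECKONING implementation: it assumes a one-parameter linear model `y = w * x` in place of a language model, squared error in place of the language-modeling and question-answering losses, and hand-derived gradients in place of back-propagation. All names (`inner_adapt`, `question_loss_and_grad`, `train`, `make_task`) and all hyperparameter values are hypothetical. The inner loop takes gradient steps that encode a task's "facts" into a copy of the weight; the outer loop differentiates through those steps to improve the shared initial weight.

```python
# Toy bi-level optimization sketch (hypothetical; not the authors' code).
# Assumption: a scalar linear model y = w * x stands in for the language model.

def mean(values):
    return sum(values) / len(values)

def inner_adapt(w, facts, inner_lr, steps):
    """Inner loop: encode the facts into a copy of w with gradient steps.

    Returns the adapted weight and d(adapted)/d(w), which the outer loop
    needs in order to differentiate through the adaptation.
    """
    # Second derivative of the mean squared error; constant for this model.
    curvature = 2.0 * mean([x * x for x, _ in facts])
    adapted, jacobian = w, 1.0
    for _ in range(steps):
        grad = 2.0 * mean([x * (adapted * x - y) for x, y in facts])
        adapted = adapted - inner_lr * grad
        jacobian *= 1.0 - inner_lr * curvature
    return adapted, jacobian

def question_loss_and_grad(adapted, questions):
    """Outer objective: squared error on questions, using the adapted weight."""
    loss = mean([(adapted * x - y) ** 2 for x, y in questions])
    grad = 2.0 * mean([x * (adapted * x - y) for x, y in questions])
    return loss, grad

def train(tasks, w=0.0, inner_lr=0.1, outer_lr=0.05,
          inner_steps=1, outer_steps=100):
    """Outer loop: update the initial weight through the inner-loop steps."""
    for _ in range(outer_steps):
        meta_grad = 0.0
        for facts, questions in tasks:
            adapted, jacobian = inner_adapt(w, facts, inner_lr, inner_steps)
            _, grad = question_loss_and_grad(adapted, questions)
            # Chain rule: d(loss)/d(w) = d(loss)/d(adapted) * d(adapted)/d(w).
            meta_grad += grad * jacobian / len(tasks)
        w -= outer_lr * meta_grad
    return w

def make_task(slope):
    """Each task has its own 'knowledge' (a slope) plus a question about it."""
    facts = [(x, slope * x) for x in (1.0, 2.0)]
    questions = [(3.0, slope * 3.0)]
    return facts, questions

tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
w_init = train(tasks)

# At inference, only the inner loop runs on the new facts; the adapted
# weight can then be reused for every question about those facts.
new_facts, new_questions = make_task(2.5)
adapted, _ = inner_adapt(w_init, new_facts, inner_lr=0.1, steps=1)
loss_adapted, _ = question_loss_and_grad(adapted, new_questions)
loss_unadapted, _ = question_loss_and_grad(w_init, new_questions)
```

In this toy setting, the adapted weight answers the held-out question with lower error than the unadapted initialization. The adaptation cost is paid once per set of facts rather than once per question, which mirrors the efficiency argument made for asking multiple questions about the same knowledge.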