Large Language Models (LLMs) possess extensive knowledge and strong in-context reasoning capabilities. However, previous work has questioned their out-of-context reasoning ability, i.e., the ability to infer information from their training data rather than from the context or prompt. This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which combines multiple pieces of knowledge to infer new knowledge. We designed a synthetic dataset with seven representative OCKR tasks to systematically assess the OCKR capabilities of LLMs. Using this dataset, we evaluated several LLMs and found that their proficiency in this aspect is limited, regardless of whether the knowledge is trained in separate or adjacent training settings. Moreover, training the model with reasoning examples does not yield significant improvement, while training it to perform explicit knowledge retrieval helps with retrieving attribute knowledge but not relation knowledge, indicating that the model's limited OCKR capabilities stem from difficulties in knowledge retrieval. Furthermore, we treat cross-lingual knowledge transfer as a distinct form of OCKR and evaluate this ability. Our results show that the evaluated model also exhibits limited ability to transfer knowledge across languages.
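To make the OCKR setting concrete, the sketch below (our own illustration, not an example from the dataset; the entities and facts are hypothetical) shows the kind of composition an OCKR task requires: two attribute facts that are each stated separately must be combined to derive a relation that never appears verbatim in the training data.

```python
# Hypothetical attribute knowledge, as if learned separately during training:
# birth years of two invented entities.
birth_year = {"Alice": 1980, "Bob": 1975}

def infer_older(a: str, b: str) -> str:
    """Combine two attribute facts (birth years) to infer a relation
    ("who is older") that was never stated directly."""
    return a if birth_year[a] < birth_year[b] else b

print(infer_older("Alice", "Bob"))  # -> Bob
```

For a program this composition is trivial; the paper's question is whether an LLM can perform the analogous inference when the two facts were seen only in training, not in the prompt.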