Large language models (LLMs) have achieved impressive performance and strong explainability across various reasoning scenarios, marking a significant stride toward mimicking human-like intelligence. Despite this, when tasked with simple questions supported by a generic fact, LLMs often fail to provide consistent and precise answers, indicating a deficiency in abstract reasoning ability. This has sparked a vigorous debate about whether LLMs are genuinely reasoning or merely memorizing. In light of this, we design a preliminary study to quantify and probe the abstract reasoning abilities of existing LLMs. Our findings reveal a substantial discrepancy between their general reasoning and abstract reasoning performance. To mitigate this problem, we tailor an abstract reasoning dataset (AbsR) together with a meaningful learning paradigm that teaches LLMs how to leverage generic facts for reasoning. The results show that our approach not only boosts the general reasoning performance of LLMs but also makes considerable strides toward their abstract reasoning capacity, moving them beyond simple memorization or imitation to a more nuanced understanding and application of generic facts.