Large language models (LLMs) have demonstrated impressive performance and strong explainability across various reasoning scenarios, marking a significant stride toward mimicking human-like intelligence. Nevertheless, when tasked with simple questions supported by the same generic fact, LLMs often fail to abstract and apply that fact to produce consistent and precise answers, revealing a deficiency in abstract reasoning ability. This has sparked a vigorous debate about whether LLMs are genuinely reasoning or merely memorizing. In light of this, we design a preliminary study to quantify and delve into the abstract reasoning abilities of existing LLMs. Our findings reveal a substantial gap between their general reasoning and abstract reasoning performance. To alleviate this problem, we tailor an abstract reasoning dataset (AbsR) together with a meaningful learning paradigm that teaches LLMs to leverage generic facts for reasoning. The results show that our approach not only boosts the general reasoning performance of LLMs but also makes considerable strides in their capacity for abstract reasoning, moving beyond simple memorization or imitation toward a more nuanced understanding and application of generic facts. The code is available at https://github.com/Waste-Wood/MeanLearn.