Deduction, induction, and abduction are fundamental reasoning paradigms and are core to human logical thinking. Although improving Large Language Model (LLM) reasoning has attracted significant research effort, the extent to which these fundamental paradigms induce generalization has yet to be systematically explored. In this study, we shed light on how the interplay between these core paradigms influences LLMs' reasoning behavior. To this end, we first collect a new dataset of reasoning trajectories from symbolic tasks, each targeting one of the three fundamental paradigms, so as to abstract away from concrete world knowledge. We then investigate effective ways of inducing these skills into LLMs. We experiment with a battery of methods, including simple fine-tuning as well as more complex approaches that increase model depth or transform a dense model into a mixture-of-experts. We comprehensively evaluate the induced models on realistic out-of-domain tasks that are formulated entirely in natural language and contain real-world knowledge. Our results reveal that our approach yields strong generalizability, with substantial performance gains (up to $14.60$) across realistic tasks.