While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Nevertheless, we find that enabling reasoning substantially expands the capability boundary of the model's parametric knowledge recall, unlocking correct answers that are otherwise effectively unreachable. Why does reasoning aid parametric knowledge recall when no complex reasoning steps are required? To answer this, we design a series of hypothesis-driven controlled experiments and identify two key driving mechanisms: (1) a computational buffer effect, where the model uses the generated reasoning tokens to perform latent computation independent of their semantic content; and (2) factual priming, where generating topically related facts acts as a semantic bridge that facilitates correct answer retrieval. Importantly, this latter generative self-retrieval mechanism carries inherent risks: we demonstrate that hallucinating intermediate facts during reasoning increases the likelihood of hallucinations in the final answer. Finally, we show that our insights can be harnessed to directly improve model accuracy by prioritizing reasoning trajectories that contain hallucination-free factual statements.
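To make the final point concrete, the sketch below shows one way such trajectory prioritization could look in practice: sample several reasoning traces for a question and prefer those whose intermediate factual statements pass a verification check. This is an illustrative best-of-N reranking scheme under assumed interfaces (the `is_factual` verifier and the sentence-level claim split are hypothetical), not the paper's exact procedure.

```python
import re
from typing import Callable, List

def select_trajectory(
    trajectories: List[str],            # sampled reasoning traces for the same question
    is_factual: Callable[[str], bool],  # hypothetical verifier for a single factual claim
) -> str:
    """Prefer reasoning traces whose intermediate factual statements all pass verification."""
    def hallucination_free(trace: str) -> bool:
        # Naively treat each sentence of the trace as a candidate factual claim.
        claims = [s.strip() for s in re.split(r"(?<=[.!?])\s+", trace) if s.strip()]
        return all(is_factual(c) for c in claims)

    clean = [t for t in trajectories if hallucination_free(t)]
    # Fall back to the first sampled trace if no trace passes the check.
    return clean[0] if clean else trajectories[0]
```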