Large language models (LLMs) have been shown to perform better when asked to reason step-by-step before answering a question. However, it is unclear to what degree the model's final answer is faithful to the stated reasoning steps. In this paper, we perform a causal mediation analysis on twelve LLMs to examine how the intermediate reasoning steps generated by an LLM influence its final outcome, and we find that LLMs do not reliably use their intermediate reasoning steps when generating an answer. To address this issue, we introduce FRODO, a framework that tailors small-sized LMs to generate correct reasoning steps and to reason robustly over these steps. FRODO consists of an inference module, which learns to generate correct reasoning steps using an implicit causal reward function, and a reasoning module, which learns to faithfully reason over these intermediate inferences using a counterfactual and causal preference objective. Our experiments show that FRODO significantly outperforms four competitive baselines. Furthermore, FRODO improves the robustness and generalization ability of the reasoning LM, yielding higher performance on out-of-distribution test sets. Finally, we find that FRODO's rationales are more faithful to its final answer predictions than those of a model trained with standard supervised fine-tuning.
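To make the causal-mediation probe concrete, the minimal sketch below intervenes on a model's stated rationale (swapping it for a corrupted one) and measures how much the probability of the original answer shifts; if the answer is insensitive to the intervention, the model is not faithfully using its reasoning steps. The model choice (gpt2), prompt template, corruption, and the answer_logprob helper are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical sketch of a causal-mediation probe on chain-of-thought,
# assuming a HuggingFace causal LM; not FRODO's actual analysis code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def answer_logprob(question: str, rationale: str, answer: str) -> float:
    """Log-probability the model assigns to `answer` given question + rationale."""
    prompt = f"Q: {question}\nReasoning: {rationale}\nA:"
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    answer_ids = tok(" " + answer, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Next-token log-probs: position t predicts token t+1.
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = input_ids[:, 1:]
    start = prompt_ids.shape[1] - 1  # first position predicting an answer token
    return logp[0, start:, :].gather(-1, targets[0, start:, None]).sum().item()

question = "Tom has 3 apples and buys 2 more. How many apples does he have?"
original = "3 apples plus 2 apples is 5 apples."
corrupted = "3 apples plus 2 apples is 7 apples."  # intervened rationale

# Indirect effect of the rationale on the answer: a model that faithfully
# reasons over its steps should lower P(answer=5) when the rationale is wrong.
effect = answer_logprob(question, original, "5") - answer_logprob(question, corrupted, "5")
print(f"Change in log P(answer=5) under intervention: {effect:.3f}")
```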