Recent advances have significantly improved the reasoning capabilities of Large Language Models (LLMs) through various methods, especially chain-of-thought (CoT) reasoning. However, previous methods fail to address reasoning errors in intermediate steps, which leads to error accumulation. In this paper, we propose Deductive Beam Search (DBS), which integrates CoT and deductive reasoning with step-wise beam search for LLMs. Our approach deploys a verifier that checks whether each reasoning step is deducible from its premises, thereby alleviating error accumulation. Furthermore, we introduce a scalable and labor-free data construction method to strengthen our model's verification capabilities. Extensive experiments demonstrate that our approach significantly enhances the base performance of LLMs of various scales (7B, 13B, 70B, and ChatGPT) across 8 reasoning datasets from 3 diverse reasoning genres: arithmetic, commonsense, and symbolic. Moreover, our analysis demonstrates that DBS can detect diverse and subtle reasoning errors and that it is robust across model scales.
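To make the idea of verifier-guided, step-wise beam search concrete, the following is a minimal sketch and not the authors' implementation. The names `propose_steps`, `verify`, and `is_final`, the `"Answer:"` prefix convention, and the choice to rank chains by the sum of log verifier scores are all assumptions made for illustration. In practice `propose_steps` would sample candidate steps from an LLM and `verify` would be a trained deductive verifier; here both are toy stand-ins.

```python
import math
from typing import Callable, List, Tuple

Chain = List[str]


def deductive_beam_search(
    question: str,
    propose_steps: Callable[[str, Chain], List[str]],  # hypothetical: LLM step sampler
    verify: Callable[[List[str], str], float],         # hypothetical: deducibility score in (0, 1]
    beam_width: int = 2,
    max_steps: int = 4,
    is_final: Callable[[str], bool] = lambda step: step.startswith("Answer:"),
) -> Tuple[Chain, float]:
    """Return the highest-scoring reasoning chain and its cumulative log score."""
    beams: List[Tuple[Chain, float]] = [([], 0.0)]
    for _ in range(max_steps):
        candidates: List[Tuple[Chain, float]] = []
        for chain, score in beams:
            if chain and is_final(chain[-1]):
                # Finished chains are carried forward unchanged.
                candidates.append((chain, score))
                continue
            premises = [question] + chain
            for step in propose_steps(question, chain):
                # Assumption: a chain's score is the sum of log verifier scores.
                step_score = math.log(max(verify(premises, step), 1e-9))
                candidates.append((chain + [step], score + step_score))
        if not candidates:
            break  # nothing to expand; keep the previous beams
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(chain and is_final(chain[-1]) for chain, _ in beams):
            break
    return beams[0]


# Toy stand-ins for the question "2 + 3 * 4".
DEDUCIBLE = {"3 * 4 = 12", "Answer: 2 + 12 = 14"}


def toy_propose(question: str, chain: Chain) -> List[str]:
    if not chain:
        return ["3 * 4 = 12", "2 + 3 = 5"]
    if chain[0] == "3 * 4 = 12":
        return ["Answer: 2 + 12 = 14"]
    return ["Answer: 5 * 4 = 20"]


def toy_verify(premises: List[str], step: str) -> float:
    # Steps that ignore operator precedence get a low deducibility score.
    return 0.9 if step in DEDUCIBLE else 0.2


best_chain, best_score = deductive_beam_search("2 + 3 * 4", toy_propose, toy_verify)
print(best_chain, round(best_score, 3))
```

Because every step is scored against the question and the steps accepted so far, a chain that starts from a poorly supported step keeps a low cumulative score and falls behind better-supported chains, which is the sense in which step-wise verification limits error accumulation in this sketch.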