In the context of multi-step reasoning, e.g., with chain-of-thought, language models (LMs) can easily assign a high likelihood to incorrect steps. As a result, decoding strategies that optimize for solution likelihood often yield incorrect solutions. To address this issue, we propose Guiding chain-of-thought ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise decoding approach that steers the decoding process towards producing correct reasoning steps. GRACE employs a step-level verifier or discriminator trained with a contrastive loss over correct and incorrect steps, which is used during decoding to score next-step candidates based on their correctness. Importantly, GRACE only requires sampling from the LM, without the need for LM training or fine-tuning. Using models from FLAN-T5 and LLaMA families, we evaluate GRACE over four math and two symbolic reasoning tasks, where it exhibits substantial performance gains compared to greedy decoding, verifiers, and self-consistency in most settings. When further combined with self-consistency, GRACE outperforms all the baselines by sizeable margins. Human and LLM evaluations over GSM8K show that GRACE not only improves the final answer accuracy but also the correctness of the intermediate reasoning. Our implementation can be accessed at https://github.com/mukhal/grace.
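The stepwise decoding idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `sample_next_steps` and `discriminator_score` are hypothetical stand-ins for, respectively, sampling candidate steps from the LM and a trained step-level correctness discriminator, and the additive combination of LM likelihood and discriminator score (weighted by `beta`) is one plausible scoring scheme.

```python
import math

def sample_next_steps(prefix, n=3):
    # Hypothetical candidate generator: returns (step_text, lm_log_prob)
    # pairs. In GRACE these would be sampled from the frozen LM.
    return [(f"step-{i}", math.log(1.0 / (i + 1))) for i in range(n)]

def discriminator_score(prefix, step):
    # Hypothetical correctness score; a real discriminator is trained with
    # a contrastive loss over correct vs. incorrect steps.
    return 1.0 if step.endswith("-0") else 0.2

def guided_decode(question, max_steps=4, beta=1.0):
    """Stepwise decoding: at each step, pick the candidate maximizing
    LM log-likelihood plus beta times the discriminator's score."""
    prefix = [question]
    for _ in range(max_steps):
        candidates = sample_next_steps(prefix)
        best = max(
            candidates,
            key=lambda c: c[1] + beta * discriminator_score(prefix, c[0]),
        )
        prefix.append(best[0])
    return prefix[1:]
```

Note that only the candidate *selection* changes; the LM itself is never updated, which is why the approach requires no LM training or fine-tuning.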