Deductive Verification of Chain-of-Thought Reasoning

Large Language Models (LLMs) benefit significantly from Chain-of-Thought (CoT) prompting when performing various reasoning tasks. While CoT allows models to produce more comprehensive reasoning processes, its emphasis on intermediate reasoning steps can inadvertently introduce hallucinations and accumulated errors, thereby limiting models' ability to solve complex reasoning tasks. Inspired by how humans carry out careful, meticulous deductive logical reasoning to solve tasks, we seek to enable language models to perform explicit and rigorous deductive reasoning, and to ensure the trustworthiness of their reasoning process through self-verification. However, directly verifying the validity of an entire deductive reasoning process is challenging, even with advanced models like ChatGPT. In light of this, we propose to decompose a reasoning verification process into a series of step-by-step subprocesses, each receiving only its necessary context and premises. To facilitate this procedure, we propose Natural Program, a natural language-based deductive reasoning format. Our approach enables models to generate precise reasoning steps in which subsequent steps are more rigorously grounded on prior steps. It also empowers language models to carry out reasoning self-verification in a step-by-step manner. By integrating this verification process into each deductive reasoning stage, we significantly enhance the rigor and trustworthiness of generated reasoning steps. In the process, we also improve answer correctness on complex reasoning tasks. Code will be released at https://github.com/lz1oceani/verify_cot.
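The idea of splitting verification into per-step subprocesses, each seeing only its own premises, can be illustrated with a minimal Python sketch. It assumes a hypothetical Natural Program-like syntax in which premises and steps are numbered `#k.` and each derived step lists the statements it relies on as `(by #i #j)`. The exact format used in this work may differ. The helpers `parse_program`, `minimal_context`, and `verify_program` are illustrative names. `verify_step` is a placeholder for the model-based check (for example, a prompt to an LLM). None of this is the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical step syntax (an assumption for illustration only):
#   "#4. (by #1 #2) Tom now has 8 apples."
STEP_RE = re.compile(r"#(\d+)\.\s*(?:\(by ((?:#\d+\s*)+)\))?\s*(.+)")


@dataclass
class Statement:
    idx: int
    deps: list[int]  # indices of the statements this one is derived from
    text: str


def parse_program(lines: list[str]) -> dict[int, Statement]:
    """Parse numbered premises and reasoning steps into a lookup table."""
    stmts: dict[int, Statement] = {}
    for line in lines:
        m = STEP_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed syntax
        idx = int(m.group(1))
        deps = [int(d) for d in re.findall(r"#(\d+)", m.group(2) or "")]
        stmts[idx] = Statement(idx, deps, m.group(3).strip())
    return stmts


def minimal_context(stmts: dict[int, Statement], idx: int) -> list[str]:
    """Return only the statements that step `idx` explicitly cites."""
    return [f"#{d}. {stmts[d].text}" for d in stmts[idx].deps]


def verify_program(
    stmts: dict[int, Statement],
    verify_step: Callable[[list[str], str], bool],
) -> bool:
    """Check every derived step in isolation; the chain passes only if all do."""
    for idx in sorted(stmts):
        step = stmts[idx]
        if not step.deps:
            continue  # premises are taken as given
        # A step may only cite statements that exist and come before it.
        if any(d not in stmts or d >= idx for d in step.deps):
            return False
        if not verify_step(minimal_context(stmts, idx), step.text):
            return False
    return True


if __name__ == "__main__":
    program = [
        "#1. Tom has 3 apples.",
        "#2. Tom buys 5 more apples.",
        "#3. Apples cost $1 each.",
        "#4. (by #1 #2) Tom now has 8 apples.",
    ]
    stmts = parse_program(program)
    print(minimal_context(stmts, 4))
    # -> ['#1. Tom has 3 apples.', '#2. Tom buys 5 more apples.']

    # Stand-in verifier that accepts every step; a real system would
    # instead ask a model whether `claim` follows from `ctx`.
    print(verify_program(stmts, lambda ctx, claim: True))  # -> True
```

The design point is that each check receives only the cited statements. In the example, the irrelevant premise #3 is never shown to the verifier when it checks step #4. A step that cites a missing or later statement fails immediately, before any model call.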