While language models are powerful and versatile, they often fail on highly complex problems. Solving such problems requires deliberate thinking, which receives only minimal guidance during training. In this paper, we propose Cumulative Reasoning (CR), a method that employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, CR makes problem solving more manageable and more effective. On logical inference tasks, CR consistently outperforms existing methods, with improvements of up to 9.3%, and achieves 98.04% accuracy on the curated FOLIO wiki dataset. On the Game of 24, CR achieves 98% accuracy, an improvement of 24% over the previous state-of-the-art method. Finally, on the MATH dataset, we establish new state-of-the-art results with 58.0% overall accuracy, surpassing the previous best approach by a margin of 4.2% and achieving a 43% relative improvement on the hardest Level 5 problems (from 22.4% to 32.1%). Code is available at https://github.com/iiis-ai/cumulative-reasoning.
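The abstract quotes the Level 5 result on MATH both as a pair of accuracies (22.4% and 32.1%) and as a relative improvement of 43%. The short sketch below shows how the two figures relate. The function name `relative_improvement` and the variable names are illustrative assumptions, not taken from the paper or its repository.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the gain of `new` over `baseline`, as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Level 5 accuracies quoted in the abstract, in percent.
baseline_acc = 22.4
cr_acc = 32.1

# Absolute gain is a difference in percentage points;
# relative gain divides that difference by the baseline.
absolute_gain = cr_acc - baseline_acc
relative_gain = relative_improvement(baseline_acc, cr_acc)

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

The absolute gain is 9.7 percentage points, and dividing by the 22.4% baseline gives roughly 43%, consistent with the relative improvement stated above.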