Recent advancements in large language models, such as GPT-4, have demonstrated remarkable capabilities in processing standard queries. Despite these advancements, their performance declines substantially on \textbf{advanced mathematical problems requiring complex, multi-step logical reasoning}. To enhance their inferential capabilities, current research has explored \textit{prompt engineering}, exemplified by methods such as Tree of Thought and Graph of Thought. Nonetheless, these approaches have two significant limitations. First, their effectiveness on complex mathematical problems is limited. Second, the need to design distinct prompts for individual problems hampers their generalizability. In response to these limitations, this paper introduces the \textit{Multi-Agent System for conditional Mining} (\textbf{MACM}) prompting method. It not only solves intricate mathematical problems but also generalizes well across various mathematical contexts. With the assistance of MACM, the accuracy of GPT-4 Turbo on the most challenging level-five problems in the MATH dataset increases from $\mathbf{54.68\%}$ to $\mathbf{76.73\%}$. The code is available at \url{https://github.com/bin123apple/MACM}.