Recent large language models, such as GPT-4, have demonstrated remarkable capabilities in processing standard queries. Despite these advancements, their performance declines substantially on \textbf{advanced mathematical problems requiring complex, multi-step logical reasoning}. To enhance their inferential capabilities, current research has explored \textit{prompt engineering}, exemplified by methods such as Tree of Thought and Graph of Thought. Nonetheless, these existing approaches have two significant limitations. First, their effectiveness on complex mathematical problems is limited. Second, they require distinct prompts to be designed for individual problems, which hampers their generalizability. In response to these limitations, this paper introduces the \textit{Multi-Agent System for conditional Mining} (\textbf{MACM}) prompting method. It not only solves intricate mathematical problems but also generalizes well across various mathematical contexts. With the assistance of MACM, the accuracy of GPT-4 Turbo on the most challenging level-5 mathematical problems in the MATH dataset increases from $\mathbf{54.68\%}$ to $\mathbf{76.73\%}$. The code is available at \url{https://github.com/bin123apple/MACM}.