Large Language Models (LLMs) are commonly used to generate solutions to mathematical reasoning problems in natural language, code, or a combination of both. In this paper, we explore fundamental questions about solving mathematical reasoning problems in natural language and code with state-of-the-art LLMs, including GPT-4o-mini and Llama-3.1-8B-Turbo. Our findings show that LLMs reason better in natural language than in code. Additionally, although natural language and code serve as complementary forms of reasoning, they can negatively affect each other in certain scenarios. These insights motivate our development of a new prompting method, INC-Math, which leverages an LLM to dynamically select the most appropriate reasoning form, yielding improved performance over comparable baselines with GPT-4o-mini.
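The dynamic form selection at the heart of INC-Math can be illustrated with a minimal sketch. This is not the paper's implementation: the function names are hypothetical, and a simple keyword heuristic stands in for the LLM call that would actually decide between the two reasoning forms.

```python
def select_reasoning_form(problem: str) -> str:
    """Pick a reasoning form for a math problem.

    In INC-Math the choice is made by an LLM; this keyword
    heuristic is only a stand-in for that call.
    """
    computational_cues = ("compute", "how many", "sum of", "product of", "digits")
    if any(cue in problem.lower() for cue in computational_cues):
        return "code"
    return "natural_language"


def solve(problem: str) -> str:
    """Dispatch the problem to the selected reasoning form (stubbed)."""
    form = select_reasoning_form(problem)
    if form == "code":
        # Here one would prompt the model to write and execute a program.
        return f"[code-based solution for: {problem}]"
    # Here one would prompt the model for step-by-step natural-language reasoning.
    return f"[natural-language solution for: {problem}]"


print(select_reasoning_form("Compute the sum of the first 100 primes."))
print(select_reasoning_form("Prove that the square root of 2 is irrational."))
```

The design choice reflected here is that routing happens before any solution attempt, so each problem is answered in exactly one form; the abstract's finding that the two forms can interfere with each other is one motivation for such a single-form dispatch.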