Prompting techniques such as Chain-of-Thought (CoT) and Program-of-Thought (PoT) improve LLM mathematical reasoning by structuring intermediate steps in natural language or code. However, applied mathematics problems in domains like finance, physics, and cryptography often require recalling or deriving governing equations, a step that current approaches do not explicitly leverage. We propose Formula-One Prompting (F-1), a two-phase approach that uses mathematical equations as an intermediate representation before adaptive solving. F-1 first formulates governing equations from the problem description, then selects a solving strategy — CoT, PoT, or direct computation — based on the generated equations, all within a single LLM call. Results across five models and four benchmarks show that F-1 outperforms CoT by +5.76% and PoT by +8.42% on average. Crucially, the gains are largest in applied domains: +13.30% over CoT on FinanceMath, and, within OlympiadBench, a larger improvement on physics (+2.55%) than on pure math (+0.44%). These results demonstrate that F-1 is especially effective on applied mathematics problems.
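To make the two-phase structure concrete, the sketch below assembles a single-call F-1 prompt in Python. This is a hypothetical illustration under our own wording assumptions — the exact prompt text, function name, and phase labels are not from the paper — but it shows how formulation and adaptive strategy selection can be packed into one LLM call.

```python
def build_f1_prompt(problem: str) -> str:
    """Assemble a single-call Formula-One (F-1) style prompt.

    Hypothetical sketch: the instruction wording here is an assumption,
    not the authors' released prompt. Both phases are packed into one
    message so the model completes them within a single LLM call.
    """
    return (
        "Solve the following problem in two phases.\n\n"
        f"Problem: {problem}\n\n"
        "Phase 1 (Formulation): Write down the governing equations that "
        "model the problem, defining every symbol you introduce.\n\n"
        "Phase 2 (Adaptive solving): Based on the equations above, choose "
        "ONE strategy and apply it:\n"
        "- CoT: a step-by-step natural-language derivation;\n"
        "- PoT: a short program whose output is the answer;\n"
        "- Direct: immediate computation if the equations are trivial.\n"
        "State the strategy you chose, then solve and report the final answer."
    )

# Example: a compound-interest question in the style of FinanceMath.
prompt = build_f1_prompt(
    "An account pays 5% annual interest compounded yearly. "
    "How much is a $1,000 deposit worth after 3 years?"
)
print(prompt)
```

Keeping both phases in one message (rather than two API calls) mirrors the single-call design the abstract describes, so the strategy choice can condition directly on the equations the model just wrote.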