Despite significant advances in text generation and reasoning, Large Language Models (LLMs) still struggle to perform complex arithmetic accurately. To compute exactly, LLM systems often have the model generate code for arithmetic operations. However, this approach sacrifices speed and security and, if finetuning is involved, risks the model losing prior capabilities. We propose a framework that enables exact arithmetic in \textit{a single autoregressive step}, yielding faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture that performs the arithmetic. Our implementation, which uses Llama 3 8B Instruct with OccamNet as the symbolic model (OccamLlama), achieves 100\% accuracy on single arithmetic operations ($+,-,\times,\div,\sin{},\cos{},\log{},\exp{},\sqrt{}$), outperforming GPT-4o and performing on par with GPT-4o using a code interpreter. On mathematical problem-solving benchmarks involving challenging arithmetic, OccamLlama also outperforms GPT-4o both with and without a code interpreter, enabling a small LLM to match the arithmetic performance of much larger models. We will make our code public shortly.
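To make the control idea concrete, here is a toy sketch of a hidden state selecting a symbolic operation that is then evaluated directly rather than predicted token by token. It is an illustration under stated assumptions, not OccamNet or the OccamLlama implementation. The linear "decoder" (`select_op`), its `weights`, the helper `symbolic_step`, and the way operands are passed in are all hypothetical. Only the list of operations comes from the abstract.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Toy operation library mirroring the operations listed in the abstract.
# Each entry maps a name to (arity, function). This is NOT OccamNet itself.
OPS: Dict[str, Tuple[int, Callable[..., float]]] = {
    "+": (2, lambda a, b: a + b),
    "-": (2, lambda a, b: a - b),
    "*": (2, lambda a, b: a * b),
    "/": (2, lambda a, b: a / b),
    "sin": (1, math.sin),
    "cos": (1, math.cos),
    "log": (1, math.log),
    "exp": (1, math.exp),
    "sqrt": (1, math.sqrt),
}
OP_NAMES: List[str] = list(OPS)  # dicts preserve insertion order


def select_op(hidden: Sequence[float], weights: Sequence[Sequence[float]]) -> str:
    """Hypothetical linear decoder: score each operation from the hidden state
    and return the highest-scoring one."""
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in weights]
    best = max(range(len(scores)), key=scores.__getitem__)
    return OP_NAMES[best]


def symbolic_step(
    hidden: Sequence[float],
    weights: Sequence[Sequence[float]],
    operands: Sequence[float],
) -> Tuple[str, float]:
    """One step: choose an operation from the hidden state, then evaluate it
    with Python's math routines on the supplied operands."""
    name = select_op(hidden, weights)
    arity, fn = OPS[name]
    return name, fn(*operands[:arity])


# Usage: identity weights and a hidden state whose largest entry is at index 2 ("*").
n = len(OP_NAMES)
weights = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
hidden = [0.1, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(symbolic_step(hidden, weights, [12.5, 8.0]))  # ('*', 100.0)
```

The design point the sketch isolates is the division of labor: the language model only has to choose *which* operation applies, while the numerical result comes from a symbolic evaluator. In a real system the decoder would be learned and the operands extracted from the text, neither of which is modeled here.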