Despite significant advancements in text generation and reasoning, Large Language Models (LLMs) still struggle to perform complex arithmetic operations accurately. To achieve accurate calculations, language model systems often enable LLMs to generate code for arithmetic operations. However, this approach compromises speed and security and, if fine-tuning is involved, risks the language model losing prior capabilities. We propose a framework that enables exact arithmetic in \textit{a single autoregressive step}, providing faster, more secure, and more interpretable LLM systems with arithmetic capabilities. We use the hidden states of an LLM to control a symbolic architecture that performs arithmetic. Our implementation using Llama 3 8B Instruct with OccamNet as the symbolic model (OccamLlama) achieves 100\% accuracy on single arithmetic operations ($+,-,\times,\div,\sin{},\cos{},\log{},\exp{},\sqrt{}$), outperforming GPT 4o and performing on par with GPT 4o using a code interpreter. OccamLlama also outperforms GPT 4o, both with and without a code interpreter, on mathematical problem-solving benchmarks involving challenging arithmetic, thus enabling small LLMs to match the arithmetic performance of much larger models. We will make our code public shortly.
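The core idea of hidden-state-controlled symbolic arithmetic can be illustrated with a minimal sketch. This is a hypothetical toy, not the authors' implementation: it assumes a decoder head has already mapped the LLM's hidden state to scores over candidate operations, and shows how the selected operation is then evaluated exactly in one step rather than generated digit by digit. The names `OPS`, `symbolic_step`, and the score dictionary are illustrative inventions.

```python
import math

# Exact symbolic operations the controller can dispatch to.
# (The real system uses OccamNet; this dictionary is a stand-in.)
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b,
    "sqrt": lambda a, _b: math.sqrt(a),
}


def symbolic_step(op_scores, operands):
    """Pick the highest-scoring operation and apply it exactly.

    In a real system, `op_scores` would come from a linear head over the
    LLM's hidden state; here it is passed in directly for illustration.
    """
    op = max(op_scores, key=op_scores.get)
    a, b = operands
    return OPS[op](a, b)


# Scores peaked on "mul" select exact multiplication in a single step,
# with no token-level digit arithmetic.
scores = {"add": 0.1, "sub": 0.0, "mul": 0.8, "div": 0.05, "sqrt": 0.05}
print(symbolic_step(scores, (123456.0, 789.0)))
```

Because the arithmetic is delegated to an exact symbolic evaluator rather than sampled token by token, the result is correct by construction whenever the controller selects the right operation.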