Large language models (LLMs) have demonstrated remarkable mathematical capabilities, largely driven by chain-of-thought (CoT) prompting, which decomposes complex reasoning into step-by-step solutions. This approach has enabled significant advances, as evidenced by performance on benchmarks such as GSM8K and MATH. However, the mechanisms underlying LLMs' ability to perform arithmetic within a single CoT step remain poorly understood. Some existing studies debate whether LLMs encode numerical values or rely on symbolic reasoning, while others explore the roles of attention and multi-layer processing in arithmetic tasks. In this work, we propose that LLMs learn arithmetic by capturing algebraic structures, such as the \emph{Commutativity} and \emph{Identity} properties. Because these structures are observable through input-output relationships alone, they can generalize to unseen data. Using a custom dataset of arithmetic problems, we empirically demonstrate that LLMs can learn such algebraic structures. Our findings indicate that leveraging algebraic structures can enhance LLMs' arithmetic capabilities, offering insights into how their arithmetic performance might be improved.
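As an illustrative sketch (the operator $\circ$ and identity element $e$ below are our notation, not drawn from the paper's formalism), these properties can be written as input-output constraints that hold for all operands, which is precisely why they are observable from model behavior alone:
\[
a \circ b = b \circ a \quad \text{(Commutativity)}, \qquad a \circ e = a \quad \text{(Identity)},
\]
where $\circ$ is a binary operation such as addition, with $e = 0$. A model that has internalized these constraints can, for instance, answer $b + a$ correctly even if only $a + b$ appeared in its training data.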