Mathematical reasoning in large language models (LMs) has garnered significant attention in recent work, but there is limited understanding of how these models process and store information related to arithmetic tasks within their architecture. To improve our understanding of this aspect of language models, we present a mechanistic interpretation of Transformer-based LMs on arithmetic questions using a causal mediation analysis framework. By intervening on the activations of specific model components and measuring the resulting changes in predicted probabilities, we identify the subset of parameters responsible for specific predictions, which provides insight into how LMs process arithmetic-related information. Our experimental results indicate that LMs process the input by using the attention mechanism to transmit query-relevant information from mid-sequence token positions in early layers to the final token. This information is then processed by a set of MLP modules, which generate result-related information that is incorporated into the residual stream. To assess the specificity of the observed activation dynamics, we compare the effects of different model components on arithmetic queries with their effects on other tasks, including retrieving numbers from the prompt and answering factual knowledge questions.
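The intervention described above (replace one component's activation, then measure the change in predicted probability) can be illustrated with a minimal activation-patching sketch. Everything here is hypothetical and for illustration only: `ToyModel`, `indirect_effect`, the two toy components, and the effect measure (relative drop in the probability of the clean answer) are assumptions, not the paper's models or its exact effect definition. The toy also ignores token positions, whereas the analysis above intervenes on components at specific positions in the sequence.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyModel:
    """Hypothetical stand-in for a Transformer: each named component reads the
    residual stream and adds its output back into it (in dict order)."""

    def __init__(self, components: Dict[str, Callable[[Vector], Vector]]):
        self.components = components

    def run(self, x: Vector,
            patch: Optional[Dict[str, Vector]] = None) -> Tuple[Vector, Dict[str, Vector]]:
        """Forward pass returning (answer probabilities, cached component outputs).
        If `patch` maps a component name to an activation, that activation
        replaces the component's own output: this is the intervention."""
        residual = list(x)
        cache: Dict[str, Vector] = {}
        for name, fn in self.components.items():
            out = fn(residual)
            if patch is not None and name in patch:
                out = patch[name]
            cache[name] = out
            residual = [r + o for r, o in zip(residual, out)]
        return softmax(residual), cache


def indirect_effect(model: ToyModel, clean: Vector, corrupted: Vector,
                    component: str, answer_idx: int) -> float:
    """Assumed effect measure: relative drop in P(answer) on the clean input
    when `component`'s activation is taken from the corrupted run."""
    p_clean, _ = model.run(clean)
    _, corrupted_cache = model.run(corrupted)
    p_patched, _ = model.run(clean, patch={component: corrupted_cache[component]})
    return (p_clean[answer_idx] - p_patched[answer_idx]) / p_clean[answer_idx]


# Usage: one component ignores its input, the other depends on it.
model = ToyModel({
    "constant_block": lambda h: [0.5] * len(h),        # input-independent output
    "scaling_block": lambda h: [2.0 * v for v in h],   # amplifies the residual
})
clean = [1.0, 0.0, 0.0]      # favours candidate answer 0
corrupted = [0.0, 1.0, 0.0]  # favours candidate answer 1
for name in model.components:
    print(name, round(indirect_effect(model, clean, corrupted, name, answer_idx=0), 3))
```

Patching the input-independent component leaves the prediction unchanged (zero effect), while patching the component whose output depends on the input sharply lowers the probability of the clean answer. This contrast is the kind of signal used to attribute a prediction to specific components.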