Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving. Investigating the internal reasoning mechanisms of these models can help us design better model architectures and training strategies, ultimately enhancing their reasoning capabilities. In this study, we examine the matching mechanism employed by Transformers for multi-step reasoning on a constructed dataset. We investigate factors that influence this matching mechanism and find that small initialization and post-LayerNorm both facilitate its formation, thereby enhancing the model's reasoning ability. Moreover, we propose a method for improving the model's reasoning capability by adding orthogonal noise. Finally, we investigate the parallel reasoning mechanism of Transformers and, based on this phenomenon, propose a conjecture on the upper bound of the model's reasoning ability. These insights contribute to a deeper understanding of the reasoning processes in large language models and can guide the design of more effective reasoning architectures and training strategies.