Semantic associations, such as the link between "bird" and "flew", are foundational to language modeling: they enable models to go beyond memorization to generalize and generate coherent text. Understanding how these associations are learned and represented in language models is essential for connecting deep learning with linguistic theory and for building a mechanistic foundation for large language models. In this work, we analyze how these associations emerge from natural language data in attention-based language models through the lens of training dynamics. By leveraging a leading-term approximation of the gradients, we derive closed-form expressions for the weights at early stages of training that explain how semantic associations first take shape. Our analysis reveals that each set of transformer weights admits a closed-form expression as a simple composition of three basis functions (bigram, token-interchangeability, and context mappings) that reflect the statistics of the text corpus, uncovering how each transformer component captures semantic associations through these compositions. Experiments on real-world LLMs demonstrate that our theoretical weight characterizations closely match the learned weights, and qualitative analyses further show how our theorem sheds light on the learned associations in transformers.
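To make the three corpus statistics named above concrete, here is a minimal, hypothetical sketch (not the paper's construction) of how bigram counts, a token-interchangeability score, and a context mapping could be estimated from a toy corpus; all names such as `bigram_counts` and `interchangeability` are illustrative assumptions, not identifiers from this work.

```python
# Illustrative sketch of the three corpus statistics the abstract names:
# bigram, token-interchangeability, and context mappings. Assumed names only.
from collections import Counter, defaultdict

corpus = [
    "the bird flew over the tree".split(),
    "the plane flew over the city".split(),
    "the bird sat on the tree".split(),
]

# Bigram mapping: co-occurrence counts of (token, next token) pairs.
bigram_counts = Counter()
for sent in corpus:
    for prev, nxt in zip(sent, sent[1:]):
        bigram_counts[(prev, nxt)] += 1

# Context mapping: for each preceding token, the distribution of followers.
next_given = defaultdict(Counter)
for (prev, nxt), c in bigram_counts.items():
    next_given[prev][nxt] += c

def interchangeability(a: str, b: str) -> float:
    """Proxy for token-interchangeability: overlap (Dice coefficient) of the
    next-token count distributions of `a` and `b`; tokens appearing in the
    same contexts score high."""
    pa, pb = next_given[a], next_given[b]
    shared = sum(min(pa[t], pb[t]) for t in set(pa) | set(pb))
    total = sum(pa.values()) + sum(pb.values())
    return 2 * shared / total if total else 0.0

print(bigram_counts[("bird", "flew")])      # bigram statistic: 1
print(interchangeability("bird", "plane"))  # high: both precede "flew"
```

Under this toy setup, "bird" and "plane" score as partially interchangeable because they share the follower "flew", illustrating the kind of corpus statistic the closed-form weight expressions are claimed to reflect.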