Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge in the long-time limit. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.