Transformers play a central role in the inner workings of large language models. We develop a mathematical framework for analyzing Transformers based on their interpretation as interacting particle systems, which reveals that clusters emerge in the long-time limit. Our study explores the underlying theory and offers new perspectives for mathematicians as well as computer scientists.
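To make the particle-system interpretation concrete, the following is a minimal sketch rather than the framework developed in the paper. It assumes a simplified setting that the abstract does not spell out: tokens are treated as particles on the unit sphere, the query, key, and value matrices are all the identity, and each particle drifts toward the softmax-weighted average of all particles. The function names, the inverse-temperature parameter `beta`, and the explicit Euler step `dt` are illustrative choices.

```python
import numpy as np


def random_sphere_points(n, d, seed=0):
    """Draw n points uniformly at random on the unit sphere in R^d."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def simulate_attention_particles(x0, beta=1.0, dt=0.1, steps=5000):
    """Evolve particles under simplified self-attention dynamics.

    Assumption (not taken from the abstract): Q = K = V = identity, so each
    particle moves toward the softmax-weighted mean of all particles and is
    renormalized to stay on the unit sphere.
    """
    x = x0.copy()
    for _ in range(steps):
        # Attention weights: row-wise softmax of scaled inner products.
        logits = beta * (x @ x.T)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)

        # Drift toward the attention-weighted average of the particles.
        drift = weights @ x
        # Keep only the component tangent to the sphere at each particle.
        drift -= np.sum(drift * x, axis=1, keepdims=True) * x

        # Explicit Euler step, then project back onto the sphere.
        x = x + dt * drift
        x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x


def min_pairwise_cosine(x):
    """Smallest cosine similarity between any two particles."""
    return float((x @ x.T).min())


if __name__ == "__main__":
    x0 = random_sphere_points(n=32, d=3, seed=0)
    xT = simulate_attention_particles(x0)
    print("min pairwise cosine at start:", min_pairwise_cosine(x0))
    print("min pairwise cosine at end:  ", min_pairwise_cosine(xT))
```

Under these assumptions, a minimum pairwise cosine that rises toward 1 indicates that the particles are collapsing into a single cluster. Larger values of `beta` tend to slow this down and can produce several long-lived groups before they merge, so the number of steps needed depends on the parameters chosen.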