Transformers are deep neural network architectures that underpin the recent successes of large language models. Unlike more classical architectures, which can be viewed as point-to-point maps, a Transformer acts as a measure-to-measure map implemented as a specific interacting particle system on the unit sphere: the input is the empirical measure of the tokens in a prompt, and its evolution is governed by the continuity equation. In fact, Transformers are not limited to empirical measures and can in principle process any input measure. As the range of data processed by Transformers expands rapidly, it is important to investigate their expressive power as maps from an arbitrary measure to another arbitrary measure. To that end, we provide an explicit choice of parameters that allows a single Transformer to match $N$ arbitrary input measures to $N$ arbitrary target measures, under the minimal assumption that every input-target pair of measures can be matched by some transport map.
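To make the particle-system picture concrete, here is one common formalization from the continuous-time self-attention literature; the parameter matrices $Q$, $K$, $V$, the inverse temperature $\beta$, and the softmax form of the interaction are standard conventions assumed for illustration, not fixed by this abstract. Tokens $x_1(t),\dots,x_n(t)\in\mathbb{S}^{d-1}$ evolve as
\[
\dot{x}_i(t) \;=\; P^{\perp}_{x_i(t)}\!\left( \frac{\sum_{j=1}^{n} e^{\beta \langle Q x_i(t),\, K x_j(t)\rangle}\, V x_j(t)}{\sum_{j=1}^{n} e^{\beta \langle Q x_i(t),\, K x_j(t)\rangle}} \right),
\]
where $P^{\perp}_{x}$ denotes projection onto the tangent space at $x$, which keeps the particles on the sphere. The empirical measure $\mu_t = \frac{1}{n}\sum_{i=1}^{n} \delta_{x_i(t)}$ then satisfies the continuity equation
\[
\partial_t \mu_t + \operatorname{div}\!\big( \mathcal{X}[\mu_t]\, \mu_t \big) = 0,
\qquad
\mathcal{X}[\mu](x) \;=\; P^{\perp}_{x}\!\left( \frac{\int e^{\beta \langle Q x,\, K y\rangle}\, V y \, \mathrm{d}\mu(y)}{\int e^{\beta \langle Q x,\, K y\rangle}\, \mathrm{d}\mu(y)} \right),
\]
and this mean-field formulation makes sense verbatim for non-empirical input measures, which is the setting of the expressivity result stated above.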