Transformers, and the attention mechanism in particular, have become ubiquitous in machine learning. Their success in modeling nonlocal, long-range correlations has led to their widespread adoption in natural language processing, computer vision, and time-series problems. Neural operators, which map spaces of functions into spaces of functions, are necessarily both nonlinear and nonlocal if they are universal; it is thus natural to ask whether the attention mechanism can be used in the design of neural operators. Motivated by this, we study transformers in the function space setting. We formulate attention as a map between infinite-dimensional function spaces and prove that the attention mechanism as implemented in practice is a Monte Carlo or finite difference approximation of this operator. The function space formulation allows for the design of transformer neural operators, a class of architectures designed to learn mappings between function spaces, for which we prove a universal approximation result. The prohibitive cost of applying the attention operator to functions defined on multi-dimensional domains motivates the need for more efficient attention-based architectures. For this reason we also introduce a function space generalization of the patching strategy from computer vision, together with an associated class of neural operators. Numerical results, on an array of operator learning problems, demonstrate the promise of our approaches to function space formulations of attention and their use in neural operators.
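One concrete way to read the approximation claim, sketched here in illustrative notation (the maps $Q$, $K$, $V$ and the domain $D$ below are our own assumptions, not the paper's exact definitions), is to view attention as a softmax-normalized kernel integral operator acting on a function $u : D \to \mathbb{R}^d$,
\[
(\mathcal{A}u)(x) \;=\; \int_D \frac{\exp\!\big(\langle Q u(x),\, K u(y)\rangle\big)}{\int_D \exp\!\big(\langle Q u(x),\, K u(z)\rangle\big)\,\mathrm{d}z}\; V u(y)\,\mathrm{d}y .
\]
Evaluating $u$ at sample points $x_1,\dots,x_n \in D$ and replacing both integrals by empirical sums (the Monte Carlo normalization constants cancel between numerator and denominator) gives
\[
(\mathcal{A}u)(x_i) \;\approx\; \sum_{j=1}^{n} \frac{\exp\!\big(\langle Q u(x_i),\, K u(x_j)\rangle\big)}{\sum_{k=1}^{n} \exp\!\big(\langle Q u(x_i),\, K u(x_k)\rangle\big)}\; V u(x_j),
\]
which is standard softmax attention over the tokens $u(x_1),\dots,u(x_n)$; in this sense discrete attention is a Monte Carlo (or, on a regular grid, finite difference) approximation of the function space operator.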