Transformers have revolutionized natural language processing, but their use for numerical computation has received less attention. We study the approximation of matrix functions, which extend scalar functions to matrix arguments, using neural networks, including transformers. We focus on functions that map square matrices to square matrices of the same dimension. Such matrix functions appear throughout scientific computing, e.g., the matrix exponential in continuous-time Markov chains and the matrix sign function in the stability analysis of dynamical systems. This paper makes two contributions. First, we prove bounds on the width and depth of ReLU networks needed to approximate the matrix exponential to arbitrary precision. Second, we show experimentally that a transformer encoder-decoder with suitable numerical encodings can approximate certain matrix functions to within a relative error of 5% with high probability. Our study reveals that the encoding scheme strongly affects performance, with different schemes working better for different functions.
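
For concreteness, the two matrix functions named above have standard reference implementations that a learned model could be evaluated against. Below is a minimal sketch, assuming SciPy's `expm` and `signm`; the `rel_err` helper and the random test matrix are hypothetical illustrations of the 5% relative-error criterion, since the abstract does not specify the norm or the matrix distribution used in the experiments.

```python
import numpy as np
from scipy.linalg import expm, signm

# Illustrative random test matrix (the paper's sampling scheme is not given here).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Reference values a learned approximation would be compared against:
# the matrix exponential e^A and the matrix sign function sign(A).
expA = expm(A)    # e^A = sum_{k >= 0} A^k / k!
signA = signm(A)  # eigenvalues mapped to +1/-1 by the sign of their real part

def rel_err(approx: np.ndarray, exact: np.ndarray) -> float:
    """Relative error in the Frobenius norm (one plausible choice of norm)."""
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)

# A model output `pred` would pass the abstract's criterion if
# rel_err(pred, expA) <= 0.05 (and similarly for signA).
```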