Transformers have revolutionized natural language processing, but their use for numerical computation has received less attention. We study the approximation of matrix functions, which extend scalar functions to matrix arguments, using neural networks, including transformers. We focus on functions that map square matrices to square matrices of the same dimension. Such matrix functions appear throughout scientific computing, e.g., the matrix exponential in continuous-time Markov chains and the matrix sign function in the stability analysis of dynamical systems. This paper makes two contributions. First, we prove bounds on the width and depth of ReLU networks needed to approximate the matrix exponential to arbitrary precision. Second, we show experimentally that a transformer encoder-decoder with suitable numerical encodings can approximate certain matrix functions to within a relative error of 5% with high probability. Our study reveals that the encoding scheme strongly affects performance, with different schemes working better for different functions.
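
For concreteness, the two matrix functions named above have standard reference implementations that a learned model could be evaluated against. Below is a minimal sketch, assuming SciPy's `expm` and `signm`; the `rel_err` helper and the random test matrix are hypothetical illustrations of the 5% relative-error criterion, since the abstract does not specify the norm or the matrix distribution used in the experiments.

```python
import numpy as np
from scipy.linalg import expm, signm

# Illustrative random test matrix (the paper's sampling scheme is not given here).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Reference values a learned approximation would be compared against:
# the matrix exponential e^A and the matrix sign function sign(A).
expA = expm(A)    # e^A = sum_{k >= 0} A^k / k!
signA = signm(A)  # eigenvalues mapped to +1/-1 by the sign of their real part

def rel_err(approx: np.ndarray, exact: np.ndarray) -> float:
    """Relative error in the Frobenius norm (one plausible choice of norm)."""
    return np.linalg.norm(approx - exact) / np.linalg.norm(exact)

# A model output `pred` would pass the abstract's criterion if
# rel_err(pred, expA) <= 0.05 (and similarly for signA).
```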