The article concerns low-rank approximation of matrices generated by sampling a smooth function of two $m$-dimensional variables. We refute an argument made in the literature that, for a specific class of analytic functions, such matrices admit accurate entrywise approximation of rank that is independent of $m$. We provide a theoretical explanation of the numerical results presented in support of this argument, describing three narrower classes of functions for which $n \times n$ function-generated matrices can be approximated within an entrywise error of order $\varepsilon$ with rank $\mathcal{O}(\log(n) \varepsilon^{-2} \mathrm{polylog}(\varepsilon^{-1}))$ that is independent of the dimension $m$: (i) functions of the inner product of the two variables, (ii) functions of the squared Euclidean distance between the variables, and (iii) shift-invariant positive-definite kernels. We extend our argument to low-rank tensor-train approximation of tensors generated with functions of the multi-linear product of their $m$-dimensional variables. We discuss our results in the context of low-rank approximation of attention in transformer neural networks.
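As an illustration of class (iii), the sketch below builds a low-rank entrywise approximation of a Gaussian-kernel matrix with random Fourier features and measures the maximum entrywise error for two values of $m$ at a fixed rank. This is a standard textbook construction used here as an assumption; it is not necessarily the construction analysed in the article. The function names, the choice of the Gaussian kernel, the scaling of the sample points, and all parameter values are hypothetical and chosen only for the demonstration.

```python
import numpy as np


def gaussian_kernel_matrix(X, Y, sigma=1.0):
    """Exact matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq_dist = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def rff_factors(X, Y, rank, sigma=1.0, seed=0):
    """Return U, V with `rank` columns such that U @ V.T approximates K.

    Frequencies are drawn from N(0, I / sigma^2), the spectral measure of
    the Gaussian kernel; `rank` is assumed to be even (cos and sin halves).
    """
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    half = rank // 2
    W = rng.standard_normal((m, half)) / sigma

    def features(Z):
        P = Z @ W
        return np.hstack([np.cos(P), np.sin(P)]) / np.sqrt(half)

    return features(X), features(Y)


def max_entrywise_error(n, m, rank, seed=0):
    """Sample n points per variable in dimension m and compare K with U @ V.T."""
    rng = np.random.default_rng(seed)
    # Scale by 1/sqrt(m) so that pairwise distances stay O(1) as m grows.
    X = rng.standard_normal((n, m)) / np.sqrt(m)
    Y = rng.standard_normal((n, m)) / np.sqrt(m)
    K = gaussian_kernel_matrix(X, Y)
    U, V = rff_factors(X, Y, rank, seed=seed + 1)
    return np.abs(K - U @ V.T).max()


if __name__ == "__main__":
    for m in (10, 1000):
        err = max_entrywise_error(n=400, m=m, rank=256)
        print(f"m = {m:5d}, rank = 256, max entrywise error = {err:.3f}")
```

In this sketch the error is controlled by the number of random features rather than by $m$, which is consistent with the dimension-independent rank bound stated above; the observed errors are only an empirical illustration and do not verify the stated rate.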