Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem from economics to represent preference relations in terms of utility functions.
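To make the two relation models concrete, the following is a minimal NumPy sketch, not the construction used in the paper. It assumes small randomly initialized tanh MLPs (the helper names `init_mlp` and `mlp`, the layer sizes, and the random inputs are all hypothetical). It shows that the Gram matrix of a symmetric relation r(x, y) = <phi(x), phi(y)> is symmetric and positive semidefinite by construction, while the relation built from two different networks, r(x, y) = <phi(x), psi(y)>, is in general not symmetric.

```python
import numpy as np

def init_mlp(rng, sizes):
    """Random weights and zero biases for an MLP with the given layer sizes (illustrative only)."""
    return [(rng.standard_normal((m, n)) / np.sqrt(m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the MLP to a batch x of shape (batch, d_in): tanh hidden layers, linear output."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

rng = np.random.default_rng(0)
phi = init_mlp(rng, [3, 16, 8])   # feature map used on both sides in the symmetric case
psi = init_mlp(rng, [3, 16, 8])   # second, different feature map for the asymmetric case

X = rng.standard_normal((5, 3))   # five hypothetical inputs in R^3

# Symmetric relation: r(x, y) = <phi(x), phi(y)>, evaluated on all pairs.
sym_gram = mlp(phi, X) @ mlp(phi, X).T
# Asymmetric relation: r(x, y) = <phi(x), psi(y)>, evaluated on all pairs.
asym_gram = mlp(phi, X) @ mlp(psi, X).T

is_symmetric = np.allclose(sym_gram, sym_gram.T)
min_eig = np.linalg.eigvalsh(sym_gram).min()          # nonnegative up to round-off
asym_is_symmetric = np.allclose(asym_gram, asym_gram.T)
```

The symmetric Gram matrix has the form F Fᵀ, which is why its eigenvalues cannot be negative beyond floating-point error; no such constraint applies once the two sides use different networks.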