Inner products of neural network feature maps arise in a wide variety of machine learning frameworks as a method of modeling relations between inputs. This work studies the approximation properties of inner products of neural networks. It is shown that the inner product of a multi-layer perceptron with itself is a universal approximator for symmetric positive-definite relation functions. In the case of asymmetric relation functions, it is shown that the inner product of two different multi-layer perceptrons is a universal approximator. In both cases, a bound is obtained on the number of neurons required to achieve a given accuracy of approximation. In the symmetric case, the function class can be identified with kernels of reproducing kernel Hilbert spaces, whereas in the asymmetric case the function class can be identified with kernels of reproducing kernel Banach spaces. Finally, these approximation results are applied to analyzing the attention mechanism underlying Transformers, showing that any retrieval mechanism defined by an abstract preorder can be approximated by attention through its inner product relations. This result uses the Debreu representation theorem in economics to represent preference relations in terms of utility functions.
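As an illustration of the two relation classes described above, the following is a minimal sketch, not the paper's construction. It assumes small one-hidden-layer tanh networks with random, untrained weights; the names `make_mlp`, `phi`, `psi`, `r_sym`, and `r_asym` are hypothetical and chosen only for this example. It shows that the inner product of a network with itself gives a symmetric relation whose quadratic form is non-negative, while the inner product of two different networks need not be symmetric.

```python
import math
import random


def make_mlp(d_in, d_hidden, d_out, rng):
    """Return a one-hidden-layer tanh MLP with random weights (illustrative only)."""
    W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(d_hidden)]
    W2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]

    def mlp(x):
        # Hidden layer: tanh(W1 x + b1); output layer: W2 h (no bias).
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)]
        return [sum(w * hi for w, hi in zip(row, h)) for row in W2]

    return mlp


def inner(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(0)
phi = make_mlp(3, 8, 4, rng)  # feature map used on both sides (symmetric case)
psi = make_mlp(3, 8, 4, rng)  # second, different feature map (asymmetric case)


def r_sym(x, y):
    """Symmetric relation: inner product of one MLP with itself."""
    return inner(phi(x), phi(y))


def r_asym(x, y):
    """Asymmetric relation: inner product of two different MLPs."""
    return inner(phi(x), psi(y))


# Gram matrix of the symmetric relation on a few random inputs.
xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
gram = [[r_sym(x, y) for y in xs] for x in xs]

# Quadratic form c^T G c equals ||sum_i c_i phi(x_i)||^2, hence is non-negative.
c = [rng.gauss(0, 1) for _ in xs]
quad_form = sum(c[i] * c[j] * gram[i][j] for i in range(5) for j in range(5))

print("symmetric:", all(gram[i][j] == gram[j][i] for i in range(5) for j in range(5)))
print("quadratic form:", quad_form)
print("r_asym(x0, x1), r_asym(x1, x0):", r_asym(xs[0], xs[1]), r_asym(xs[1], xs[0]))
```

The sketch only demonstrates the structural properties of the two function classes. It says nothing about approximation rates or the number of neurons required, which are the subject of the results summarized above.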