A neural architecture with randomly initialized weights is equivalent, in the infinite-width limit, to a Gaussian random field whose covariance function is the so-called Neural Network Gaussian Process (NNGP) kernel. We prove that the reproducing kernel Hilbert space (RKHS) defined by the NNGP contains only functions that can be approximated by the architecture. The number of neurons required in each layer to achieve a given approximation error is determined by the RKHS norm of the target function. Moreover, the approximation can be constructed from a supervised dataset by computing a random multi-layer representation of the input vector and training only the last layer's weights. For a 2-layer NN and a domain equal to the $(n-1)$-dimensional sphere in ${\mathbb R}^n$, we compare the number of neurons required by Barron's theorem with the number required by the multi-layer features construction. We show that if the eigenvalues of the integral operator of the NNGP decay slower than $k^{-n-\frac{2}{3}}$, where $k$ is the index of the eigenvalue, then our theorem guarantees a more succinct neural network approximation than Barron's theorem. We also run computational experiments to verify our theoretical findings. The experiments show that realistic neural networks easily learn target functions even when neither theorem gives any guarantee.
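The construction described in the abstract (a fixed random multi-layer representation with only the last layer trained) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the ReLU activation, the He-style weight scaling, ridge regression as the last-layer training procedure, the hypothetical target function, and all function names and default sizes.

```python
import numpy as np


def sample_sphere(num_points, n, rng):
    """Draw points uniformly from the unit sphere S^{n-1} in R^n."""
    x = rng.standard_normal((num_points, n))
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def init_random_layers(n, widths, rng):
    """Sample Gaussian weights for the hidden layers; they are never trained."""
    layers, fan_in = [], n
    for width in widths:
        # Variance 2 / fan_in is an illustrative choice, not taken from the text.
        layers.append(rng.standard_normal((fan_in, width)) * np.sqrt(2.0 / fan_in))
        fan_in = width
    return layers


def random_features(x, layers):
    """Push inputs through the fixed random ReLU layers."""
    h = x
    for w in layers:
        h = np.maximum(h @ w, 0.0)
    return h


def fit_last_layer(phi, y, ridge=1e-6):
    """Train only the linear readout by ridge regression on the random features."""
    num_samples, width = phi.shape
    gram = phi.T @ phi / num_samples + ridge * np.eye(width)
    return np.linalg.solve(gram, phi.T @ y / num_samples)


def target(x):
    """Hypothetical smooth target on the sphere (a linear plus a quadratic term)."""
    return x[:, 2] + x[:, 0] * x[:, 1]


def run_demo(n=3, widths=(512,), num_train=2000, num_test=1000, seed=0):
    """Return the test mean squared error relative to the variance of the target."""
    rng = np.random.default_rng(seed)
    layers = init_random_layers(n, widths, rng)
    x_train = sample_sphere(num_train, n, rng)
    x_test = sample_sphere(num_test, n, rng)
    readout = fit_last_layer(random_features(x_train, layers), target(x_train))
    pred = random_features(x_test, layers) @ readout
    y_test = target(x_test)
    return float(np.mean((pred - y_test) ** 2) / np.var(y_test))


if __name__ == "__main__":
    print(f"relative test error: {run_demo():.2e}")
```

With the default `widths=(512,)` this corresponds to the 2-layer case on the sphere; passing a longer tuple such as `widths=(512, 512)` stacks further random layers before the trained readout. The sketch only shows the mechanics of the construction and does not reproduce the neuron-count bounds discussed above.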