Bayesian inference and kernel methods are well established in machine learning. The neural network Gaussian process in particular provides a framework for studying neural networks in the limit of infinitely wide hidden layers using kernel and inference methods. Here we build on this limit and present a field-theoretic formalism that describes the generalization properties of infinitely wide networks. We systematically compute generalization properties of linear, non-linear, and deep non-linear networks for kernel matrices with heterogeneous entries. In contrast to currently employed spectral methods, we derive the generalization properties from the statistical properties of the input, elucidating the interplay of input dimensionality, training set size, and data variability. We show that data variability leads to a non-Gaussian action reminiscent of a ($\varphi^3+\varphi^4$)-theory. Applying our formalism to a synthetic task and to MNIST, we obtain a homogeneous kernel matrix approximation for the learning curve as well as corrections due to data variability. These allow the estimation of the generalization properties and yield exact results for the bounds of the learning curves in the case of infinitely many training data points.
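To make the starting point concrete, the following is a minimal numerical sketch of inference with a neural network Gaussian process (NNGP) and of an empirical learning curve. It is not the field-theoretic calculation described above. It assumes a single hidden layer with ReLU activation, for which the kernel has the well-known arc-cosine closed form, and a toy two-class Gaussian task chosen only for illustration, which is not necessarily the synthetic task used in the paper. All function names, parameter values, and the noise regularizer are hypothetical.

```python
import numpy as np


def nngp_relu_kernel(X1, X2, sigma_w2=2.0, sigma_b2=0.0):
    """NNGP kernel of a one-hidden-layer ReLU network (arc-cosine form).

    sigma_w2 and sigma_b2 are assumed weight and bias prior variances.
    """
    d = X1.shape[1]
    # Covariance of the hidden-layer pre-activations.
    k12 = sigma_b2 + sigma_w2 * (X1 @ X2.T) / d
    k11 = sigma_b2 + sigma_w2 * np.sum(X1 ** 2, axis=1) / d
    k22 = sigma_b2 + sigma_w2 * np.sum(X2 ** 2, axis=1) / d
    norm = np.sqrt(np.outer(k11, k22))
    cos_theta = np.clip(k12 / norm, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    # Closed form of the Gaussian expectation E[relu(u) relu(v)].
    relu_cov = norm * (np.sin(theta) + (np.pi - theta) * cos_theta) / (2.0 * np.pi)
    return sigma_b2 + sigma_w2 * relu_cov


def make_two_class_data(n, d, rng, mean_scale=1.0):
    """Toy task (an assumption): balanced labels +/-1, inputs y * mean + unit noise."""
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    X = y[:, None] * mean_scale * np.ones(d) + rng.standard_normal((n, d))
    return X, y


def gp_posterior_mean(X_train, y_train, X_test, noise=1e-2):
    """Mean of the GP posterior predictor at the test inputs."""
    K = nngp_relu_kernel(X_train, X_train)
    K_star = nngp_relu_kernel(X_test, X_train)
    # The noise term regularizes the kernel matrix before solving.
    alpha = np.linalg.solve(K + noise * np.eye(len(y_train)), y_train)
    return K_star @ alpha


def learning_curve(train_sizes, d=50, n_test=200, seed=0):
    """Test mean squared error of the posterior mean for each training set size."""
    rng = np.random.default_rng(seed)
    X_test, y_test = make_two_class_data(n_test, d, rng)
    errors = []
    for n in train_sizes:
        X_train, y_train = make_two_class_data(n, d, rng)
        pred = gp_posterior_mean(X_train, y_train, X_test)
        errors.append(float(np.mean((pred - y_test) ** 2)))
    return errors


if __name__ == "__main__":
    sizes = [10, 40, 160]
    for n, err in zip(sizes, learning_curve(sizes)):
        print(f"D = {n:4d}  test MSE = {err:.4f}")
```

In this toy setting the entries of the kernel matrix fluctuate from sample to sample because the inputs scatter around their class means. This is the kind of data variability whose effect on the learning curve the formalism treats analytically, whereas the sketch only measures it empirically for one random draw.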