We carry out an information-theoretic analysis of a two-layer neural network trained on input-output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results take the form of bounds relating (i) the mutual information between training data and network weights, or (ii) the Bayes-optimal generalization error, to the same quantities for a simpler (generalized) linear model, for which explicit expressions are rigorously known. Our bounds, which are expressed in terms of the number of training samples, the input dimension and the number of hidden units, thus yield fundamental performance limits for any neural network (and in fact any learning procedure) trained on limited data generated according to our two-layer teacher neural network model. The proof relies on rigorous tools from spin glasses and is guided by ``Gaussian equivalence principles'' lying at the core of numerous recent analyses of neural networks. In contrast to the existing literature, which is either non-rigorous or restricted to learning the readout weights only, our results are information-theoretic (i.e., not specific to any learning algorithm) and, importantly, cover a setting where all the network parameters are trained.