Fundamental limits of overparametrized shallow neural networks for supervised learning

We carry out an information-theoretic analysis of a two-layer neural network trained on input-output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results take the form of bounds relating (i) the mutual information between training data and network weights, or (ii) the Bayes-optimal generalization error, to the same quantities for a simpler (generalized) linear model for which explicit expressions are rigorously known. Our bounds, which are expressed in terms of the number of training samples, the input dimension and the number of hidden units, thus yield fundamental performance limits for any neural network (and in fact any learning procedure) trained from limited data generated according to our two-layer teacher neural network model. The proof relies on rigorous tools from spin glasses and is guided by "Gaussian equivalence principles" lying at the core of numerous recent analyses of neural networks. In contrast to the existing literature, which is either non-rigorous or restricted to learning the readout weights only, our results are information-theoretic (i.e., not specific to any learning algorithm) and, importantly, cover a setting where all the network parameters are trained.
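To make the teacher setting concrete, here is a minimal sketch of how training pairs could be drawn from a two-layer teacher network. The abstract does not specify these details, so everything here is an assumption made for illustration: the function name `sample_teacher_data`, the standard Gaussian inputs and weights, the `tanh` activation, the `1/sqrt(d)` and `1/sqrt(p)` scalings, and the noiseless scalar output.

```python
import math
import random


def sample_teacher_data(n, d, p, activation=math.tanh, seed=0):
    """Draw n input-output pairs from a hypothetical two-layer teacher.

    n: number of training samples, d: input dimension, p: hidden units.
    All distributional choices below are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Teacher parameters: hidden-layer weights W (p x d) and readout weights a (p).
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(p)]
    a = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # Inputs with i.i.d. standard Gaussian entries (assumption).
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

    y = []
    for x in X:
        # Hidden activations: activation(<w_k, x> / sqrt(d)) for each hidden unit k.
        hidden = [
            activation(sum(w_j * x_j for w_j, x_j in zip(w, x)) / math.sqrt(d))
            for w in W
        ]
        # Scalar output: readout of the hidden layer, scaled by 1 / sqrt(p).
        y.append(sum(a_k * h_k for a_k, h_k in zip(a, hidden)) / math.sqrt(p))
    return X, y, W, a


# Example with more hidden units than input dimensions (p > d).
X, y, W, a = sample_teacher_data(n=20, d=5, p=40)
```

A learner in this setting sees only `X` and `y`. The teacher parameters `W` and `a` are the weights whose mutual information with the training data the bounds concern.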

