Bayesian Interpolation with Deep Linear Networks

Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory. We give here a complete solution in the special case of linear networks with output dimension one, trained using zero-noise Bayesian inference with Gaussian weight priors and mean squared error as the negative log-likelihood. For any training dataset, network depth, and hidden layer widths, we find non-asymptotic expressions for the predictive posterior and Bayesian model evidence in terms of Meijer-G functions, a class of meromorphic special functions of a single complex variable. Novel asymptotic expansions of these Meijer-G functions reveal a rich new picture of the joint role of depth, width, and dataset size. We show that linear networks make provably optimal predictions at infinite depth: the posterior of infinitely deep linear networks with data-agnostic priors is the same as that of shallow networks with evidence-maximizing data-dependent priors. This yields a principled reason to prefer deeper networks when priors are forced to be data-agnostic. Moreover, we show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth, elucidating the salutary role of increased depth for model selection. Underpinning our results is a novel emergent notion of effective depth, given by the number of hidden layers times the number of data points divided by the network width; this quantity determines the structure of the posterior in the large-data limit.
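The effective depth described above is plain arithmetic, L * P / N for L hidden layers, P data points, and width N. The sketch below computes it under one stated assumption: all hidden layers share a common width N (the abstract also covers unequal widths, and how those combine is not spelled out here). The class and field names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearNetConfig:
    """Hypothetical container for the quantities the effective depth combines.

    Assumes every hidden layer has the same width; unequal widths are
    not modeled in this sketch.
    """

    hidden_layers: int  # L: number of hidden layers
    width: int          # N: common hidden-layer width (assumed uniform)
    num_data: int       # P: number of training data points

    def __post_init__(self) -> None:
        # Reject configurations for which the ratio is undefined or meaningless.
        if self.width <= 0:
            raise ValueError("width must be positive")
        if self.hidden_layers < 0 or self.num_data < 0:
            raise ValueError("hidden_layers and num_data must be non-negative")

    def effective_depth(self) -> float:
        # Effective depth = hidden layers x data points / width, i.e. L * P / N.
        return self.hidden_layers * self.num_data / self.width


if __name__ == "__main__":
    base = LinearNetConfig(hidden_layers=10, width=1000, num_data=200)
    print(base.effective_depth())  # 2.0

    # Doubling depth and width together leaves L * P / N unchanged.
    a = LinearNetConfig(hidden_layers=4, width=1000, num_data=5000)
    b = LinearNetConfig(hidden_layers=8, width=2000, num_data=5000)
    print(a.effective_depth(), b.effective_depth())  # 20.0 20.0
```

The second pair illustrates why the ratio matters: two networks of different depth and width, trained on the same 5000 points, share an effective depth of 20.0, which on the abstract's account is the quantity governing the posterior's structure when data is plentiful.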
