We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights, in the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large. We derive a closed-form expression for the Bayes-optimal test error, for both regression and classification tasks. We contrast these Bayes-optimal errors with the test errors of ridge, kernel and random features regression. We find, in particular, that optimally regularized ridge regression, as well as kernel regression, achieves Bayes-optimal performance, while the logistic loss yields a near-optimal test error for classification. We further show numerically that when the number of samples grows faster than the dimension, ridge and kernel methods become suboptimal, while neural networks achieve a test error close to zero with a number of samples quadratic in the dimension.
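The setting above can be illustrated with a minimal numerical sketch, given here only as an aid to intuition and not as the procedure used to obtain the results. It draws a deep target network with random Gaussian weights, samples data in the proportional regime, and measures the test error of ridge regression. Several choices are assumptions made for illustration: the `tanh` activation, equal widths across layers, the 1/sqrt(fan-in) weight scaling, standard Gaussian inputs, noiseless labels, and the names `alpha` (samples per input dimension), `gamma` (width per input dimension) and `lam` (ridge penalty).

```python
import numpy as np


def deep_random_target(d, width, depth, rng, activation=np.tanh):
    """Build x -> y for a random-weight network (illustrative assumptions)."""
    dims = [d] + [width] * depth
    # Gaussian weights, scaled by 1/sqrt(fan-in) so pre-activations stay O(1).
    weights = [
        rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
        for i in range(depth)
    ]
    readout = rng.standard_normal(width) / np.sqrt(width)

    def target(X):
        h = X
        for W in weights:
            h = activation(h @ W.T)
        return h @ readout  # scalar regression label per input row

    return target


def ridge_test_error(d=200, alpha=2.0, gamma=1.0, depth=2,
                     lam=0.1, n_test=2000, seed=0):
    """Mean squared test error of ridge regression on a deep random target."""
    rng = np.random.default_rng(seed)
    n = int(alpha * d)        # number of samples proportional to dimension
    width = int(gamma * d)    # width proportional to dimension
    target = deep_random_target(d, width, depth, rng)

    X_train = rng.standard_normal((n, d))
    X_test = rng.standard_normal((n_test, d))
    y_train, y_test = target(X_train), target(X_test)

    # Ridge estimator: w = (X^T X + lam * I)^{-1} X^T y
    gram = X_train.T @ X_train + lam * np.eye(d)
    w = np.linalg.solve(gram, X_train.T @ y_train)
    return float(np.mean((X_test @ w - y_test) ** 2))


if __name__ == "__main__":
    print(ridge_test_error())
```

Sweeping `lam` for a fixed `alpha` gives an empirical counterpart of the "optimally regularized" ridge error; comparing it with the Bayes-optimal error would require the closed-form expression, which this sketch does not implement.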