This manuscript considers the problem of learning a random Gaussian network function using a fully connected network with frozen intermediate layers and a trainable readout layer. This problem can be seen as a natural generalization of the widely studied random features model to deeper architectures. First, we prove Gaussian universality of the test error in a ridge regression setting where the learner and target networks share the same intermediate layers, and we provide a sharp asymptotic formula for this error. Establishing this result requires proving a deterministic equivalent for traces of the deep random features sample covariance matrices, which may be of independent interest. Second, we conjecture the asymptotic Gaussian universality of the test error in the more general setting of arbitrary convex losses and generic learner/target architectures. We provide extensive numerical evidence for this conjecture, which requires deriving closed-form expressions for the layer-wise post-activation population covariances. In light of our results, we investigate the interplay between architecture design and implicit regularization.
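The first setting described above (frozen random intermediate layers, a ridge-trained readout, and a target that shares the learner's intermediate layers) can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration and is not taken from the manuscript: the `tanh` activation, the layer widths, the sample size, the regularization strength `lam`, the noiseless target, and the helper names `deep_random_features` and `ridge_readout`.

```python
import numpy as np


def deep_random_features(X, weights, activation=np.tanh):
    """Propagate inputs through frozen Gaussian layers (hypothetical helper)."""
    H = X
    for W in weights:
        # Scale by sqrt(fan-in) so pre-activations stay of order one.
        H = activation(H @ W.T / np.sqrt(W.shape[1]))
    return H


def ridge_readout(Phi, y, lam):
    """Closed-form ridge solution for the trainable readout layer."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)


rng = np.random.default_rng(0)
n, n_test, d = 400, 2000, 50   # assumed sample sizes and input dimension
widths = [80, 80]              # assumed widths of the frozen layers
lam = 0.1                      # assumed ridge regularization strength

# Frozen intermediate layers with i.i.d. standard Gaussian weights.
weights, fan_in = [], d
for k in widths:
    weights.append(rng.standard_normal((k, fan_in)))
    fan_in = k

# Target readout: the target shares the learner's intermediate layers.
theta_star = rng.standard_normal(widths[-1])

X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n_test, d))
Phi_train = deep_random_features(X_train, weights)
Phi_test = deep_random_features(X_test, weights)

# Noiseless labels produced by the target network (an assumption).
y_train = Phi_train @ theta_star / np.sqrt(widths[-1])
y_test = Phi_test @ theta_star / np.sqrt(widths[-1])

# Only the readout is trained; the test error is the mean squared error.
w_hat = ridge_readout(Phi_train, y_train, lam)
test_error = np.mean((Phi_test @ w_hat - y_test) ** 2)
print(f"empirical test error: {test_error:.3e}")
```

This sketch only produces an empirical test error for a single finite-size draw. Comparing it against the asymptotic formula would require the deterministic equivalent established in the manuscript, which is not reproduced here.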