Double-descent curves in neural networks describe the phenomenon in which the generalisation error first decreases as the number of parameters grows, then increases once the parameter count passes an optimum that is smaller than the number of data points, and finally decreases again in the overparameterised regime. In this paper, we use techniques from random matrix theory to characterise the spectral distribution of the empirical feature covariance matrix as a width-dependent perturbation of the spectrum of the neural network Gaussian process (NNGP) kernel, thereby establishing a novel connection between the NNGP literature and the random matrix theory literature in the context of neural networks. Our analytical expression allows us to study the generalisation behaviour of the corresponding kernel and GP regression, and provides a new interpretation of the double-descent phenomenon: it is governed by the discrepancy between the width-dependent empirical kernel and the width-independent NNGP kernel.
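To make the distinction between the width-dependent empirical kernel and the width-independent NNGP kernel concrete, the following is a minimal numerical sketch, not the paper's construction. It assumes a single hidden layer of random ReLU features with standard Gaussian weights, for which the infinite-width kernel is the known degree-1 arc-cosine kernel. The function names, the synthetic Gaussian inputs, and the chosen widths are all illustrative assumptions.

```python
import numpy as np

def relu_nngp_kernel(X):
    """Infinite-width kernel E_w[relu(w.x) relu(w.x')] for w ~ N(0, I).

    This is the degree-1 arc-cosine kernel (assumed setting: one ReLU layer).
    """
    norms = np.linalg.norm(X, axis=1)
    outer = np.outer(norms, norms)
    cos = np.clip(X @ X.T / outer, -1.0, 1.0)  # clip guards against rounding
    theta = np.arccos(cos)
    return outer * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)

def empirical_kernel(X, width, rng):
    """Finite-width kernel (1/width) * Phi Phi^T from one random ReLU layer."""
    W = rng.standard_normal((X.shape[1], width))
    features = np.maximum(X @ W, 0.0)
    return features @ features.T / width

def relative_kernel_gap(X, width, rng):
    """Relative Frobenius distance between empirical and NNGP kernels."""
    K = relu_nngp_kernel(X)
    K_hat = empirical_kernel(X, width, rng)
    return np.linalg.norm(K_hat - K) / np.linalg.norm(K)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))  # 50 hypothetical data points in 10 dims

gaps = {n: relative_kernel_gap(X, n, rng) for n in (16, 256, 4096)}
for n, gap in gaps.items():
    print(f"width={n:5d}  relative gap={gap:.3f}")
```

In this toy setting the gap shrinks as the width grows. When the width is smaller than the number of data points, the empirical kernel is also rank-deficient, which is one simple way its spectrum differs from that of the NNGP kernel.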