We study the high-dimensional training dynamics of a shallow neural network with quadratic activation in a teacher-student setup. We focus on the extensive-width regime, in which the teacher and student network widths scale proportionally to the input dimension and the sample size grows quadratically with it. This scaling aims to describe overparameterized neural networks in which feature learning still plays a central role. In the high-dimensional limit, we derive a dynamical characterization of the gradient flow in the spirit of dynamical mean-field theory (DMFT). Under ℓ2-regularization, we analyze these equations at long times and characterize the performance and spectral properties of the resulting estimator. This result provides a quantitative understanding of the effect of overparameterization on learning and generalization, and reveals a double-descent phenomenon in the presence of label noise, where generalization improves beyond the interpolation threshold. In the small-regularization limit, we obtain an exact expression for the perfect-recovery threshold as a function of the network widths, precisely characterizing how overparameterization influences recovery.
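To make the setup concrete, the following is a minimal numerical sketch in Python/NumPy. Every specific here is an illustrative assumption rather than the paper's exact convention: the width ratios, the sample-size prefactor, the Gaussian data and initialization, the noise level, the regularization strength lam, and the use of discretized gradient descent in place of the gradient flow analyzed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 50                 # input dimension
k_T, k_S = d // 2, d   # teacher / student widths, both proportional to d (assumed ratios)
n = 2 * d**2           # sample size, growing quadratically with d (assumed prefactor)
noise = 0.1            # label-noise standard deviation (hypothetical value)
lam = 1e-3             # l2-regularization strength (hypothetical value)

def quad_net(W, X):
    """Shallow net with quadratic activation: f(x) = (1/width) * sum_j (w_j . x)^2."""
    return ((X @ W.T) ** 2).sum(axis=1) / W.shape[0]

# Teacher weights, Gaussian inputs, and noisy labels.
W_T = rng.standard_normal((k_T, d)) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = quad_net(W_T, X) + noise * rng.standard_normal(n)

# Student trained by discretized gradient descent on the l2-regularized square loss,
# standing in for the gradient flow.
W = rng.standard_normal((k_S, d)) / np.sqrt(d)
lr, steps = 0.05, 2000
for _ in range(steps):
    Z = X @ W.T                                    # pre-activations (w_j . x_i)
    r = quad_net(W, X) - y                         # residuals
    grad = (4.0 / (n * k_S)) * (Z * r[:, None]).T @ X   # gradient of the mean squared error
    W -= lr * (grad + 2.0 * lam * W)               # gradient step plus l2 penalty

# Generalization error, and the spectrum of the effective matrix estimator
# S = (1/k_S) W^T W: with quadratic activation the network computes f(x) = x^T S x.
X_test = rng.standard_normal((1000, d))
gen_err = np.mean((quad_net(W, X_test) - quad_net(W_T, X_test)) ** 2)
S = W.T @ W / k_S
print(f"test error: {gen_err:.4f}")
print("top eigenvalues of S:", np.linalg.eigvalsh(S)[-5:])
```

Because the quadratic activation makes the network a quadratic form in the input, the student is fully summarized by the matrix S above, which is why the long-time analysis can be phrased in terms of the performance and spectral properties of a single matrix estimator.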