Empirical studies of trained models often report a transient regime in which the signal is detectable only within a finite window of gradient-descent time, before overfitting dominates. We provide an analytically tractable random-matrix model that reproduces this phenomenon for gradient flow in a linear teacher--student setting. In this framework, learning corresponds to an isolated eigenvalue separating from a noisy bulk; in the overfitting regime, this eigenvalue eventually disappears back into the bulk. The key ingredient is anisotropy in the input covariance, which induces fast and slow directions in the learning dynamics. In a two-block covariance model, we derive the full time-dependent bulk spectrum of the symmetrized weight matrix through a $2\times 2$ Dyson equation, and we obtain an explicit outlier condition for a rank-one teacher via a rank-two determinant formula. This yields a transient Baik--Ben Arous--Péché (BBP) transition: depending on signal strength and covariance anisotropy, the teacher spike may never emerge, may emerge and persist, or may emerge only during an intermediate time interval before being reabsorbed into the bulk. We map the corresponding phase diagrams and validate the theory against finite-size simulations. Our results provide a minimal solvable mechanism that explains early stopping as a transient spectral effect driven by anisotropy and noise.
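The setting summarized above can be explored numerically with a short finite-size sketch. The code below is an illustration under stated assumptions, not the construction used in the paper: it assumes a square student trained by gradient flow from zero initialization on the empirical squared loss, a diagonal two-block input covariance, a symmetric rank-one teacher $\theta\, u u^\top$ with additive Gaussian label noise, and the symmetrization $(W + W^\top)/2$. The function names (`make_data`, `gradient_flow_weights`, `top_mode`) and all parameter values are hypothetical.

```python
import numpy as np


def make_data(d=200, n=400, theta=3.0, s_fast=2.0, s_slow=0.5, noise=1.0, seed=0):
    """Sample two-block anisotropic inputs and noisy rank-one teacher labels.

    All names and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Two-block diagonal covariance: std s_fast on the first half of the
    # coordinates (fast directions), std s_slow on the rest (slow directions).
    scales = np.where(np.arange(d) < d // 2, s_fast, s_slow)
    X = scales[:, None] * rng.standard_normal((d, n))
    # Assumed teacher: symmetric rank-one matrix theta * u u^T, u a random unit vector.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    Y = theta * np.outer(u, u) @ X + noise * rng.standard_normal((d, n))
    return X, Y, u


def gradient_flow_weights(X, Y, t):
    """Closed-form gradient flow from W(0) = 0 on L(W) = ||Y - W X||_F^2 / (2 n).

    The flow dW/dt = C - W S, with S = X X^T / n and C = Y X^T / n, is solved
    in the eigenbasis of S: W(t) = C V diag((1 - exp(-lam t)) / lam) V^T.
    """
    n = X.shape[1]
    S = X @ X.T / n
    C = Y @ X.T / n
    lam, V = np.linalg.eigh(S)
    positive = lam > 1e-12
    safe_lam = np.where(positive, lam, 1.0)
    # For (numerically) zero eigenvalues the filter tends to its limit t.
    filt = np.where(positive, -np.expm1(-lam * t) / safe_lam, t)
    return ((C @ V) * filt) @ V.T


def top_mode(W, u):
    """Largest eigenvalue of (W + W^T) / 2 and overlap of its eigenvector with u."""
    vals, vecs = np.linalg.eigh((W + W.T) / 2)
    return vals[-1], abs(vecs[:, -1] @ u)


if __name__ == "__main__":
    X, Y, u = make_data()
    # Same seed with theta = 0: identical inputs and noise, but no teacher signal.
    X0, Y0, _ = make_data(theta=0.0)
    for t in np.logspace(-2, 2, 9):
        top, overlap = top_mode(gradient_flow_weights(X, Y, t), u)
        top0, _ = top_mode(gradient_flow_weights(X0, Y0, t), u)
        print(f"t={t:8.3f}  top eig={top:7.3f}  "
              f"noise-only top eig={top0:7.3f}  overlap={overlap:.3f}")
```

Comparing the top eigenvalue with and without the teacher, together with the eigenvector overlap with $u$, gives a rough empirical proxy for whether an outlier has detached from the bulk at a given time. Whether a transient window is visible depends on the assumed parameters, so the defaults here should be treated as a starting point for a scan rather than as a reproduction of the reported phase diagrams.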