We provide an overview of high-dimensional dynamical systems driven by random matrices, focusing on applications to simple models of learning and generalization in machine learning theory. Using both cavity method arguments and path integrals, we review how the behavior of a coupled infinite-dimensional system can be characterized as a stochastic process for each single site of the system. We provide a pedagogical treatment of dynamical mean field theory (DMFT), a framework that can be flexibly applied to these settings. The DMFT single-site stochastic process is fully characterized by a set of two-time correlation and response functions. For linear time-invariant systems, we illustrate connections between random matrix resolvents and the DMFT response. We demonstrate applications of these ideas to machine learning models such as gradient flow, stochastic gradient descent on random feature models, and deep linear networks in the feature learning regime trained on random data. We show how bias-variance decompositions (for example, in the analysis of ensembling and bagging) can be computed by averaging over subsets of the DMFT noise variables. Using this formalism, we also investigate how linear systems driven by random non-Hermitian matrices (such as random feature models) can exhibit loss curves that are non-monotonic in training time, while systems driven by Hermitian matrices with matching spectra do not, highlighting a mechanism for non-monotonicity that is distinct from small eigenvalues causing instability to label noise. Lastly, we provide asymptotic descriptions of the training and test loss dynamics for randomly initialized deep linear neural networks trained in the feature learning regime with high-dimensional random data. In this case, time-translation invariance is lost and the hidden layer weights are characterized as spiked random matrices.
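As a small illustration of the link between resolvents and the response in the linear time-invariant case, the sketch below uses a toy system that is an assumption of this example, not a model taken from the text: the flow dh/dt = -A h + j(t) with a Wishart-type matrix A. Under one common convention, the site-averaged response to a perturbation j is R(t) = (1/N) Tr exp(-A t), and its Laplace transform equals the normalized resolvent trace (1/N) Tr (s I + A)^{-1}. The function names, matrix sizes, and the value of s are all hypothetical choices made for illustration.

```python
import numpy as np


def wishart(n, p, rng):
    """Random symmetric PSD matrix A = X^T X / p, with X of shape (p, n)."""
    x = rng.standard_normal((p, n))
    return x.T @ x / p


def response(eigs, t):
    """R(t) = (1/N) Tr exp(-A t), evaluated from the eigenvalues of A."""
    return np.exp(-np.outer(t, eigs)).mean(axis=1)


def resolvent_trace(eigs, s):
    """(1/N) Tr (s I + A)^{-1}, evaluated from the eigenvalues of A."""
    return float(np.mean(1.0 / (s + eigs)))


def laplace_transform(f_vals, t, s):
    """Trapezoidal estimate of the integral of exp(-s t) f(t) over the grid t."""
    y = np.exp(-s * t) * f_vals
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))


rng = np.random.default_rng(0)
n, p = 100, 200                      # illustrative sizes (assumption)
A = wishart(n, p, rng)
eigs = np.linalg.eigvalsh(A)         # A is symmetric, so eigvalsh applies

t = np.linspace(0.0, 40.0, 20001)    # time grid; exp(-s t) is negligible at t = 40
s = 0.5                              # Laplace variable (assumption)

lhs = laplace_transform(response(eigs, t), t, s)  # transform of the response
rhs = resolvent_trace(eigs, s)                    # normalized resolvent trace
print(f"Laplace transform of R: {lhs:.6f}")
print(f"Resolvent trace:        {rhs:.6f}")
```

The two printed numbers should agree up to the discretization error of the time grid. This is a finite-size numerical check of an exact linear-algebra identity, not a derivation of the DMFT equations themselves.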