We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods that learn an estimator (e.g. M-estimator, shallow neural network, ...) from observations of Gaussian data by empirical risk minimization. This family includes widely used algorithms such as stochastic gradient descent (SGD) and Nesterov acceleration. The resulting equations match those obtained by discretizing the dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method gives an explicit description of how memory kernels build up in the effective dynamics, and it covers non-separable update functions, which accommodates datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with generic extensive batch size and constant learning rate.
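To make the setting concrete, the following is a minimal finite-size simulation of SGD with an extensive batch size (a fixed fraction of the sample size) and a constant learning rate on Gaussian data with a non-identity covariance. It is only an illustrative sketch, not the asymptotic equations or the numerical implementation referred to above: the choice of ridge regression as the M-estimator, the diagonal covariance, the function name `sgd_gaussian_erm`, and all parameter values are assumptions made for this example.

```python
import numpy as np

def sgd_gaussian_erm(n=400, d=200, batch_frac=0.25, lr=0.05, lam=0.01,
                     steps=200, seed=0):
    """Mini-batch SGD on a ridge-regression empirical risk (illustrative)."""
    rng = np.random.default_rng(seed)

    # Gaussian data with a non-identity (here diagonal) covariance: assumed choice.
    cov_diag = np.linspace(0.5, 1.5, d)
    X = rng.standard_normal((n, d)) * np.sqrt(cov_diag)

    # Hypothetical linear teacher with small label noise.
    w_star = rng.standard_normal(d) / np.sqrt(d)
    y = X @ w_star + 0.1 * rng.standard_normal(n)

    # Extensive batch size: proportional to the number of samples n.
    b = max(1, int(batch_frac * n))

    w = np.zeros(d)
    risks = []
    for _ in range(steps):
        idx = rng.choice(n, size=b, replace=False)   # sample a mini-batch
        resid = X[idx] @ w - y[idx]
        grad = X[idx].T @ resid / b + lam * w        # gradient of the batch risk
        w = w - lr * grad                            # constant learning rate
        # Track the full empirical risk after each step.
        risks.append(0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * (w @ w))
    return w, np.array(risks)

w, risks = sgd_gaussian_erm()
print(f"empirical risk after first step: {risks[0]:.4f}, after last step: {risks[-1]:.4f}")
```

In the proportional regime considered here, one would repeat such simulations with growing `n` and `d` at a fixed ratio and compare averaged trajectories against the predictions of the asymptotic equations.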