We study the learning dynamics of a multi-pass, mini-batch Stochastic Gradient Descent (SGD) procedure for empirical risk minimization in high-dimensional multi-index models with isotropic random data. In an asymptotic regime where the sample size $n$ and data dimension $d$ grow proportionally, for any sub-linear batch size $\kappa \asymp n^{\alpha}$ with $\alpha \in [0,1)$, and for a commensurate ``critical'' scaling of the learning rate, we provide an asymptotically exact characterization of the coordinate-wise dynamics of SGD. This characterization takes the form of a system of dynamical mean-field equations, driven by a scalar Poisson jump process that represents the asymptotic limit of SGD sampling noise. We develop an analogous characterization of the Stochastic Modified Equation (SME), which provides a Gaussian diffusion approximation to SGD. Our analyses imply that the limiting dynamics of SGD are the same for any batch size scaling $\alpha \in [0,1)$, and that under a commensurate scaling of the learning rate, the dynamics of SGD, the SME, and gradient flow are mutually distinct, with those of SGD and the SME coinciding in the special case of a linear model. We recover a known dynamical mean-field characterization of gradient flow in the limit of small learning rate, and of one-pass/online SGD in the limit of increasing sample size $n/d \to \infty$.
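For concreteness, the mini-batch SGD iteration described above can be sketched as follows; the notation here ($\theta_t$, $B_t$, $\ell$, $\eta$) is illustrative and not necessarily the paper's, and the precise critical learning-rate scaling is left unspecified.

```latex
% Illustrative multi-pass mini-batch SGD update (notation hypothetical):
% B_t is a batch of size \kappa drawn from the n samples, \ell is the
% per-sample loss, and \eta is the learning rate.
\[
  \theta_{t+1}
  \;=\;
  \theta_t \;-\; \frac{\eta}{\kappa} \sum_{i \in B_t}
    \nabla_\theta \, \ell(\theta_t;\, x_i, y_i),
  \qquad
  \kappa \asymp n^{\alpha}, \quad \alpha \in [0,1),
\]
% with \eta scaled commensurately with \kappa (the ``critical'' scaling),
% so that the coordinate-wise dynamics admit a dynamical mean-field limit
% as n, d \to \infty proportionally.
```

Under this scaling, the sampling noise from the random batches $B_t$ survives in the limit as the scalar Poisson jump process driving the mean-field equations.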