This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying analysis bridges different regimes of interest, such as the classical gradient-flow regime of vanishing learning rate, the high-dimensional regime of large input dimension, and the overparameterised "mean-field" regime of large network width, and also covers the intermediate regimes where the limiting dynamics is determined by the interplay between these behaviours. In particular, in the high-dimensional limit, the infinite-width dynamics is found to remain close to a low-dimensional subspace spanned by the target principal directions. Our results therefore provide a unifying picture of the limiting SGD dynamics with synthetic data.
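To make the notion of sufficient statistics concrete, the following is a minimal numerical sketch and not the construction used in the manuscript. It assumes a tanh activation, second-layer weights fixed to 1/p, a squared loss, and a plain learning rate `lr` with no particular scaling in the dimension or width; all function names are hypothetical. For Gaussian inputs, the student and teacher pre-activations are jointly Gaussian with covariance given by the overlap matrices Q = W Wᵀ/d, M = W W*ᵀ/d and P = W* W*ᵀ/d, so the population risk can be estimated from (Q, M, P) alone, without drawing any d-dimensional input.

```python
import numpy as np


def init_weights(d, p, k, seed=0):
    """Draw student (p x d) and teacher (k x d) first-layer weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((p, d))       # student weights
    W_star = rng.standard_normal((k, d))  # teacher (target) weights
    return W, W_star


def overlaps(W, W_star):
    """Overlap matrices Q, M, P (the candidate sufficient statistics)."""
    d = W.shape[1]
    return W @ W.T / d, W @ W_star.T / d, W_star @ W_star.T / d


def one_pass_sgd(W, W_star, lr=0.5, steps=1000, seed=1):
    """One-pass SGD: every step uses a fresh Gaussian sample exactly once."""
    rng = np.random.default_rng(seed)
    W = W.copy()
    p, d = W.shape
    for _ in range(steps):
        x = rng.standard_normal(d)
        lam = W @ x / np.sqrt(d)            # student pre-activations
        lam_star = W_star @ x / np.sqrt(d)  # teacher pre-activations
        err = np.tanh(lam).mean() - np.tanh(lam_star).mean()
        # Gradient of 0.5 * err**2 with respect to each student row.
        grad = err * (1.0 - np.tanh(lam) ** 2)[:, None] * x[None, :] / (p * np.sqrt(d))
        W -= lr * grad
    return W


def risk_from_overlaps(Q, M, P, n=20000, seed=2):
    """Monte Carlo population risk using only the overlaps (Q, M, P)."""
    rng = np.random.default_rng(seed)
    p = Q.shape[0]
    omega = np.block([[Q, M], [M.T, P]])  # joint covariance of pre-activations
    z = rng.multivariate_normal(np.zeros(omega.shape[0]), omega, size=n)
    err = np.tanh(z[:, :p]).mean(axis=1) - np.tanh(z[:, p:]).mean(axis=1)
    return 0.5 * np.mean(err ** 2)


def risk_from_inputs(W, W_star, n=20000, seed=3):
    """Monte Carlo population risk using fresh d-dimensional inputs."""
    rng = np.random.default_rng(seed)
    d = W.shape[1]
    X = rng.standard_normal((n, d))
    student = np.tanh(X @ W.T / np.sqrt(d)).mean(axis=1)
    teacher = np.tanh(X @ W_star.T / np.sqrt(d)).mean(axis=1)
    return 0.5 * np.mean((student - teacher) ** 2)
```

Running `one_pass_sgd` and then comparing `risk_from_overlaps(*overlaps(W, W_star))` with `risk_from_inputs(W, W_star)` should give two estimates that agree up to Monte Carlo error, which illustrates why a description in terms of a few overlap matrices can replace the full d-dimensional weight dynamics.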