We study a continuous-time approximation of the stochastic gradient descent process for minimizing the population expected loss in learning problems. The main results establish general sufficient conditions for convergence, extending the results of Chatterjee (2022) for (nonstochastic) gradient descent. We show how the main result can be applied to the training of overparametrized neural networks.
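As an illustrative sketch only (the precise process analyzed in the paper may differ), a standard continuous-time approximation replaces the discrete SGD iterates with a diffusion driven by the population loss; here $\ell$, $L$, $\eta$, and $\Sigma$ are generic placeholders for the sample loss, population loss, learning rate, and gradient-noise covariance:
\[
  \theta_{k+1} = \theta_k - \eta\,\nabla \ell(\theta_k;\xi_k)
  \quad\rightsquigarrow\quad
  \mathrm{d}\theta_t = -\nabla L(\theta_t)\,\mathrm{d}t
  + \sqrt{\eta}\,\Sigma(\theta_t)^{1/2}\,\mathrm{d}W_t,
\]
where $L(\theta)=\mathbb{E}_{\xi}[\ell(\theta;\xi)]$ is the population expected loss and $W_t$ is a standard Brownian motion.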