We consider the momentum stochastic gradient descent scheme (MSGD) and its continuous-in-time counterpart in the context of non-convex optimization. We show almost sure exponential convergence of the objective function value for target functions that are Lipschitz continuous and satisfy the Polyak-Łojasiewicz inequality on the relevant domain, under assumptions on the stochastic noise that are motivated by overparameterized supervised learning applications. Moreover, we optimize the convergence rate over the set of friction parameters and show that the MSGD process itself converges almost surely.
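As an illustration only, the following minimal sketch runs a heavy-ball form of MSGD on a toy overparameterized least-squares problem (more parameters than samples, with labels that can be fit exactly). It is not the scheme or parametrization analyzed in the paper: the update rule, the names `make_problem`, `loss` and `msgd`, and the values of `lr`, `beta` and `steps` are all assumptions chosen for the example, with `beta` loosely playing the role of a friction parameter. The toy objective satisfies a Polyak-Łojasiewicz inequality, and the single-sample gradient noise vanishes at interpolating minimizers, which is the kind of noise behaviour the overparameterized setting suggests.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def make_problem(n=5, d=20, seed=0):
    """Toy overparameterized regression: n samples, d > n parameters.

    Rows are normalized to unit length and labels come from a hidden
    parameter vector, so some parameter vector fits the data exactly.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(x, x))
        rows.append([c / norm for c in x])
    w_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
    y = [dot(x, w_star) for x in rows]
    return rows, y


def loss(w, rows, y):
    """Mean squared error f(w) = (1 / 2n) * sum_i (x_i . w - y_i)^2."""
    return sum((dot(x, w) - t) ** 2 for x, t in zip(rows, y)) / (2 * len(rows))


def msgd(rows, y, lr=0.1, beta=0.5, steps=3000, seed=1):
    """Heavy-ball SGD with single-sample gradients (assumed form).

    v <- beta * v - lr * g_i(w),   w <- w + v,
    where g_i is the gradient of the i-th squared residual.
    Returns the final parameters and the loss after every step.
    """
    rng = random.Random(seed)
    d = len(rows[0])
    w = [0.0] * d
    v = [0.0] * d
    losses = [loss(w, rows, y)]
    for _ in range(steps):
        i = rng.randrange(len(rows))
        residual = dot(rows[i], w) - y[i]
        # The stochastic gradient is residual * x_i; it is zero whenever
        # w fits sample i, so the noise vanishes at interpolating minimizers.
        v = [beta * vj - lr * residual * xj for vj, xj in zip(v, rows[i])]
        w = [wj + vj for wj, vj in zip(w, v)]
        losses.append(loss(w, rows, y))
    return w, losses


if __name__ == "__main__":
    rows, y = make_problem()
    _, losses = msgd(rows, y)
    # A roughly straight line in log scale would indicate exponential decay.
    for k in range(0, len(losses), 500):
        print(k, losses[k])
```

Printing the loss every few hundred steps gives a rough view of the decay on this toy problem. A single run of this kind is only a sanity check and says nothing about the almost sure rates or the optimal friction parameter established in the paper.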