We prove that various stochastic gradient methods, including stochastic gradient descent (SGD), the stochastic heavy-ball (SHB) method, and the stochastic Nesterov's accelerated gradient (SNAG) method, almost surely avoid any strict saddle manifold. To the best of our knowledge, this is the first time such results have been obtained for the SHB and SNAG methods. Moreover, our analysis extends previous studies of SGD by removing the requirements that the objective function have bounded gradients and that the noise be uniformly bounded. Instead, we introduce a more practical local boundedness assumption on the noisy gradient, which is naturally satisfied in the empirical risk minimization problems typically encountered when training neural networks.
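As an informal illustration of the avoidance phenomenon (not the setting or the proof of the paper), the following minimal sketch runs SGD, SHB, and SNAG on the toy objective f(x, y) = (x^2 - y^2)/2, which has a strict saddle at the origin. The constant step size `lr`, the momentum `beta`, the Gaussian noise level `sigma`, and the helper names `grad`, `noisy_grad`, and `run` are all assumptions made for this example; the step-size schedules and noise conditions actually analyzed may differ. The iterates start at (1, 0), a point from which noiseless gradient descent converges to the saddle.

```python
import random


def grad(p):
    """Gradient of the toy objective f(x, y) = 0.5 * (x**2 - y**2)."""
    x, y = p
    return (x, -y)


def noisy_grad(p, rng, sigma):
    """Gradient plus independent Gaussian noise (an assumed noise model)."""
    gx, gy = grad(p)
    return (gx + sigma * rng.gauss(0.0, 1.0), gy + sigma * rng.gauss(0.0, 1.0))


def run(method, steps=200, lr=0.1, beta=0.5, sigma=0.01, seed=0):
    """Run one of "sgd", "shb", "snag" from (1, 0) and return the last iterate."""
    if method not in ("sgd", "shb", "snag"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    prev = cur = (1.0, 0.0)
    for _ in range(steps):
        # Momentum term beta * (x_k - x_{k-1}); plain SGD uses none.
        b = 0.0 if method == "sgd" else beta
        base = (cur[0] + b * (cur[0] - prev[0]), cur[1] + b * (cur[1] - prev[1]))
        # SNAG evaluates the gradient at the extrapolated point,
        # SGD and SHB evaluate it at the current iterate.
        point = base if method == "snag" else cur
        g = noisy_grad(point, rng, sigma)
        prev, cur = cur, (base[0] - lr * g[0], base[1] - lr * g[1])
    return cur


if __name__ == "__main__":
    # Without noise, the y-coordinate stays at zero and the iterate
    # is attracted to the saddle at the origin.
    print("noiseless GD:", run("sgd", sigma=0.0))
    for name in ("sgd", "shb", "snag"):
        print(name, run(name))
```

In this toy problem the noise pushes the iterates off the line y = 0 (the stable manifold of the saddle), after which the unstable direction amplifies the perturbation. Because the toy objective is unbounded below, the y-coordinate simply grows without bound rather than settling at a minimizer, so the sketch only illustrates escape from the saddle, not convergence.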