Characterizing and understanding the dynamics of stochastic gradient descent (SGD) around saddle points remains an open problem. We first show that saddle points in neural networks can be divided into two types, of which the Type-II saddles are especially difficult to escape from because the gradient noise vanishes at the saddle. The dynamics of SGD around these saddles are therefore described, to leading order, by a random matrix product process, and it is thus natural to study them using the notion of probabilistic stability and the associated Lyapunov exponent. Theoretically, we link the study of SGD dynamics to well-known concepts in ergodic theory, which we leverage to show that saddle points can be either attractive or repulsive for SGD and that its dynamics can be classified into four distinct phases, depending on the signal-to-noise ratio of the gradient close to the saddle.
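As a concrete illustration of the mechanism summarized above (the notation below is introduced here only for exposition and is not taken verbatim from the paper), write the parameters near a Type-II saddle as a deviation x_t, let \eta be the learning rate, and let \hat{H}_t be the Hessian of the minibatch loss drawn at step t, evaluated at the saddle. Because the gradient noise vanishes at the saddle, the leading-order SGD update is purely multiplicative,

\[
x_{t+1} = \bigl(I - \eta \hat{H}_t\bigr)\, x_t ,
\]

i.e. a product of random matrices. A standard definition of its Lyapunov exponent is

\[
\lambda = \lim_{t \to \infty} \frac{1}{t}\, \mathbb{E}\bigl[\log \lVert x_t \rVert\bigr],
\]

and probabilistic stability is decided by its sign: the saddle is attractive for SGD when \lambda < 0 and repulsive when \lambda > 0, with the sign controlled by the signal-to-noise ratio of the gradient (equivalently, of \hat{H}_t) close to the saddle.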