For overparameterized optimization tasks, such as those found in modern machine learning, global minima are generally not unique. To understand generalization in these settings, it is vital to study to which minimum an optimization algorithm converges. The fact that some minima are unstable under the dynamics imposed by the optimization algorithm restricts which minima the algorithm can find. In this paper, we characterize the global minima that are dynamically stable or unstable for both deterministic and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent that depends on the local dynamics around a global minimum and rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at the respective global minimum.
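As a rough illustration of this kind of criterion (a minimal sketch, not the paper's construction), the snippet below linearizes SGD around a shared global minimum of a toy one-dimensional two-sample problem and evaluates the exponent λ(η) = E[log|1 − η h_γ|], where h_γ is the curvature of the per-sample loss at the minimum. The curvature values and step sizes are invented for illustration; note that in this toy setting full-batch gradient descent stays contractive at all three step sizes while the sign of λ flips, which is the kind of distinction between deterministic and stochastic stability the abstract refers to.

```python
import numpy as np

# A minimal sketch (illustrative assumptions, not the paper's construction):
# SGD linearized around a global minimum x* = 0 of a 1D overparameterized problem.
# At step k a data point gamma_k is drawn uniformly and the iterate is multiplied
# by (1 - eta * h_{gamma_k}), where h_gamma is the per-sample curvature at x*.
# The characteristic Lyapunov exponent of this random product of factors is
#     lambda(eta) = E[ log |1 - eta * h_gamma| ],
# and its sign decides whether the linearized iterates contract toward (lambda < 0)
# or escape from (lambda > 0) the minimum.

# Toy dataset: per-sample curvatures at the shared global minimum (assumed values).
curvatures = np.array([0.1, 3.9])   # full-batch curvature is their mean, 2.0
mean_curv = curvatures.mean()

def lyapunov_exponent(eta):
    """lambda(eta) = E_gamma[ log|1 - eta * h_gamma| ] under uniform sampling."""
    return np.mean(np.log(np.abs(1.0 - eta * curvatures)))

for eta in [0.3, 0.6, 0.9]:
    lam = lyapunov_exponent(eta)
    gd_factor = abs(1.0 - eta * mean_curv)  # contraction factor of deterministic GD
    verdict = "stable for SGD" if lam < 0 else "unstable for SGD"
    print(f"eta = {eta:.1f}: lambda = {lam:+.3f} ({verdict}); |1 - eta*H| = {gd_factor:.2f} for GD")
```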