In overparameterized optimization tasks, such as those arising in modern machine learning, global minima are generally not unique. To understand generalization in these settings, it is therefore vital to study which minimum an optimization algorithm converges to. Minima that are unstable under the dynamics imposed by the optimization algorithm cannot be reached, which limits the set of minima the algorithm can find. In this paper, we characterize the global minima that are dynamically stable or unstable for both deterministic gradient descent and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent, which depends on the local dynamics around a global minimum, and rigorously prove that the sign of this exponent determines whether SGD can accumulate at that global minimum.
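The stability criterion can be illustrated on a toy one-dimensional problem. The setup below is a hypothetical sketch, not the paper's construction: per-sample losses f_i(w) = (a_i w)^2 / 2 all share the global minimum w = 0, so near that minimum each SGD step multiplies w by (1 - η a_i²). The exponent λ(η) = E_i[log|1 - η a_i²|] then plays the role of the characteristic Lyapunov exponent, and its sign predicts whether SGD iterates shrink toward the minimum or escape from it.

```python
import numpy as np

# Toy overparameterized problem: per-sample losses f_i(w) = (a_i * w)**2 / 2,
# all sharing the global minimum w = 0.  SGD draws a random sample each step:
#     w  <-  w * (1 - eta * a_i**2)
# For this linear dynamics the Lyapunov exponent has the closed form
#     lambda(eta) = E_i[ log|1 - eta * a_i**2| ],
# and its sign decides stability of the minimum under SGD.

rng = np.random.default_rng(0)
a = np.array([0.5, 3.0])      # hypothetical data; curvatures h_i = a_i**2 are 0.25 and 9
h = a ** 2

def lyapunov_exponent(eta):
    """Closed-form exponent for uniform sampling over the samples."""
    return np.mean(np.log(np.abs(1.0 - eta * h)))

def run_sgd(eta, w0=1e-3, steps=200):
    """Simulate SGD started near the minimum; return the final distance |w|."""
    w = w0
    for i in rng.integers(0, len(h), size=steps):
        w *= 1.0 - eta * h[i]
    return abs(w)

for eta in (0.1, 0.4):
    lam = lyapunov_exponent(eta)
    print(f"eta={eta}: lambda={lam:+.3f}, final |w| = {run_sgd(eta):.3e}")
```

Note that at η = 0.4 the minimum is unstable for SGD (λ > 0) even though full-batch gradient descent still contracts there (the average curvature is 4.625, giving contraction factor |1 − 0.4 · 4.625| = 0.85 < 1), which illustrates why the deterministic and stochastic characterizations differ.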