Stochastic optimization has found wide application in minimizing objective functions in machine learning, which has motivated many theoretical studies seeking to understand its practical success. Most existing studies focus on the convergence of optimization errors, while the generalization analysis of stochastic optimization lags far behind. This is especially true for the nonconvex and nonsmooth problems often encountered in practice. In this paper, we initiate a systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems. We introduce novel algorithmic stability measures and establish their quantitative connection to the gap between population gradients and empirical gradients, which we further extend to the gap between the Moreau envelope of the empirical risk and that of the population risk. To our knowledge, such quantitative connections between stability and generalization, in terms of either gradients or Moreau envelopes, have not been studied in the literature. We introduce a class of sampling-determined algorithms, for which we develop bounds for three stability measures. Finally, we apply these results to derive error bounds for stochastic gradient descent and its adaptive variant, showing how to achieve implicit regularization by tuning the step sizes and the number of iterations.
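The Moreau envelope mentioned above is the standard smoothing f_λ(w) = min_v { f(v) + ||v − w||² / (2λ) }. For a convex or weakly convex f and a small enough λ, it is differentiable with gradient (w − prox_{λf}(w)) / λ, which is why gradient-type quantities remain meaningful for nonsmooth objectives. The minimal Python sketch below only illustrates this smoothing and is not the paper's construction: it applies the envelope to a toy one-dimensional function f(v) = |v| instead of an empirical or population risk. The function names `moreau_envelope_1d` and `huber`, the grid bounds, and λ = 0.5 are hypothetical choices, and the proximal point is approximated by brute-force grid search rather than an exact solver.

```python
"""Illustrative sketch: Moreau envelope of a nonsmooth 1-D function.

Assumptions (hypothetical, for illustration only): f(v) = |v|, lam = 0.5,
and a brute-force grid search over [lo, hi] in place of an exact prox solver.
"""


def moreau_envelope_1d(f, w, lam, lo=-5.0, hi=5.0, n=100_001):
    """Approximate f_lam(w) = min_v { f(v) + (v - w)**2 / (2 * lam) }.

    Returns (envelope value, approximate minimizer), where the minimizer
    is the proximal point prox_{lam f}(w) restricted to a uniform grid.
    """
    step = (hi - lo) / (n - 1)
    best_val, best_v = float("inf"), lo
    for i in range(n):
        v = lo + i * step
        val = f(v) + (v - w) ** 2 / (2.0 * lam)
        if val < best_val:
            best_val, best_v = val, v
    return best_val, best_v


def huber(w, lam):
    """Closed-form Moreau envelope of f(v) = |v| (the Huber function)."""
    return w * w / (2.0 * lam) if abs(w) <= lam else abs(w) - lam / 2.0


if __name__ == "__main__":
    lam = 0.5
    for w in [-2.0, -0.3, 0.0, 0.25, 1.5]:
        env, prox = moreau_envelope_1d(abs, w, lam)
        # Gradient of the envelope via the prox identity (w - prox) / lam;
        # it stays within [-1, 1] even though |v| is not differentiable at 0.
        grad = (w - prox) / lam
        print(f"w={w:+.2f}  envelope={env:.4f}  huber={huber(w, lam):.4f}  grad={grad:+.4f}")
```

Running the script should show the grid-search envelope agreeing with the Huber closed form. The envelope is smooth even at the kink of |v|, which is the property that lets gradient-based stability measures extend to nonsmooth risks.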