Stochastic optimization is widely used to minimize objective functions in machine learning, which has motivated many theoretical studies aimed at understanding its practical success. Most existing studies focus on the convergence of optimization errors, while the generalization analysis of stochastic optimization lags far behind. This is especially the case for the nonconvex and nonsmooth problems often encountered in practice. In this paper, we initiate a systematic stability and generalization analysis of stochastic optimization on nonconvex and nonsmooth problems. We introduce novel algorithmic stability measures and establish their quantitative connection to the gap between population gradients and empirical gradients, which is then extended to study the gap between the Moreau envelope of the empirical risk and that of the population risk. To our knowledge, these quantitative connections between stability and generalization, in terms of either gradients or Moreau envelopes, have not been studied in the literature. We introduce a class of sampling-determined algorithms, for which we develop bounds for three stability measures. Finally, we apply these results to derive error bounds for stochastic gradient descent and its adaptive variant, where we show how to achieve implicit regularization by tuning the step sizes and the number of iterations.
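
For reference, the two generalization gaps mentioned in the abstract can be written out as follows. This is a sketch in notation assumed here rather than taken from the abstract: S is a training sample of size n, A(S) is the output of the algorithm on S, f(w; z) is the loss, and λ > 0 is a smoothing parameter. The Moreau envelope definition is the standard one; the exact norms, expectations, and form of the envelope gap used in the paper (for example, whether it is measured through the envelope's gradient) may differ.

```latex
% Assumed notation: S = \{z_1, \dots, z_n\}, loss f(w; z)
% Empirical risk and population risk
F_S(w) = \frac{1}{n} \sum_{i=1}^{n} f(w; z_i), \qquad
F(w) = \mathbb{E}_{z}\bigl[ f(w; z) \bigr]

% Gap between population and empirical gradients at the algorithm output
\bigl\| \nabla F(A(S)) - \nabla F_S(A(S)) \bigr\|

% Standard Moreau envelope of a function G with parameter \lambda > 0
G_{\lambda}(w) = \min_{v} \Bigl\{ G(v) + \frac{1}{2\lambda} \, \| v - w \|^{2} \Bigr\}

% One possible form of the gap between the two envelopes (an assumption)
\bigl| F_{\lambda}(A(S)) - F_{S,\lambda}(A(S)) \bigr|
```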