Empirical evidence indicates that stochastic optimization with heavy-tailed gradient noise characterizes the training of machine learning models more faithfully than the standard setting of bounded gradient-noise variance. Most existing work on this phenomenon focuses on the convergence of optimization errors, while generalization analysis under heavy-tailed gradient noise remains limited. In this paper, we develop a general framework for establishing generalization bounds under heavy-tailed noise. Specifically, we introduce a truncation argument to derive generalization error bounds based on algorithmic stability under the assumption of a bounded $p$-th central moment with $p\in(1,2]$. Building on this framework, we further provide stability and generalization analyses for several popular stochastic algorithms under heavy-tailed noise, including clipped and normalized stochastic gradient descent, as well as their mini-batch and momentum variants.
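For concreteness, the heavy-tailed noise condition referenced above can be written as a bounded $p$-th central moment of the stochastic gradient; the symbols $g(w;\xi)$, $F$, and $\sigma$ below are illustrative, since the abstract itself does not fix notation:
\[
\mathbb{E}_{\xi}\!\left[\,\left\| g(w;\xi) - \nabla F(w) \right\|^{p}\,\right] \le \sigma^{p}, \qquad p \in (1,2],
\]
where $p=2$ recovers the usual bounded-variance assumption, while $p<2$ admits noise whose variance may be infinite.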
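As a minimal sketch of the update rules named in the abstract (not the paper's exact algorithms: the clipping threshold `tau`, step size `lr`, momentum parameter `beta`, and helper `grad_fn` are all hypothetical placeholders), clipped and normalized SGD with mini-batch and momentum variants might look as follows:

```python
import numpy as np

def clip(g, tau):
    """Rescale g to have norm at most tau (gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)

def clipped_sgd_step(w, grad_fn, batch, lr, tau):
    """One step of clipped mini-batch SGD (illustrative sketch)."""
    g = np.mean([grad_fn(w, xi) for xi in batch], axis=0)  # mini-batch gradient
    return w - lr * clip(g, tau)

def normalized_sgd_momentum_step(w, m, grad_fn, batch, lr, beta, eps=1e-12):
    """One step of normalized SGD with momentum (illustrative sketch).

    Returns the updated iterate and the updated momentum buffer.
    """
    g = np.mean([grad_fn(w, xi) for xi in batch], axis=0)  # mini-batch gradient
    m = beta * m + (1.0 - beta) * g                        # momentum buffer
    return w - lr * m / (np.linalg.norm(m) + eps), m       # normalized update
```

The intuition connecting these updates to the heavy-tailed setting is that both clipping and normalization bound the effective step length, so a single extreme gradient sample cannot move the iterate arbitrarily far even when the raw noise has unbounded variance.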