Classical analyses of convex and non-convex optimization methods often require Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics. Recent work relaxed this requirement to a non-uniform smoothness condition in which the Hessian norm is bounded by an affine function of the gradient norm, and proved convergence in the non-convex setting via gradient clipping, assuming bounded noise. In this paper, we further generalize this non-uniform smoothness condition and develop a simple yet powerful analysis technique that bounds the gradients along the trajectory, thereby leading to stronger results for both convex and non-convex optimization problems. In particular, we obtain the classical convergence rates for (stochastic) gradient descent and Nesterov's accelerated gradient method in the convex and/or non-convex settings under this general smoothness condition. The new analysis approach does not require gradient clipping and allows heavy-tailed noise with bounded variance in the stochastic setting.
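To make the affine smoothness condition concrete, the following is a minimal sketch on a toy function, not an example taken from the paper. It assumes the one-dimensional objective f(x) = x^4, whose gradient is not globally Lipschitz, and checks numerically that |f''(x)| <= L0 + L1 |f'(x)| holds with the hypothetical constants L0 = 16 and L1 = 1. It then runs plain gradient descent without clipping and records the gradient magnitudes along the trajectory. The function names, starting point, step size, and iteration count are all illustrative assumptions.

```python
# Toy objective (an assumption for illustration): f(x) = x^4.
# Its second derivative 12 x^2 is unbounded, so the gradient is not Lipschitz.
def f(x):
    return x ** 4

def grad(x):
    return 4.0 * x ** 3

def hess(x):
    return 12.0 * x ** 2

# Hypothetical constants for the affine condition |f''(x)| <= L0 + L1 * |f'(x)|.
# For this f, 12 x^2 - 4 |x|^3 is maximized at |x| = 2 with value 16.
L0, L1 = 16.0, 1.0

def smoothness_gap(points):
    """Smallest value of (L0 + L1*|f'(x)|) - |f''(x)| over the given points.

    A nonnegative result (up to rounding) means the bound holds on those points.
    """
    return min(L0 + L1 * abs(grad(x)) - abs(hess(x)) for x in points)

def gradient_descent(x0, step, n_steps):
    """Plain gradient descent with a constant step size and no clipping."""
    xs = [float(x0)]
    for _ in range(n_steps):
        xs.append(xs[-1] - step * grad(xs[-1]))
    return xs

# Check the condition on a grid covering [-50, 50].
grid = [-50.0 + 0.001 * i for i in range(100001)]
gap = smoothness_gap(grid)

# Run gradient descent from an illustrative starting point and step size.
traj = gradient_descent(x0=2.0, step=0.01, n_steps=200)
grad_norms = [abs(grad(x)) for x in traj]

print("smoothness gap on grid:", gap)
print("largest gradient magnitude on trajectory:", max(grad_norms))
print("final objective value:", f(traj[-1]))
```

In this toy run the gradient magnitude never exceeds its initial value, which is the kind of bounded-gradient behavior along the trajectory that the analysis relies on. The sketch only illustrates the condition and does not reproduce the paper's proofs or rates.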