Classical analyses of convex and non-convex optimization methods often require Lipschitz continuity of the gradient, which limits the analysis to functions bounded by quadratics. Recent work relaxed this requirement to a non-uniform smoothness condition in which the Hessian norm is bounded by an affine function of the gradient norm, and proved convergence in the non-convex setting via gradient clipping, assuming bounded noise. In this paper, we further generalize this non-uniform smoothness condition and develop a simple yet powerful analysis technique that bounds the gradients along the trajectory, thereby leading to stronger results for both convex and non-convex optimization problems. In particular, we obtain the classical convergence rates for (stochastic) gradient descent and Nesterov's accelerated gradient method in the convex and/or non-convex settings under this general smoothness condition. The new analysis approach does not require gradient clipping and allows heavy-tailed noise with bounded variance in the stochastic setting.
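As a rough illustration of the affine smoothness condition mentioned in the abstract, the sketch below checks numerically that a one-dimensional function satisfies |f''(x)| <= L0 + L1 |f'(x)| and then runs plain gradient descent without clipping while recording the gradient magnitude along the trajectory. Everything specific here is an assumption made for the example and is not taken from the paper: the test function f(x) = x^4 (whose gradient is not globally Lipschitz), the constant names and values L0 = 12 and L1 = 3, and the step-size rule based on the initial gradient.

```python
# Illustrative sketch only. The test function, the constants L0 and L1,
# and the step-size rule are assumptions, not the paper's choices.

def f(x):
    return x ** 4

def grad(x):
    return 4 * x ** 3

def hess(x):
    return 12 * x ** 2

# Hypothetical constants for the affine bound |f''(x)| <= L0 + L1 * |f'(x)|.
# For |x| <= 1 we have 12 x^2 <= 12, and for |x| >= 1 we have 12 x^2 <= 3 * 4|x|^3.
L0, L1 = 12.0, 3.0

def satisfies_affine_bound(points):
    """Check the affine Hessian bound at every point in `points`."""
    return all(abs(hess(x)) <= L0 + L1 * abs(grad(x)) for x in points)

def gradient_descent(x0, steps):
    """Plain gradient descent (no clipping) with a constant step size.

    The step size 1 / (L0 + L1 * |f'(x0)|) is a heuristic chosen for this
    example: it treats the local smoothness at the start as a global constant.
    Returns the final iterate and the gradient magnitudes along the trajectory.
    """
    eta = 1.0 / (L0 + L1 * abs(grad(x0)))
    x = x0
    grad_norms = [abs(grad(x))]
    for _ in range(steps):
        x -= eta * grad(x)
        grad_norms.append(abs(grad(x)))
    return x, grad_norms

if __name__ == "__main__":
    grid = [i / 100.0 for i in range(-1000, 1001)]
    print("affine bound holds on grid:", satisfies_affine_bound(grid))
    x_final, norms = gradient_descent(2.0, 1000)
    print("largest gradient magnitude on trajectory:", max(norms))
    print("final objective value:", f(x_final))
```

In this toy run the gradient magnitude never exceeds its initial value, which is the kind of trajectory-wise gradient bound the abstract refers to. The example only illustrates the idea and does not reproduce the paper's analysis or its step-size conditions.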