Classical convergence analyses for optimization algorithms rely on the widely adopted uniform smoothness assumption. However, recent empirical studies have demonstrated that many machine learning problems exhibit non-uniform smoothness, meaning that the smoothness factor is a function of the model parameters rather than a universal constant. In particular, the smoothness has been observed to grow with the gradient norm along the training trajectory. Motivated by this phenomenon, the recently introduced $(L_0, L_1)$-smoothness is a more general notion than traditional $L$-smoothness that captures this positive relationship between smoothness and gradient norm. Under this type of non-uniform smoothness, existing work has designed stochastic first-order algorithms that use gradient clipping to attain the optimal $\mathcal{O}(\epsilon^{-3})$ sample complexity for finding an $\epsilon$-approximate first-order stationary point. Nevertheless, studies of quasi-Newton methods in this setting are still lacking. Given the higher accuracy and greater robustness of quasi-Newton methods, in this paper we propose a fast stochastic quasi-Newton method for objectives with non-uniform smoothness. Leveraging gradient clipping and variance reduction, our algorithm achieves the best-known $\mathcal{O}(\epsilon^{-3})$ sample complexity and enjoys a convergence speedup with simple hyperparameter tuning. Our numerical experiments show that the proposed algorithm outperforms state-of-the-art approaches.
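For concreteness, the standard formalization of this condition (as introduced by Zhang et al., 2020, and not restated in the abstract itself) requires, for a twice-differentiable objective $f$,
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1\,\|\nabla f(x)\| \qquad \text{for all } x,
\]
which recovers classical $L$-smoothness in the special case $L_1 = 0$ (with $L_0 = L$), while allowing the local smoothness to grow linearly with the gradient norm otherwise.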
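As a schematic illustration only (the notation $v_t$, $B_t$, $\eta$, $\gamma$, $\beta$, and the specific estimator below are our own assumptions, not taken from the abstract), a clipped, variance-reduced quasi-Newton iteration of the kind described might take the form
\[
  v_t = \nabla f_{\mathcal{S}_t}(x_t) + (1-\beta)\bigl(v_{t-1} - \nabla f_{\mathcal{S}_t}(x_{t-1})\bigr),
  \qquad
  x_{t+1} = x_t - \min\!\Bigl(\eta,\ \frac{\gamma}{\|v_t\|}\Bigr)\, B_t^{-1} v_t,
\]
where $v_t$ is a recursive (STORM-type) variance-reduced gradient estimator over mini-batch $\mathcal{S}_t$, the clipped step size guards against the blow-up that can occur when the local smoothness $L_0 + L_1\|\nabla f(x_t)\|$ is large, and $B_t$ is a quasi-Newton curvature approximation (e.g., L-BFGS-style). The paper's actual update rule and parameter choices may differ from this sketch.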