Fractional derivatives are a well-studied generalization of integer-order derivatives. Naturally, for optimization, it is of interest to understand the convergence properties of gradient descent using fractional derivatives. Convergence analysis of fractional gradient descent is currently limited in both the methods and the settings analyzed. This paper aims to fill these gaps by analyzing variations of fractional gradient descent in smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, we establish novel bounds bridging fractional and integer derivatives. Then, we apply these bounds to the aforementioned settings to prove $O(1/T)$ convergence for smooth and convex functions and linear convergence for smooth and strongly convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness that is more natural for fractional derivatives. Finally, we present empirical results on the potential speedup of fractional gradient descent over standard gradient descent, as well as on the challenges of predicting which will be faster in general.