Fractional derivatives are a well-studied generalization of integer-order derivatives. Naturally, for optimization, it is of interest to understand the convergence properties of gradient descent using fractional derivatives. Existing convergence analyses of fractional gradient descent are limited in both the methods and the settings they cover. This paper aims to fill these gaps by analyzing variations of fractional gradient descent in smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, we establish novel bounds relating fractional and integer derivatives. We then apply these bounds to the aforementioned settings to prove $O(1/T)$ convergence for smooth and convex functions and linear convergence for smooth and strongly convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness that is more natural for fractional derivatives. Finally, we present empirical results on the potential speedup of fractional gradient descent over standard gradient descent, as well as on the difficulty of predicting which method will be faster in general.
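
For orientation, the rates named above are typically read as follows in the optimization literature. This is a sketch of the standard conventions, not the paper's exact statements: the symbols $x_t$ (iterate at step $t$), $x^*$ (a minimizer), $C$ (a constant independent of $T$) and $\rho$ (a contraction factor) are notation assumed here, and the precise quantities bounded under the extended smoothness notion may differ.

```latex
\begin{align*}
\text{smooth, convex:} \quad
  & f(x_T) - f(x^*) \le \frac{C}{T}, \\
\text{smooth, strongly convex (linear rate):} \quad
  & f(x_T) - f(x^*) \le C \rho^{T}, \qquad 0 < \rho < 1, \\
\text{smooth, non-convex:} \quad
  & \min_{0 \le t < T} \|\nabla f(x_t)\|^2 \le \frac{C}{T}.
\end{align*}
```

In the non-convex case a global minimizer is generally out of reach, so the guarantee is usually stated in terms of the gradient norm, that is, approach to a stationary point.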