Fractional derivatives are a well-studied generalization of integer-order derivatives. For optimization, it is therefore natural to ask how gradient descent behaves when it uses fractional derivatives. Existing convergence analysis of fractional gradient descent is limited both in the methods covered and in the settings considered. This paper aims to fill these gaps by analyzing variations of fractional gradient descent in the smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, novel bounds bridging fractional and integer derivatives are established. These bounds are then applied to the aforementioned settings to prove linear convergence for smooth and strongly convex functions and $O(1/T)$ convergence for smooth and convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness, H\"older smoothness, that is more natural for fractional derivatives. Finally, empirical results are presented on the potential speedup of fractional gradient descent over standard gradient descent, along with preliminary theoretical results explaining this speedup.
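For orientation, the notions named above can be written out as follows. This is only a sketch under stated assumptions: the abstract does not say which fractional derivative is used, so the Caputo derivative is assumed here as one common choice, and the symbols $\alpha$, $c$, $L$, $p$, $\rho$, $C$, $x_T$, and $f^*$ are illustrative names rather than the paper's notation.

```latex
% Caputo fractional derivative of order \alpha \in (0,1) with lower terminal c
% (assumed definition; the abstract does not fix one).
\[
  {}^{C}\!D_{c}^{\alpha} f(x)
    = \frac{1}{\Gamma(1-\alpha)} \int_{c}^{x} \frac{f'(t)}{(x-t)^{\alpha}} \, dt,
  \qquad x > c .
\]

% Power rule for p > 0, showing how a fractional derivative acts on a monomial.
\[
  {}^{C}\!D_{c}^{\alpha} (x-c)^{p}
    = \frac{\Gamma(p+1)}{\Gamma(p+1-\alpha)} \, (x-c)^{p-\alpha} .
\]

% H\"older smoothness with exponent p \in (0,1]; p = 1 recovers the usual
% L-smoothness (Lipschitz gradient).
\[
  \| \nabla f(x) - \nabla f(y) \| \le L \, \| x - y \|^{p}
  \qquad \text{for all } x, y .
\]

% Linear convergence (some 0 < \rho < 1) versus O(1/T) convergence
% (some constant C > 0), stated for the optimality gap after T iterations.
\[
  f(x_T) - f^{*} \le \rho^{T} \bigl( f(x_0) - f^{*} \bigr),
  \qquad
  f(x_T) - f^{*} \le \frac{C}{T} .
\]
```

In the non-convex setting an $O(1/T)$ rate is usually stated for a stationarity measure such as a gradient norm rather than for the optimality gap. Whether the paper uses exactly that form is not specified in the abstract.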