Fractional derivatives are a well-studied generalization of integer-order derivatives. For optimization, it is therefore natural to ask how gradient descent behaves when it uses fractional derivatives. Existing convergence analysis of fractional gradient descent is limited in both the methods and the settings it covers. This paper aims to fill these gaps by analyzing variations of fractional gradient descent in smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, we establish novel bounds bridging fractional and integer derivatives. We then apply these bounds to the aforementioned settings to prove linear convergence for smooth and strongly convex functions and $O(1/T)$ convergence for smooth and convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness, H\"older smoothness, that is more natural for fractional derivatives. Finally, we present empirical results on the potential speedup of fractional gradient descent over standard gradient descent, as well as on the challenges of predicting which will be faster in general.
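For readers less familiar with the terminology, the settings and rates named above are commonly formalized as follows. This is a sketch using standard textbook conventions, not the paper's own statements: the symbols $L$, $\mu$, $L_p$, $p$, $C$, $\rho$, $x_T$ and $x^*$ are assumptions introduced here, and the paper's exact conditions and error measures may differ.

\[
\begin{aligned}
&\text{$L$-smooth:} && \|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\| \\
&\text{$\mu$-strongly convex:} && f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle + \tfrac{\mu}{2}\,\|y - x\|^2 \\
&\text{H\"older smooth, } p \in (0,1]\text{:} && \|\nabla f(x) - \nabla f(y)\| \le L_p\,\|x - y\|^{p} \\
&\text{linear convergence:} && f(x_T) - f(x^*) \le C\,\rho^{T}, \quad \rho \in (0,1) \\
&\text{$O(1/T)$ convergence:} && f(x_T) - f(x^*) \le C / T
\end{aligned}
\]

Here $x_T$ denotes the iterate after $T$ steps and $x^*$ a minimizer. Taking $p = 1$ in the H\"older condition recovers ordinary $L$-smoothness. In non-convex settings a rate is usually stated for a gradient-norm quantity rather than for the function-value gap; the abstract does not say which quantity is used.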