Fractional derivatives are a well-studied generalization of integer-order derivatives. Naturally, for optimization, it is of interest to understand the convergence properties of gradient descent using fractional derivatives. Convergence analysis of fractional gradient descent is currently limited in both the methods and the settings analyzed. This paper aims to fill these gaps by analyzing variants of fractional gradient descent in the smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, novel bounds will be established bridging fractional and integer derivatives. These bounds will then be applied to the aforementioned settings to prove linear convergence for smooth and strongly convex functions and $O(1/T)$ convergence for smooth and convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions under an extended notion of smoothness, H\"older smoothness, that is more natural for fractional derivatives. Finally, empirical results will be presented on the potential speedup of fractional gradient descent over standard gradient descent, along with preliminary theoretical results explaining this speedup.
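To make the idea concrete, the following is a minimal illustrative sketch (not the paper's algorithm) of replacing the ordinary gradient with a fractional derivative in the descent update. It uses the one case where the Caputo fractional derivative has a simple closed form: for $f(x) = x^2$ with base point $0$ and order $\alpha \in (0,1)$, $D^\alpha x^2 = \frac{2}{\Gamma(3-\alpha)} x^{2-\alpha}$ for $x > 0$. The step size, iteration count, and order $\alpha = 0.7$ below are arbitrary choices for the demo.

```python
from math import gamma

def caputo_deriv_x2(x, alpha):
    # Closed-form Caputo derivative of f(x) = x^2 with base point 0,
    # order alpha in (0, 1):  D^alpha x^2 = 2 / Gamma(3 - alpha) * x^(2 - alpha).
    # Valid for x > 0; alpha = 1 recovers the ordinary derivative 2x.
    if alpha == 1.0:
        return 2.0 * x
    return 2.0 / gamma(3.0 - alpha) * x ** (2.0 - alpha)

def descend(alpha, x0=1.0, lr=0.1, steps=50):
    # Gradient-descent-style iteration x_{k+1} = x_k - lr * D^alpha f(x_k).
    x = x0
    for _ in range(steps):
        x -= lr * caputo_deriv_x2(x, alpha)
    return x

# Compare standard gradient descent (alpha = 1) with a fractional order.
x_std = descend(alpha=1.0)   # contracts geometrically toward the minimizer 0
x_frac = descend(alpha=0.7)  # also decreases toward 0, at a different rate
```

Note the exponent $2-\alpha$ in the fractional derivative: near the minimizer it behaves like a H\"older-continuous gradient rather than a Lipschitz one, which is why H\"older smoothness is the natural regularity notion in this setting.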