A landmark in modern gradient-based optimization is Nesterov's accelerated gradient descent method, proposed in [Nesterov, 1983] and abbreviated here as Nesterov-1983. An important subsequent advance is its proximal generalization, the fast iterative shrinkage-thresholding algorithm (FISTA), which is widely used in image science and engineering. However, it has been unknown whether Nesterov-1983 and FISTA converge linearly on strongly convex functions, a question listed as an open problem in the comprehensive review [Chambolle and Pock, 2016, Appendix B]. In this paper, we answer this question using the high-resolution differential equation framework. Together with the phase-space representation adopted previously, the key difference in constructing the Lyapunov function here is that the coefficient of the kinetic energy varies with the iteration. Furthermore, we point out that, on strongly convex functions, the linear convergence of both algorithms does not depend on the parameter $r$. We also show that the proximal subgradient norm converges linearly.
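For readers who want to see the two iterations concretely, the following Python sketch may help. It is an illustration under stated assumptions, not the construction analyzed in the paper: the momentum coefficient is assumed to be $(k-1)/(k+r)$, one common parameterization that may differ from the exact form used by the authors, and the names `fista` and `soft_threshold`, the diagonal quadratic test problem, and the step size $1/L$ are all hypothetical choices made for the example. With an identity proximal map the loop reduces to a Nesterov-1983-style iteration; with soft-thresholding it is a FISTA-style iteration.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(grad_f, prox_g, x0, step, r=2, iters=500):
    """Accelerated proximal gradient sketch.

    Assumed iteration (not confirmed against the paper):
        x_k = prox_g(y_{k-1} - step * grad_f(y_{k-1}), step)
        y_k = x_k + (k - 1) / (k + r) * (x_k - x_{k-1})
    """
    x_prev = np.asarray(x0, dtype=float).copy()
    y = x_prev.copy()
    for k in range(1, iters + 1):
        x = prox_g(y - step * grad_f(y), step)
        y = x + (k - 1) / (k + r) * (x - x_prev)
        x_prev = x
    return x_prev

# Illustrative strongly convex quadratic f(x) = 0.5 * x^T A x - b^T x,
# with A diagonal so the minimizers are known in closed form.
a_diag = np.array([1.0, 10.0])   # mu = 1, L = 10
b = np.array([1.0, -2.0])
lam = 0.5                        # l1 weight for the FISTA case
step = 1.0 / a_diag.max()        # step size 1/L

def grad_f(x):
    return a_diag * x - b

# Smooth case: identity prox gives a Nesterov-1983-style iteration.
x_smooth = fista(grad_f, lambda v, s: v, np.zeros(2), step)

# Composite case: soft-thresholding prox gives a FISTA-style iteration.
x_lasso = fista(grad_f, lambda v, s: soft_threshold(v, s * lam), np.zeros(2), step)

# Closed-form minimizers for this separable problem, used as references.
x_smooth_star = b / a_diag
x_lasso_star = soft_threshold(b, lam) / a_diag
```

Changing `r` in the calls above and comparing the distance to `x_smooth_star` or `x_lasso_star` across iteration counts is one simple way to explore numerically how the choice of $r$ affects the observed convergence on this toy problem.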