Surprisingly, recent work has shown that gradient descent can be accelerated without using momentum -- just by judiciously choosing stepsizes. An open question raised by several papers is whether this phenomenon of stepsize-based acceleration holds more generally for constrained and/or composite convex optimization via projected and/or proximal versions of gradient descent. We answer this in the affirmative by proving that the silver stepsize schedule yields analogously accelerated rates in these settings. These rates are conjectured to be asymptotically optimal among all stepsize schedules, and match the silver convergence rate of vanilla gradient descent (Altschuler and Parrilo, 2023), namely $O(\varepsilon^{- \log_{\rho} 2})$ for smooth convex optimization and $O(\kappa^{\log_\rho 2} \log \frac{1}{\varepsilon})$ under strong convexity, where $\varepsilon$ is the precision, $\kappa$ is the condition number, and $\rho = 1 + \sqrt{2}$ is the silver ratio. The key technical insight is the combination of recursive gluing -- the technique underlying all analyses of gradient descent accelerated with time-varying stepsizes -- with a certain Laplacian-structured sum-of-squares certificate for the analysis of proximal point updates.
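For concreteness, below is a minimal sketch (illustrative, not part of the formal results) of proximal gradient descent run with silver stepsizes. It assumes the closed form $h_t = 1 + \rho^{\nu(t)-1}$ for the silver schedule, where $\nu(t)$ is the 2-adic valuation of $t$, following Altschuler and Parrilo (2023); the helper names (`silver_stepsize`, `proximal_gradient_silver`) and the LASSO test instance are hypothetical choices made for this sketch.

```python
import numpy as np

RHO = 1.0 + np.sqrt(2.0)  # the silver ratio

def silver_stepsize(t):
    # h_t = 1 + rho^(nu(t) - 1), where nu(t) is the 2-adic valuation of t
    # (assumed closed form for the silver schedule, per Altschuler-Parrilo 2023).
    nu = (t & -t).bit_length() - 1
    return 1.0 + RHO ** (nu - 1)

def proximal_gradient_silver(grad_f, prox_g, x0, L, n_iters):
    """Proximal gradient descent with silver stepsizes:
        x_{t+1} = prox_{(h_t/L) g}( x_t - (h_t/L) grad_f(x_t) ),
    where grad_f is the gradient of the L-smooth convex part f, and
    prox_g(v, a) returns argmin_x g(x) + ||x - v||^2 / (2a)."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, n_iters + 1):
        alpha = silver_stepsize(t) / L  # time-varying stepsize, no momentum
        x = prox_g(x - alpha * grad_f(x), alpha)
    return x

# Illustrative composite instance: LASSO, with f(x) = 0.5 ||Ax - b||^2 and
# g(x) = lam ||x||_1, whose prox is soft-thresholding.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, b, lam = rng.standard_normal((40, 20)), rng.standard_normal(40), 0.1
    L = np.linalg.norm(A, 2) ** 2  # smoothness constant: sigma_max(A)^2
    grad_f = lambda x: A.T @ (A @ x - b)
    prox_g = lambda v, a: np.sign(v) * np.maximum(np.abs(v) - a * lam, 0.0)
    x = proximal_gradient_silver(grad_f, prox_g, np.zeros(20), L, n_iters=127)
```

Setting $g \equiv 0$ recovers vanilla gradient descent with silver stepsizes, while taking $g$ to be the indicator of a closed convex set turns the prox into a projection, covering the constrained setting discussed above; the horizon $127 = 2^7 - 1$ reflects that the silver schedules are naturally defined at lengths $2^k - 1$.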