This work establishes new convergence guarantees for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps that may violate descent; it does so by analyzing the overall effect of many iterations at once, rather than the one-iteration inductions typical of most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. We also motivate a conjecture toward proving a faster $O(1/(T\log T))$ rate for gradient descent, supported by simple numerical validation.
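To make the idea of long steps that temporarily violate descent concrete, the following is a minimal sketch rather than the method analyzed in this work. It assumes a simple diagonal quadratic objective with smoothness constant `L = 1`, and it uses a hypothetical cyclic stepsize pattern `(1.5, 4.9, 1.5)`, given in units of `1/L`, chosen purely for illustration. The names `run_gd`, `lams`, and `pattern` are likewise assumptions introduced here.

```python
import numpy as np

def run_gd(lams, x0, L, pattern, num_cycles):
    """Gradient descent on f(x) = 0.5 * sum(lams * x**2) with a cyclic stepsize pattern.

    Each entry h of `pattern` is a normalized stepsize, so the actual step is h / L.
    Returns the objective value at every iterate, including the starting point.
    """
    x = x0.copy()
    values = [0.5 * np.sum(lams * x**2)]
    for _ in range(num_cycles):
        for h in pattern:
            grad = lams * x              # gradient of the diagonal quadratic
            x = x - (h / L) * grad       # possibly a long step (h > 2)
            values.append(0.5 * np.sum(lams * x**2))
    return np.array(values)

# Illustrative problem: curvatures spread over (0, L] with L = 1 (an assumption).
lams = np.linspace(0.01, 1.0, 50)
L = 1.0
x0 = np.ones_like(lams)

# Hypothetical pattern: two short steps around one long step.
pattern = (1.5, 4.9, 1.5)
values = run_gd(lams, x0, L, pattern, num_cycles=100)

num_increases = int(np.sum(np.diff(values) > 0))
print("initial objective:", values[0])
print("final objective:  ", values[-1])
print("iterations where the objective increased:", num_increases)
```

On this instance the objective rises on some individual iterations, yet it falls over each full cycle of the pattern, which is the multi-iteration viewpoint described above. A single quadratic of this kind only illustrates the nonmonotone behavior; it says nothing about the convergence rates established or conjectured in the work.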