This work establishes provably faster convergence rates for gradient descent via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps, which may violate descent, by analyzing the overall effect of many iterations at once rather than relying on the one-iteration inductions typical of most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. We also motivate a conjecture towards proving a faster $O(1/(T\log T))$ rate for gradient descent and support it with simple numerical validation.
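To make the idea of long steps that temporarily violate descent concrete, the following minimal sketch runs gradient descent with a periodic stepsize pattern on a small diagonal quadratic. Everything in it is an assumption made for illustration and is not taken from the text above: the test function, the starting point, the helper names (`objective`, `gradient`, `gradient_descent`), and the pattern `(1.5, 4.9, 1.5)`, given in units of $1/L$, where $L$ is the smoothness constant. It only shows the non-monotone behavior; it does not reproduce the computer-assisted analysis or demonstrate any worst-case rate.

```python
# Illustrative sketch only: gradient descent with a periodic stepsize pattern
# on the diagonal quadratic f(x) = 0.5 * sum(lam_i * x_i**2),
# which is L-smooth with L = max(lam_i).

def objective(lams, x):
    """Value of f(x) = 0.5 * sum(lam_i * x_i**2)."""
    return 0.5 * sum(lam * xi * xi for lam, xi in zip(lams, x))


def gradient(lams, x):
    """Gradient of f, computed coordinate-wise as lam_i * x_i."""
    return [lam * xi for lam, xi in zip(lams, x)]


def gradient_descent(lams, x0, pattern, num_steps):
    """Run x_{k+1} = x_k - (h_k / L) * grad f(x_k), cycling h_k through `pattern`.

    Returns the objective values f(x_0), ..., f(x_{num_steps}).
    """
    L = max(lams)  # smoothness constant of this quadratic
    x = list(x0)
    values = [objective(lams, x)]
    for k in range(num_steps):
        h = pattern[k % len(pattern)]  # normalized stepsize for this iteration
        g = gradient(lams, x)
        x = [xi - (h / L) * gi for xi, gi in zip(x, g)]
        values.append(objective(lams, x))
    return values


# Hypothetical problem data, chosen only for this sketch.
LAMS = [1.0, 0.1, 0.01]
X0 = [1.0, 1.0, 1.0]

# Assumed pattern: the middle step 4.9 / L is far longer than the classical
# 2 / L limit for constant stepsizes, so it can increase the objective.
LONG_PATTERN = (1.5, 4.9, 1.5)

values = gradient_descent(LAMS, X0, LONG_PATTERN, num_steps=9)
for k, v in enumerate(values):
    print(k, round(v, 4))
```

On this example the objective rises at the long step in each cycle, yet it is lower at the end of every three-step cycle than at the start of that cycle. This is the kind of behavior that a one-iteration descent argument cannot capture and that an analysis over several iterations at once can.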