Existing analyses of AdaGrad and other adaptive methods for smooth convex optimization typically assume that the domain has bounded diameter. For unconstrained problems, previous works guarantee only an asymptotic convergence rate, without an explicit constant factor that holds uniformly over the entire function class. Furthermore, in the stochastic setting, the only version of AdaGrad that has been analyzed is a modified one in which the latest gradient is not used to update the stepsize, which differs from the version commonly used in practice. Our paper aims to bridge these gaps and to develop a deeper understanding of AdaGrad and its variants, both in the standard setting of smooth convex functions and in the more general setting of quasar-convex functions. First, we present new techniques that explicitly bound the convergence rate of vanilla AdaGrad for unconstrained problems in both the deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we prove convergence of the last iterate rather than the average iterate. Finally, we give new accelerated adaptive algorithms together with convergence guarantees in the deterministic setting that depend explicitly on the problem parameters, improving upon the asymptotic rates shown in previous works.
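The distinction between vanilla AdaGrad and the modified variant analyzed in earlier stochastic work comes down to whether the current gradient enters its own stepsize. The sketch below illustrates this under stated assumptions: it uses the scalar ("norm") form of AdaGrad, with one stepsize shared by all coordinates, and a deterministic gradient oracle. The function name `adagrad_norm` and its parameters `eta`, `b0`, and `use_latest` are hypothetical illustrations, not the paper's notation.

```python
import math
from typing import Callable, List

Vector = List[float]


def adagrad_norm(grad: Callable[[Vector], Vector], x0: Vector, eta: float,
                 b0: float, steps: int, use_latest: bool = True) -> Vector:
    """Scalar (norm) AdaGrad sketch; assumes b0 > 0.

    use_latest=True (vanilla): b_t^2 = b_{t-1}^2 + ||g_t||^2 is updated
    *before* the step, so g_t influences its own stepsize eta / b_t.
    use_latest=False (modified variant): the step uses eta / b_{t-1}, so the
    latest gradient is only folded into the accumulator afterwards.
    """
    x = list(x0)
    b_sq = b0 * b0  # running sum of squared gradient norms, seeded with b0^2
    for _ in range(steps):
        g = grad(x)
        g_sq = sum(gi * gi for gi in g)
        if use_latest:
            b_sq += g_sq                   # include the current gradient first
            step = eta / math.sqrt(b_sq)
        else:
            step = eta / math.sqrt(b_sq)   # stepsize from past gradients only
            b_sq += g_sq
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quadratic_grad(x: Vector) -> Vector:
    # Gradient of the 1-smooth convex function f(x) = 0.5 * ||x||^2.
    return list(x)
```

For example, `adagrad_norm(quadratic_grad, [3.0, -4.0], eta=1.0, b0=1.0, steps=1)` takes a first step of size 1/sqrt(26) in the vanilla form, while `use_latest=False` takes a first step of size 1/b0 = 1. The two variants therefore follow different trajectories even on the same deterministic problem.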