Existing analyses of AdaGrad and other adaptive methods for smooth convex optimization typically assume that the domain has bounded diameter. For unconstrained problems, previous works guarantee only an asymptotic convergence rate, without an explicit constant factor that holds uniformly over the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad has been analyzed: one in which the latest gradient is not used to update the stepsize, unlike the version commonly used in practice. Our paper aims to bridge these gaps and to develop a deeper understanding of AdaGrad and its variants, both in the standard setting of smooth convex functions and in the more general setting of quasar-convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of vanilla AdaGrad for unconstrained problems in both the deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show convergence of the last iterate, rather than of the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantees in the deterministic setting, with explicit dependence on the problem parameters, improving upon the asymptotic rates shown in previous works.
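To make concrete the distinction between vanilla AdaGrad and the modified version that does not use the latest gradient in the stepsize, here is a minimal Python sketch. It assumes the scalar-stepsize ("norm") form of AdaGrad, which the abstract does not specify; the function name `adagrad_norm`, the parameters `eta` and `b0`, and the quadratic test objective are all hypothetical choices made for illustration, not the paper's algorithms or notation.

```python
import math


def adagrad_norm(grad, x0, eta=1.0, b0=1.0, steps=100, use_latest_gradient=True):
    """Scalar-stepsize AdaGrad sketch (an assumed form, for illustration only).

    grad: function mapping a list of floats to its gradient (list of floats).
    use_latest_gradient=True  -> stepsize includes the current gradient (vanilla).
    use_latest_gradient=False -> stepsize uses only past gradients (modified).
    Returns (last_iterate, average_iterate).
    """
    x = list(x0)
    acc = b0 * b0              # b0^2 plus the running sum of squared gradient norms
    avg = [0.0] * len(x)       # running average of the iterates x_1, ..., x_t
    for t in range(1, steps + 1):
        g = grad(x)
        sq_norm = sum(gi * gi for gi in g)
        if use_latest_gradient:
            acc += sq_norm                  # accumulate first, then compute the stepsize
            step = eta / math.sqrt(acc)
        else:
            step = eta / math.sqrt(acc)     # stepsize ignores the current gradient
            acc += sq_norm
        x = [xi - step * gi for xi, gi in zip(x, g)]
        avg = [a + (xi - a) / t for a, xi in zip(avg, x)]
    return x, avg


def quadratic_grad(x):
    # Gradient of the smooth convex function f(x) = 0.5 * ||x||^2 (illustrative only).
    return list(x)


last, average = adagrad_norm(quadratic_grad, [3.0, -4.0], steps=500)
```

The two branches differ only in whether the accumulator is updated before or after the stepsize is computed. The function returns both the last and the average iterate because the abstract distinguishes guarantees for the two.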