Existing analyses of AdaGrad and other adaptive methods for smooth convex optimization typically assume a domain of bounded diameter. For unconstrained problems, previous works guarantee only an asymptotic convergence rate, without an explicit constant factor that holds for the entire function class. Furthermore, in the stochastic setting, only a modified version of AdaGrad has been analyzed, one that differs from the version commonly used in practice in that the latest gradient is not used to update the stepsize. Our paper aims to bridge these gaps and to develop a deeper understanding of AdaGrad and its variants in the standard setting of smooth convex functions as well as the more general setting of quasar convex functions. First, we demonstrate new techniques to explicitly bound the convergence rate of vanilla AdaGrad for unconstrained problems in both the deterministic and stochastic settings. Second, we propose a variant of AdaGrad for which we can show convergence of the last iterate, rather than the average iterate. Finally, we give new accelerated adaptive algorithms and their convergence guarantees in the deterministic setting with explicit dependence on the problem parameters, improving upon the asymptotic rates shown in previous works.
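
The difference between the AdaGrad used in practice and the modified version mentioned above comes down to when the current gradient enters the stepsize. The following is a minimal sketch of that difference, not the paper's algorithm statement. It assumes the per-coordinate form of AdaGrad, and the names `adagrad`, `eta`, `b0`, and `use_latest_gradient`, along with the quadratic test objective, are illustrative choices that do not come from the source.

```python
import math


def adagrad(grad, x0, steps, eta=0.1, b0=1.0, use_latest_gradient=True):
    """Per-coordinate AdaGrad on a plain list of floats (illustrative sketch).

    grad: callable mapping a point (list of floats) to its gradient.
    eta, b0: hypothetical base stepsize and initial accumulator value.
    use_latest_gradient:
        True  -> the stepsize at step t already includes the gradient g_t
                 (the version commonly used in practice).
        False -> the stepsize at step t uses only g_1..g_{t-1}
                 (the modified, "delayed" version).
    Returns (last iterate, running average of x_0..x_steps).
    """
    x = list(x0)
    acc = [b0 * b0] * len(x)  # accumulated squared gradients per coordinate
    avg = list(x0)            # running average, starting from x_0

    for t in range(1, steps + 1):
        g = grad(x)
        if use_latest_gradient:
            # Fold g_t into the accumulator first, then take the step.
            acc = [a + gi * gi for a, gi in zip(acc, g)]
            x = [xi - eta * gi / math.sqrt(a) for xi, gi, a in zip(x, g, acc)]
        else:
            # Take the step with the old accumulator, then fold g_t in.
            x = [xi - eta * gi / math.sqrt(a) for xi, gi, a in zip(x, g, acc)]
            acc = [a + gi * gi for a, gi in zip(acc, g)]
        # avg now averages t + 1 points: x_0, ..., x_t.
        avg = [m + (xi - m) / (t + 1) for m, xi in zip(avg, x)]

    return x, avg


# Assumed test problem: an unconstrained smooth convex quadratic.
CURVATURE = [1.0, 10.0]


def quadratic(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURE, x))


def quadratic_grad(x):
    return [c * xi for c, xi in zip(CURVATURE, x)]


if __name__ == "__main__":
    start = [5.0, -3.0]
    for latest in (True, False):
        last, avg = adagrad(quadratic_grad, start, steps=200,
                            use_latest_gradient=latest)
        print(latest, quadratic(last), quadratic(avg))
```

The function returns both the last iterate and the average iterate because the abstract distinguishes guarantees for the two. The sketch does not implement the proposed last-iterate variant or the accelerated algorithms, whose details are not given in this text.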