Shuffling gradient methods are widely used in modern machine learning tasks and come in three popular implementations: Random Reshuffle (RR), Shuffle Once (SO), and Incremental Gradient (IG). In contrast to their empirical success, the theoretical guarantees of shuffling gradient methods remained poorly understood for a long time. Until recently, convergence rates had been established only for the average iterate on convex functions and for the last iterate on strongly convex problems (using the squared distance as the metric). However, when the function value gap is used as the convergence criterion, existing theories cannot explain the good performance of the last iterate in various settings (e.g., constrained optimization). To bridge this gap between practice and theory, we prove the first last-iterate convergence rates for shuffling gradient methods with respect to the objective value, even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate.
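As context for the three schemes named above, here is a minimal illustrative sketch (not the paper's analysis; the component-gradient oracle `grad_i` and all parameter names are assumptions) showing that RR, SO, and IG differ only in how the per-epoch ordering of the n component gradients is chosen:

```python
import numpy as np

def shuffling_gd(grad_i, n, x0, lr, epochs, scheme="RR", seed=0):
    """Shuffling gradient descent on f(x) = (1/n) * sum_i f_i(x).

    grad_i(x, i): gradient of the i-th component function at x.
    scheme: "RR" -- a fresh random permutation every epoch;
            "SO" -- one random permutation drawn once, then reused;
            "IG" -- the fixed deterministic order 0, 1, ..., n-1.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    # SO fixes its permutation once, before the first epoch.
    perm = rng.permutation(n) if scheme == "SO" else np.arange(n)
    for _ in range(epochs):
        if scheme == "RR":
            # RR reshuffles at the start of every epoch.
            perm = rng.permutation(n)
        for i in perm:
            x = x - lr * grad_i(x, i)
    # The last iterate, whose convergence the paper studies.
    return x
```

For example, on the toy least-squares objective with components f_i(x) = (x - i)^2 / 2, all three schemes drive the last iterate toward the minimizer (the mean of the targets), up to a bias governed by the step size.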