Shuffling gradient methods are widely used in practice, most notably in three popular variants: Random Reshuffle (RR), Shuffle Once (SO), and Incremental Gradient (IG). Despite this empirical success, their theoretical guarantees were not well understood for a long time. Only recently were convergence rates established, and only for the average iterate on convex functions and for the last iterate on strongly convex problems (with squared distance as the metric). However, when the function value gap is used as the convergence criterion, existing theory cannot explain the good performance of the last iterate in various settings (e.g., constrained optimization). To bridge this gap between practice and theory, we prove the first last-iterate convergence rates for shuffling gradient methods with respect to the objective value, even without strong convexity. Our new results either (nearly) match the existing last-iterate lower bounds or are as fast as the previous best upper bounds for the average iterate.
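The three schemes named in the abstract differ only in how the component order is chosen in each epoch. The following is a minimal sketch of that difference, assuming an unconstrained finite-sum objective with per-component gradients. The function name `shuffling_gradient`, the toy quadratic, the step size, and the epoch count are all illustrative assumptions, not the setting or parameters analyzed in the paper.

```python
import random


def shuffling_gradient(grads, x0, n_epochs, step_size, scheme="RR", seed=0):
    """Run a shuffling gradient method and return the LAST iterate.

    grads  : list of per-component gradient functions (hypothetical interface)
    scheme : "RR" reshuffles every epoch, "SO" shuffles once before the first
             epoch, "IG" always uses the fixed order 0, 1, ..., n-1
    """
    if scheme not in ("RR", "SO", "IG"):
        raise ValueError(f"unknown scheme: {scheme}")
    rng = random.Random(seed)
    order = list(range(len(grads)))
    if scheme == "SO":
        rng.shuffle(order)  # one permutation, reused in every epoch
    x = x0
    for _ in range(n_epochs):
        if scheme == "RR":
            rng.shuffle(order)  # fresh permutation in each epoch
        for i in order:
            # one pass over all components, without replacement
            x = x - step_size * grads[i](x)
    return x


# Toy problem (an assumption for illustration):
# F(x) = (1/n) * sum_i 0.5 * (x - a_i)^2, minimized at the mean of the a_i.
data = [1.0, 2.0, 3.0, 6.0]
grads = [lambda x, a=a: x - a for a in data]
x_star = sum(data) / len(data)


def objective(x):
    return sum(0.5 * (x - a) ** 2 for a in data) / len(data)


# Function value gap of the last iterate under each scheme.
gaps = {}
for scheme in ("RR", "SO", "IG"):
    x_last = shuffling_gradient(grads, x0=0.0, n_epochs=200,
                                step_size=0.01, scheme=scheme)
    gaps[scheme] = objective(x_last) - objective(x_star)
    print(scheme, x_last, gaps[scheme])
```

The quantity printed last for each scheme is the function value gap of the last iterate, which is the convergence criterion the abstract refers to. With a constant step size, as used here, the last iterate settles near the minimizer but not exactly on it.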