Chung's Lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong-convexity-type assumptions and suitable polynomially diminishing step sizes. In this work, we develop a generalized version of Chung's Lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate the broad applicability of the proposed generalized lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as Stochastic Gradient Descent (SGD) and Random Reshuffling (RR), under a general $(\theta,\mu)$-Polyak-Łojasiewicz (PL) condition and for various step size strategies, including polynomial, constant, exponential, and cosine step size rules. Notably, as a by-product of our analysis, we observe that exponential step sizes exhibit superior adaptivity to both the landscape geometry and the gradient noise; specifically, they achieve optimal convergence rates without requiring exact knowledge of the underlying landscape or separate parameter selection strategies for the noisy and noise-free regimes. Our results demonstrate that the developed variant of Chung's Lemma offers a versatile, systematic, and streamlined approach to establishing non-asymptotic convergence rates under general step size rules.
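For concreteness, the step size schedules named above admit the following common parameterizations (a reference sketch only; the paper's exact constants, exponents, and normalizations may differ). With base step size $\alpha > 0$, decay parameters $p \in (0,1]$ and $\gamma \in (0,1)$, and iteration budget $T$,
\[
\alpha_t = \alpha \ \ (\text{constant}), \qquad \alpha_t = \frac{\alpha}{(t+1)^{p}} \ \ (\text{polynomial}), \qquad \alpha_t = \alpha\,\gamma^{t} \ \ (\text{exponential}), \qquad \alpha_t = \frac{\alpha}{2}\Bigl(1 + \cos\tfrac{\pi t}{T}\Bigr) \ \ (\text{cosine}),
\]
where the exponential schedule is often run with $\gamma = (\beta/T)^{1/T}$ for some $\beta > 0$, so that the step size decays from $\alpha$ to roughly $\alpha\beta/T$ over the horizon. The $(\theta,\mu)$-PL condition generalizes the classical Polyak-Łojasiewicz inequality $\tfrac{1}{2}\|\nabla f(x)\|^{2} \ge \mu\,(f(x) - f^{\ast})$, which is recovered for a particular choice of the exponent parameter $\theta$.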