We study the local convergence rate of stochastic first-order methods under a local $\alpha$-Polyak-Łojasiewicz ($\alpha$-PL) condition in a neighborhood of a target connected component $\mathcal{M}$ of the set of local minimizers. The parameter $\alpha \in [1,2]$ is the exponent of the gradient norm in the $\alpha$-PL inequality: $\alpha=2$ recovers the classical PL case, $\alpha=1$ corresponds to Hölder-type error bounds, and intermediate values interpolate between these regimes. Our performance criterion is the number of oracle queries required to output a point $\hat{x}$ with $F(\hat{x})-l \le \varepsilon$, where $l := F(y)$ for any $y \in \mathcal{M}$. We work in a local regime in which the algorithm is initialized near $\mathcal{M}$ and, with high probability, its iterates remain in that neighborhood. We establish a lower bound of $\Omega(\varepsilon^{-2/\alpha})$ for all stochastic first-order methods in this regime, and we obtain a matching upper bound of $\mathcal{O}(\varepsilon^{-2/\alpha})$ for $1 \le \alpha < 2$ via a SARAH-type variance-reduced method with time-varying batch sizes and step sizes. In the convex setting, assuming a local $\alpha$-PL condition on the $\varepsilon$-sublevel set, we further show a complexity lower bound of $\widetilde{\Omega}(\varepsilon^{-2/\alpha})$ for reaching an $\varepsilon$-global optimum, which matches the $\varepsilon$-dependence of known accelerated stochastic subgradient methods.
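For concreteness, a condition of this kind is commonly written as follows. This form is an assumption for illustration, since the abstract does not state the inequality itself; the constant $\mu > 0$ and the neighborhood $\mathcal{U}$ of $\mathcal{M}$ are hypothetical symbols introduced here.

$$
\|\nabla F(x)\|^{\alpha} \;\ge\; 2\mu\,\bigl(F(x) - l\bigr) \qquad \text{for all } x \in \mathcal{U}.
$$

Under this reading, $\alpha = 2$ gives the classical PL inequality $\|\nabla F(x)\|^{2} \ge 2\mu\,(F(x) - l)$. At the endpoints, the rate $\varepsilon^{-2/\alpha}$ becomes $\varepsilon^{-1}$ for $\alpha = 2$ and $\varepsilon^{-2}$ for $\alpha = 1$.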