We present the first uniform-in-time high-probability bound for SGD under the PL condition, where the gradient noise contains both Markovian and martingale difference components. This significantly broadens the scope of finite-time guarantees: the PL condition arises in many machine learning and deep learning models, while Markovian noise occurs naturally in decentralized optimization and online system identification. We further allow the noise magnitude to grow with the function value, enabling the analysis of many practical sampling strategies. In addition to the high-probability guarantee, we establish a matching $1/k$ decay rate for the expected suboptimality. Our proof technique relies on the Poisson equation to handle the Markovian noise and on a probabilistic induction argument to address the lack of almost-sure bounds on the objective. Finally, we demonstrate the applicability of our framework by analyzing three practical optimization problems: token-based decentralized linear regression, supervised learning with subsampling for privacy amplification, and online system identification.