The Robbins-Siegmund theorem establishes the convergence of stochastic processes that are almost supermartingales and is foundational for analyzing a wide range of stochastic iterative algorithms in stochastic approximation and reinforcement learning (RL). However, its original form has a significant limitation: it requires the zero-order term to be summable. In many important RL applications, however, this summability condition cannot be met. This limitation motivates us to extend the Robbins-Siegmund theorem to almost supermartingales whose zero-order term is not summable but only square summable. In particular, we introduce a novel and mild assumption on the increments of the stochastic processes. Combined with the square summability condition, this assumption yields almost sure convergence to a bounded set. We further provide almost sure convergence rates, high probability concentration bounds, and $L^p$ convergence rates. We then apply the new results to stochastic approximation and RL. Notably, we obtain the first almost sure convergence rate, the first high probability concentration bound, and the first $L^p$ convergence rate for $Q$-learning with linear function approximation.
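For reference, the classical Robbins-Siegmund theorem can be stated as follows (a standard formulation; the notation here is illustrative and may differ from the body of the paper). Let $(V_t)$, $(a_t)$, $(b_t)$, and $(U_t)$ be nonnegative processes adapted to a filtration $(\mathcal{F}_t)$ satisfying

```latex
\mathbb{E}\left[V_{t+1} \mid \mathcal{F}_t\right] \le (1 + a_t)\, V_t + b_t - U_t,
\qquad
\sum_{t=0}^{\infty} a_t < \infty, \quad \sum_{t=0}^{\infty} b_t < \infty
\quad \text{a.s.}
```

Then $V_t$ converges almost surely to a finite random variable and $\sum_{t=0}^{\infty} U_t < \infty$ almost surely. The zero-order term referred to above is $b_t$; this work relaxes the requirement $\sum_t b_t < \infty$ to $\sum_t b_t^2 < \infty$.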