Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially to off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions; this property is implied both by a form of the strong law of large numbers and by a commonly used V4 Lyapunov drift condition, and it holds trivially if the Markov chain is finite and irreducible.
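To make the Markovian noise setting concrete, here is a minimal sketch of a linear stochastic approximation iteration x_{n+1} = x_n + alpha_n (A(Y_{n+1}) x_n + b(Y_{n+1})), where Y_n is a Markov chain rather than i.i.d. or martingale difference noise. Everything in it is an illustrative assumption, not the paper's construction: the two-state transition matrix `P`, the per-state matrices `A` and vectors `b`, the step size alpha_n = 1/(n+1), and the helper names `run_sa` and `mean_field_fixed_point` are all hypothetical. The chain is finite and irreducible, the case in which the abstract says the key condition holds trivially. The averaged matrix Ā (the average of `A[y]` under the stationary distribution) is Hurwitz, so the scaled ODE dx/dt = Ā x has the origin as a globally asymptotically stable equilibrium, which is the kind of assumption the original Borkar-Meyn theorem imposes. The simulation only illustrates boundedness and convergence empirically; it proves nothing.

```python
import random

# Hypothetical two-state Markov chain driving the noise (illustrative values only).
P = [[0.9, 0.1],
     [0.2, 0.8]]  # transition matrix; finite and irreducible
# Hypothetical per-state linear updates H(x, y) = A[y] x + b[y] (illustrative values only).
A = [
    [[-2.0, 0.5], [0.0, -1.0]],
    [[-0.5, 0.0], [0.3, -1.5]],
]
b = [[1.0, 0.0], [0.0, 2.0]]


def mat_vec(M, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


def next_state(y, u):
    """Sample Y_{n+1} given Y_n = y, using a uniform draw u in [0, 1)."""
    return 0 if u < P[y][0] else 1


def run_sa(num_steps, seed=0):
    """Run x_{n+1} = x_n + alpha_n * (A[Y_{n+1}] x_n + b[Y_{n+1}]) with alpha_n = 1/(n+1).

    Returns the final iterate and the largest max-norm seen along the trajectory.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    y = 0
    max_norm = 0.0
    for n in range(num_steps):
        y = next_state(y, rng.random())  # Markovian noise: depends on the previous state
        alpha = 1.0 / (n + 1)
        h = mat_vec(A[y], x)
        x = [x[i] + alpha * (h[i] + b[y][i]) for i in range(2)]
        max_norm = max(max_norm, abs(x[0]), abs(x[1]))
    return x, max_norm


def mean_field_fixed_point():
    """Solve A_bar x + b_bar = 0, where A_bar and b_bar average A and b under the stationary d."""
    # Stationary distribution of a two-state chain: d0 * P[0][1] = d1 * P[1][0].
    d0 = P[1][0] / (P[0][1] + P[1][0])
    d = [d0, 1.0 - d0]
    A_bar = [[sum(d[y] * A[y][i][j] for y in range(2)) for j in range(2)] for i in range(2)]
    b_bar = [sum(d[y] * b[y][i] for y in range(2)) for i in range(2)]
    det = A_bar[0][0] * A_bar[1][1] - A_bar[0][1] * A_bar[1][0]
    # x* = -A_bar^{-1} b_bar via the explicit 2x2 inverse.
    return [-(A_bar[1][1] * b_bar[0] - A_bar[0][1] * b_bar[1]) / det,
            -(-A_bar[1][0] * b_bar[0] + A_bar[0][0] * b_bar[1]) / det]
```

With these illustrative values, `mean_field_fixed_point()` is roughly (0.583, 0.621), and `run_sa(200_000)` should end near that point while its trajectory stays bounded; the exact final iterate depends on the seed. A pure standard-library implementation was chosen so the sketch has no dependencies.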