Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically; examples include stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially for off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied both by a form of the strong law of large numbers and by a commonly used V4 Lyapunov drift condition, and which holds trivially if the Markov chain is finite and irreducible.
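To make the setting concrete, the following is a minimal illustrative sketch of a stochastic approximation iteration driven by Markovian noise. It is not the algorithm or the analysis of the paper: the two-state transition matrix `P`, the `targets` table, the step size `1 / (n + 1)`, and the function name `run_sa` are all assumptions chosen for illustration. The chain is finite and irreducible, and the update moves the iterate toward a target that depends on the current chain state.

```python
import random


def run_sa(num_steps=20000, seed=0):
    """Toy stochastic approximation with Markovian noise (illustrative only)."""
    rng = random.Random(seed)

    # Hypothetical finite, irreducible two-state Markov chain.
    # Its stationary distribution is (2/3, 1/3).
    P = [[0.9, 0.1],
         [0.2, 0.8]]

    # Hypothetical state-dependent targets; the iterate tracks their
    # stationary average, which is (0, 1/3) for the chain above.
    targets = [(1.0, -1.0), (-2.0, 3.0)]

    x = [0.0, 0.0]   # the vector iterate
    y = 0            # current state of the Markov chain
    max_norm = 0.0   # largest Euclidean norm seen along the trajectory

    for n in range(num_steps):
        # Markovian noise: the next state depends on the current one.
        y = 0 if rng.random() < P[y][0] else 1

        # Diminishing step size (an assumption of this sketch).
        alpha = 1.0 / (n + 1)

        # Incremental update: x <- x + alpha * (target(y) - x).
        x = [xi + alpha * (ti - xi) for xi, ti in zip(x, targets[y])]

        max_norm = max(max_norm, (x[0] ** 2 + x[1] ** 2) ** 0.5)

    return x, max_norm


if __name__ == "__main__":
    final_x, max_norm = run_sa()
    print("final iterate:", final_x)
    print("largest norm along the trajectory:", max_norm)
```

In this toy case boundedness is immediate, because with this step size every iterate is an average of the targets seen so far. The point of a stability theorem is to give such a guarantee for updates where no comparably simple argument is available.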