Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially for off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied by both a form of the strong law of large numbers and a commonly used V4 Lyapunov drift condition, and trivially holds if the Markov chain is finite and irreducible.
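For concreteness, a minimal sketch of the two noise settings, using illustrative symbols not fixed by this abstract ($x_n$ for the iterates, $a_n$ for step sizes, $h$ for the mean field, $M_{n+1}$ for the noise term, and $Y_{n+1}$ for the state of an underlying Markov chain):
\[
x_{n+1} = x_n + a_n \bigl( h(x_n) + M_{n+1} \bigr) \quad \text{(martingale difference noise, the classical Borkar-Meyn setting)},
\]
\[
x_{n+1} = x_n + a_n \, H(x_n, Y_{n+1}), \quad \{Y_n\} \text{ a Markov chain} \quad \text{(Markovian noise, the setting considered here)}.
\]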