This paper establishes the first almost sure convergence rate and the first maximal concentration bound with exponential tails for general contractive stochastic approximation algorithms with Markovian noise. As a corollary, we also obtain convergence rates in $L^p$. Key to our success is a novel discretization of the mean ODE of stochastic approximation algorithms using intervals with diminishing (instead of constant) length. As applications, we provide the first almost sure convergence rate for $Q$-learning with Markovian samples without count-based learning rates. We also provide the first concentration bound for off-policy temporal difference learning with Markovian samples.
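As a concrete instance of the setting described above, the following minimal sketch runs tabular $Q$-learning along a single Markovian trajectory of a small made-up MDP, using a diminishing polynomial step size rather than a count-based one. All numbers (the transition kernel `P`, rewards `R`, the step-size exponent) are illustrative assumptions, not the paper's construction or rates; the $Q$-learning update itself is the standard asynchronous rule.

```python
import numpy as np

# Illustrative sketch only: asynchronous Q-learning driven by one Markovian
# trajectory of a toy 2-state, 2-action MDP, with diminishing step sizes.
rng = np.random.default_rng(0)

n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a] = next-state distribution; R[s, a] = reward (made-up numbers).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(200_000):
    a = rng.integers(n_actions)            # uniform behavior policy
    s_next = rng.choice(n_states, p=P[s, a])
    alpha = 1.0 / (t + 1) ** 0.6           # diminishing, not count-based
    td_target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
    s = s_next                              # Markovian: states are correlated

# Reference fixed point Q* of the Bellman optimality operator (a gamma-
# contraction in the sup norm), computed here by plain value iteration.
Q_star = np.zeros_like(Q)
for _ in range(1000):
    Q_star = R + gamma * np.einsum('sap,p->sa', P, Q_star.max(axis=1))

err = np.abs(Q - Q_star).max()
```

Because the Bellman optimality operator is a $\gamma$-contraction, the iterates track its unique fixed point $Q^*$ even though the samples form a single correlated trajectory; the choice of the step-size exponent here is arbitrary within the usual Robbins-Monro-type conditions.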