We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data. Viewing the joint process of the data and LSA iterate as a time-homogeneous Markov chain, we prove its convergence to a unique limiting and stationary distribution in Wasserstein distance and establish non-asymptotic, geometric convergence rates. Furthermore, we show that the bias vector of this limit admits an infinite series expansion with respect to the stepsize. Consequently, the bias is proportional to the stepsize up to higher order terms. This result stands in contrast with LSA under i.i.d. data, for which the bias vanishes. In the reversible chain setting, we provide a general characterization of the relationship between the bias and the mixing time of the Markovian data, establishing that they are roughly proportional to each other. While Polyak-Ruppert tail-averaging reduces the variance of the LSA iterates, it does not affect the bias. The above characterization allows us to show that the bias can be reduced using Richardson-Romberg extrapolation with $m\ge 2$ stepsizes, which eliminates the $m-1$ leading terms in the bias expansion. This extrapolation scheme leads to an exponentially smaller bias and an improved mean squared error, both in theory and empirically. Our results immediately apply to the Temporal Difference learning algorithm with linear function approximation, Markovian data, and constant stepsizes.
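To make the extrapolation step concrete, the following is a minimal sketch of how combining $m$ stepsizes can cancel the first $m-1$ terms of a power-series bias. It is an illustration under stated assumptions, not the paper's implementation. It assumes the bias at stepsize $\alpha$ behaves like $\sum_{i\ge 1} \alpha^i B^{(i)}$, and it uses Lagrange-interpolation weights $h_k = \prod_{l \ne k} \alpha_l / (\alpha_l - \alpha_k)$, one standard choice satisfying $\sum_k h_k = 1$ and $\sum_k h_k \alpha_k^j = 0$ for $j = 1, \dots, m-1$. The names `rr_weights` and `synthetic_bias`, the scalar coefficients, and the stepsizes are all hypothetical; a scalar polynomial stands in for the bias vector.

```python
from fractions import Fraction


def rr_weights(stepsizes):
    """Richardson-Romberg weights h_k for distinct stepsizes a_1..a_m.

    They satisfy sum_k h_k = 1 and sum_k h_k * a_k**j = 0 for j = 1..m-1,
    i.e. they evaluate the degree-(m-1) interpolating polynomial at 0.
    """
    weights = []
    for k, a_k in enumerate(stepsizes):
        h = Fraction(1)
        for l, a_l in enumerate(stepsizes):
            if l != k:
                h *= Fraction(a_l) / (Fraction(a_l) - Fraction(a_k))
        weights.append(h)
    return weights


def synthetic_bias(alpha, coeffs):
    """Toy scalar bias b(alpha) = sum_i coeffs[i-1] * alpha**i (assumption)."""
    return sum(c * alpha ** (i + 1) for i, c in enumerate(coeffs))


# Hypothetical setup: m = 3 stepsizes and made-up coefficients B1, B2, B3.
stepsizes = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10)]
coeffs = [Fraction(3), Fraction(-2), Fraction(5)]

weights = rr_weights(stepsizes)
biases = [synthetic_bias(a, coeffs) for a in stepsizes]

# Weighted combination of the per-stepsize biases: the alpha and alpha**2
# terms cancel exactly, leaving only the cubic term.
extrapolated = sum(h * b for h, b in zip(weights, biases))

print("weights:", weights)
print("individual biases:", biases)
print("extrapolated bias:", extrapolated)
```

For $m = 2$ with stepsizes $\alpha$ and $2\alpha$, the same formula gives weights $2$ and $-1$, i.e. the familiar combination $2\theta^{(\alpha)} - \theta^{(2\alpha)}$. In an actual LSA run the weights would be applied to the (tail-averaged) iterates obtained at each stepsize rather than to a known bias function.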