Theory and application of stochastic approximation (SA) have grown within the control systems community since the earliest days of adaptive control. This paper takes a new look at the topic, motivated by recent results establishing remarkable performance of SA with (sufficiently small) constant step-size $\alpha>0$. If averaging is implemented to obtain the final parameter estimate, then the estimates are asymptotically unbiased with nearly optimal asymptotic covariance. These results have been obtained for random linear SA recursions with i.i.d. coefficients. This paper obtains very different conclusions in the more common case of a geometrically ergodic Markovian disturbance: (i) The $\textit{target bias}$ is identified, even in the case of non-linear SA, and is in general non-zero. The remaining results are established for linear SA recursions: (ii) the bivariate parameter-disturbance process is geometrically ergodic in a topological sense; (iii) the representation for bias has a simpler form in this case, and cannot be expected to be zero if there is multiplicative noise; (iv) the asymptotic covariance of the averaged parameters is within $O(\alpha)$ of optimal. The error term is identified, and may be massive if mean dynamics are not well conditioned. The theory is illustrated with application to TD-learning.
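The contrast between i.i.d. and Markovian disturbances can be seen in a minimal sketch. The toy model is an assumption made for illustration and is not taken from the paper: a scalar linear recursion $\theta_{n+1} = \theta_n + \alpha[-(1 + a\Phi_{n+1})\theta_n + b\Phi_{n+1}]$, where $\Phi$ is a two-state Markov chain on $\{-1,+1\}$ that keeps its state with probability `p_stay`. The mean dynamics have root $\theta^* = 0$. The names `averaged_sa` and `toy_bias`, and the parameters `a`, `b`, `p_stay`, are hypothetical; the closed form in `toy_bias` follows from stationarity of the first moments of this toy model only.

```python
import random


def averaged_sa(alpha, p_stay, a=0.5, b=1.0,
                n_steps=200_000, burn_in=1_000, seed=0):
    """Constant step-size scalar linear SA; returns the averaged estimate.

    Toy model (an assumption, not the paper's setting):
        theta <- theta + alpha * (-(1 + a*phi) * theta + b*phi)
    with phi a two-state Markov chain on {-1, +1}.
    p_stay = 0.5 makes phi i.i.d.; p_stay > 0.5 makes it positively correlated.
    """
    rng = random.Random(seed)
    phi, theta, total = 1.0, 0.0, 0.0
    for n in range(n_steps + burn_in):
        # Flip the disturbance with probability 1 - p_stay.
        if rng.random() >= p_stay:
            phi = -phi
        # a*phi is the multiplicative noise, b*phi the additive noise.
        theta += alpha * (-(1.0 + a * phi) * theta + b * phi)
        if n >= burn_in:
            total += theta
    return total / n_steps


def toy_bias(alpha, p_stay, a=0.5, b=1.0):
    """Steady-state mean of theta for the toy model above (theta* = 0).

    Derived for this toy model only. It is zero when the disturbance is
    i.i.d. (lam = 0) or when there is no multiplicative noise (a = 0).
    """
    lam = 2.0 * p_stay - 1.0  # E[phi_{n+1} | phi_n] = lam * phi_n
    return -a * lam * alpha * b / (1.0 - lam + alpha * lam * (1.0 - a * a))


if __name__ == "__main__":
    for alpha in (0.1, 0.05):
        iid = averaged_sa(alpha, p_stay=0.5)
        markov = averaged_sa(alpha, p_stay=0.9)
        print(f"alpha={alpha}: iid avg={iid:+.4f}, markov avg={markov:+.4f}, "
              f"toy prediction={toy_bias(alpha, 0.9):+.4f}")
```

In this sketch the averaged estimate under the i.i.d. disturbance settles near $\theta^* = 0$, while under the Markovian disturbance it settles at an offset that shrinks roughly in proportion to $\alpha$, consistent with items (i) and (iii) for this particular model.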