Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize. We focus on two important classes of dynamics: 1) nonsmooth contractive SA with additive noise, and 2) synchronous and asynchronous Q-learning, which feature both additive and multiplicative noise. For both dynamics, we establish weak convergence of the iterates to a stationary limit distribution in Wasserstein distance. Furthermore, we propose a prelimit coupling technique for establishing steady-state convergence, and we characterize the limit of the stationary distribution as the stepsize goes to zero. Using this result, we show that the asymptotic bias of nonsmooth SA is proportional to the square root of the stepsize, in sharp contrast to smooth SA, whose bias is proportional to the stepsize itself. This bias characterization allows for the use of Richardson-Romberg extrapolation for bias reduction in nonsmooth SA.
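The bias characterization suggests how Richardson-Romberg extrapolation would be applied in the nonsmooth setting. Below is a minimal illustrative sketch, not taken from the paper: it uses a toy 1-D nonsmooth contractive map f(x) = 0.5|x| (nonsmooth at its fixed point x* = 0) with additive Gaussian noise, and the stepsize pair (α, α/4) chosen so that the √α bias terms cancel under the combination 2·m(α/4) − m(α). The specific dynamics, stepsizes, and run lengths are assumptions made for the demo.

```python
import numpy as np

def nonsmooth_sa(alpha, n_iters, burn_in, rng):
    """Run the 1-D nonsmooth contractive SA recursion
        x_{k+1} = (1 - alpha) * x_k + alpha * (f(x_k) + w_k),
    with f(x) = 0.5*|x| (a 0.5-contraction, fixed point x* = 0,
    nonsmooth at 0) and i.i.d. N(0, 1) additive noise w_k.
    Returns the time-averaged iterate after burn_in steps,
    an estimate of the mean of the stationary distribution."""
    x = 0.0
    total = 0.0
    for k in range(n_iters):
        w = rng.standard_normal()
        x = (1 - alpha) * x + alpha * (0.5 * abs(x) + w)
        if k >= burn_in:
            total += x
    return total / (n_iters - burn_in)

rng = np.random.default_rng(0)
alpha = 0.04

# Steady-state bias scales like c * sqrt(alpha) in the nonsmooth case,
# so halving the bias requires quartering the stepsize.
m_big = nonsmooth_sa(alpha, 400_000, 50_000, rng)        # bias ~ c*sqrt(alpha)
m_small = nonsmooth_sa(alpha / 4, 400_000, 50_000, rng)  # bias ~ c*sqrt(alpha)/2

# Richardson-Romberg combination: 2*m(alpha/4) - m(alpha) cancels the
# leading sqrt(alpha) bias term (for smooth SA, with bias ~ c*alpha,
# the analogous pair would instead be (alpha, alpha/2)).
rr = 2 * m_small - m_big
print(m_big, m_small, rr)
```

Note the coefficient choice: because the bias is proportional to √α rather than α, the two runs must use stepsizes α and α/4 for the standard weights (2, −1) to cancel the leading term.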