Motivated by Q-learning, we study nonsmooth contractive stochastic approximation (SA) with constant stepsize. We focus on two important classes of dynamics: 1) nonsmooth contractive SA with additive noise, and 2) synchronous and asynchronous Q-learning, which features both additive and multiplicative noise. For both dynamics, we establish weak convergence of the iterates to a stationary limit distribution in Wasserstein distance. Furthermore, we propose a prelimit coupling technique for establishing steady-state convergence, and we characterize the limit of the stationary distribution as the stepsize goes to zero. Using this result, we show that the asymptotic bias of nonsmooth SA is proportional to the square root of the stepsize, in sharp contrast to smooth SA, whose asymptotic bias is proportional to the stepsize itself. This bias characterization allows for the use of Richardson-Romberg extrapolation for bias reduction in nonsmooth SA.
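As a minimal illustration of the bias scaling and the extrapolation step (not taken from the paper; all names and the specific example are my own): consider the one-dimensional nonsmooth contraction F(x) = |x|/2, whose fixed point is x* = 0 and whose kink sits exactly at the fixed point, driven by additive Gaussian noise. If the steady-state bias obeys bias(α) ≈ c√α, then running with stepsizes α and α/4 gives biases c√α and c√α/2, so the Richardson-Romberg combination 2·m(α/4) − m(α) cancels the leading √α term (note the α/4 pairing, rather than the α/2 pairing used when the bias is linear in α). A simulation sketch under these assumptions:

```python
import numpy as np


def sa_stationary_mean(alpha, n_chains=20000, n_steps=3000, burn=1000, seed=0):
    """Estimate the steady-state mean of the constant-stepsize SA iteration
        x_{k+1} = x_k + alpha * (F(x_k) - x_k + w_k),
    with the nonsmooth contraction F(x) = |x|/2 (fixed point x* = 0) and
    i.i.d. N(0, 1) additive noise, by averaging many parallel chains
    past a burn-in period. Example construction, not from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    tail_means = []
    for k in range(n_steps):
        w = rng.standard_normal(n_chains)
        # Nonsmooth drift: the kink of F(x) = |x|/2 is at the fixed point,
        # which is what produces the sqrt(alpha) steady-state bias.
        x = x + alpha * (0.5 * np.abs(x) - x + w)
        if k >= burn:
            tail_means.append(x.mean())
    return float(np.mean(tail_means))


alpha = 0.25
m_a = sa_stationary_mean(alpha)             # bias ~ c * sqrt(alpha)
m_a4 = sa_stationary_mean(alpha / 4, seed=1)  # bias ~ c * sqrt(alpha) / 2
# Richardson-Romberg combination under the sqrt-stepsize bias law:
rr = 2.0 * m_a4 - m_a
```

In this example both estimates are strictly positive (the drift pushes mass rightward near the kink), the bias roughly halves when the stepsize is divided by four, and the extrapolated estimate `rr` lands much closer to x* = 0 than either raw estimate.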