In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. This guarantee, however, breaks down under linear function approximation. To overcome this limitation, a significant line of research has introduced regularization techniques that restore stable convergence under function approximation. In this work, we propose a new algorithm, periodic regularized Q-learning (PRQ). We first introduce regularization at the level of the projection operator and explicitly construct a regularized projected value iteration (RP-VI); with an appropriately chosen regularization, the resulting projected value iteration is a contraction. We then extend this regularized projection to the stochastic, sample-based setting to obtain PRQ, and we provide a rigorous theoretical analysis establishing finite-time convergence guarantees for PRQ under linear function approximation.
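To make the idea concrete, below is a minimal NumPy sketch of one plausible form of regularized projected value iteration under linear function approximation. The unweighted least-squares projection, the ridge regularizer eta, and all names (rp_vi, phi, and so on) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Hypothetical sketch of regularized projected value iteration (RP-VI)
# under linear function approximation Q_theta(s, a) = phi(s, a)^T theta.
# The ridge regularizer eta and the unweighted projection are assumptions;
# the paper's regularized projection operator may be defined differently.

def rp_vi(P, R, phi, gamma, eta, num_iters=200):
    """P: (S, A, S) transition kernel, R: (S, A) rewards,
    phi: (S, A, d) features, gamma: discount factor, eta > 0: regularization."""
    num_states, num_actions, d = phi.shape
    Phi = phi.reshape(num_states * num_actions, d)
    theta = np.zeros(d)
    for _ in range(num_iters):
        # Greedy Bellman backup of the current linear Q estimate.
        Q = (Phi @ theta).reshape(num_states, num_actions)
        V = Q.max(axis=1)
        target = (R + gamma * (P @ V)).reshape(-1)
        # Regularized projection onto the feature span:
        # theta <- argmin_t ||Phi t - target||^2 + eta * ||t||^2.
        theta = np.linalg.solve(Phi.T @ Phi + eta * np.eye(d), Phi.T @ target)
    return theta
```

In this form, a sufficiently large eta shrinks the projected update, which is one way the regularized operator could be made contractive; a sample-based variant such as PRQ would replace the exact Bellman backup with stochastic samples.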