In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that uses quasimetric models to learn optimal value functions. Unlike prior approaches, QRL optimizes an objective designed specifically for quasimetrics, with strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance across both state-based and image-based observations.
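The quasimetric structure of optimal values can be illustrated with a toy case, independent of QRL's actual objective. In a deterministic goal-reaching task with reward -1 per step and no discounting, the optimal value V*(s, g) is the negated shortest-path distance from s to g. That distance satisfies d(x, x) = 0 and the triangle inequality but need not be symmetric. The sketch below assumes a hypothetical four-state one-way ring, and the helper names are our own. It computes the distances with Floyd-Warshall and checks the quasimetric axioms. It does not implement QRL's training objective.

```python
import itertools
import math


def shortest_path_distances(n, edges):
    """All-pairs shortest-path costs (Floyd-Warshall) on a directed graph.

    edges maps (u, v) -> nonnegative one-step cost.
    """
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (u, v), cost in edges.items():
        d[u][v] = min(d[u][v], cost)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_quasimetric(d):
    """Check d(x, x) = 0 and the triangle inequality; symmetry is NOT required."""
    n = len(d)
    zero_diagonal = all(d[i][i] == 0 for i in range(n))
    triangle = all(
        d[i][j] <= d[i][k] + d[k][j]
        for i, j, k in itertools.product(range(n), repeat=3)
    )
    return zero_diagonal and triangle


# Hypothetical toy environment: a one-way ring 0 -> 1 -> 2 -> 3 -> 0,
# where every transition costs 1 (reward -1 per step until the goal).
ring_edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 0): 1.0}
dist = shortest_path_distances(4, ring_edges)

# Undiscounted optimal goal-conditioned value: V*(s, g) = -dist[s][g].
optimal_value = [[-c for c in row] for row in dist]

print(is_quasimetric(dist))    # True
print(dist[0][1], dist[1][0])  # 1.0 3.0 (asymmetric)
```

Reaching state 0 from state 1 takes three steps, while reaching 1 from 0 takes one. A symmetric metric therefore cannot represent this distance exactly, but a quasimetric can.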