In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that uses quasimetric models to learn optimal value functions. Unlike prior approaches, QRL uses an objective specifically designed for quasimetrics, which provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance across both state-based and image-based observations.
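The claim that the optimal goal-reaching value function has quasimetric structure can be checked numerically on a toy problem. The sketch below is not QRL itself. It assumes a hypothetical deterministic MDP with reward -1 per step and no discounting, so that d(s, g) = -V*(s, g) is the shortest-path length from s to g. It then verifies the quasimetric properties: d(s, s) = 0, non-negativity, the triangle inequality, and (unlike a metric) possible asymmetry. The graph and the helpers `optimal_cost` and `is_quasimetric` are illustrative assumptions.

```python
import math

# Hypothetical toy goal-reaching MDP (an illustrative assumption, not from the paper):
# states 0..4 on a line. The agent can step right (s -> s + 1), but the only way
# "back" is to slide to state 0, so reaching a goal is easier in one direction.
N = 5
EDGES = {s: set() for s in range(N)}
for s in range(N - 1):
    EDGES[s].add(s + 1)   # step right
for s in range(1, N):
    EDGES[s].add(0)       # slide back to the start


def optimal_cost(n, edges):
    """Return d[s][g] = -V*(s, g) for reward -1 per step, no discounting.

    Under these assumptions the optimal cost-to-go equals the shortest-path
    length, computed here with Floyd-Warshall.
    """
    d = [[math.inf] * n for _ in range(n)]
    for s in range(n):
        d[s][s] = 0
        for t in edges[s]:
            d[s][t] = min(d[s][t], 1)
    for k in range(n):          # intermediate state
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_quasimetric(d):
    """Check d(x, x) = 0, d >= 0, and the triangle inequality (symmetry NOT required)."""
    n = len(d)
    for x in range(n):
        if d[x][x] != 0:
            return False
        for y in range(n):
            if d[x][y] < 0:
                return False
            for z in range(n):
                if d[x][z] > d[x][y] + d[y][z]:
                    return False
    return True


D = optimal_cost(N, EDGES)
print(D[0][4], D[4][0])   # asymmetric: 4 steps forward, 1 step back
print(is_quasimetric(D))
```

Dropping the symmetry requirement is what separates a quasimetric from a metric: a symmetric distance model could not represent `D[0][4] != D[4][0]`, while the triangle inequality still holds because any detour through an intermediate state can only cost as much or more.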