In goal-reaching reinforcement learning (RL), the optimal value function has a particular geometry, called quasimetric structure. This paper introduces Quasimetric Reinforcement Learning (QRL), a new RL method that uses quasimetric models to learn optimal value functions. Unlike prior approaches, the QRL objective is designed specifically for quasimetrics and provides strong theoretical recovery guarantees. Empirically, we conduct thorough analyses on a discretized MountainCar environment, identifying properties of QRL and its advantages over alternatives. On offline and online goal-reaching benchmarks, QRL also demonstrates improved sample efficiency and performance across both state-based and image-based observations.
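To make the term "quasimetric structure" concrete, the sketch below checks the defining properties on a toy problem. It is not the QRL method or its objective; it only illustrates the geometry the abstract refers to. It assumes a deterministic setting where each transition has a positive cost and the optimal goal-reaching value is the negated shortest-path cost, V*(s, g) = -d(s, g). The three-state graph, its costs, and the helper names `shortest_path_costs` and `is_quasimetric` are hypothetical and chosen for illustration.

```python
import itertools

INF = float("inf")


def shortest_path_costs(n, edges):
    """All-pairs shortest-path costs on a directed graph (Floyd-Warshall)."""
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), cost in edges.items():
        d[u][v] = min(d[u][v], cost)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_quasimetric(d):
    """Check d(s, s) = 0, d >= 0, and the triangle inequality.

    Symmetry is deliberately NOT required: that is what separates a
    quasimetric from a metric.
    """
    n = len(d)
    identity = all(d[i][i] == 0 for i in range(n))
    nonneg = all(d[i][j] >= 0 for i in range(n) for j in range(n))
    triangle = all(
        d[i][k] <= d[i][j] + d[j][k]
        for i, j, k in itertools.product(range(n), repeat=3)
    )
    return identity and nonneg and triangle


# Hypothetical 3-state environment (an assumption for illustration):
# moving "downhill" 0 -> 1 -> 2 costs 1 per step, climbing back 2 -> 0 costs 5.
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 5.0}
d = shortest_path_costs(3, edges)

# Under the stated assumption, the optimal value is the negated cost-to-goal.
optimal_value = [[-d[s][g] for g in range(3)] for s in range(3)]

print(is_quasimetric(d))   # the triangle inequality holds for all triples
print(d[0][2], d[2][0])    # costs differ by direction, so d is asymmetric
```

Reaching state 2 from state 0 is cheaper than the reverse trip, so the cost-to-goal is asymmetric while still obeying the triangle inequality. This is the kind of structure a quasimetric model can represent and a symmetric metric cannot.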