Goal-Conditioned Reinforcement Learning (GCRL) mitigates the difficulty of reward design by framing tasks as goal reaching rather than as maximizing hand-crafted reward signals. In this setting, the optimal goal-conditioned value function naturally forms a quasimetric, motivating Quasimetric RL (QRL), which constrains value learning to quasimetric mappings and enforces local consistency through discrete, trajectory-based constraints. We propose Eikonal-Constrained Quasimetric RL (Eik-QRL), a continuous-time reformulation of QRL based on the Eikonal Partial Differential Equation (PDE). This PDE-based structure makes Eik-QRL trajectory-free, requiring only sampled states and goals, while improving out-of-distribution generalization. We provide theoretical guarantees for Eik-QRL and identify limitations that arise under complex dynamics. To address these challenges, we introduce Eik-Hierarchical QRL (Eik-HiQRL), which integrates Eik-QRL into a hierarchical decomposition. Empirically, Eik-HiQRL achieves state-of-the-art performance in offline goal-conditioned navigation and yields consistent gains over QRL in manipulation tasks, matching the performance of temporal-difference methods.
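To make the Eikonal constraint concrete, the sketch below shows one plausible form of a trajectory-free residual loss: it penalizes deviation of the gradient norm of a goal-conditioned distance estimate d(s, g) from 1, which is the Eikonal equation under unit local cost. This is a hedged illustration, not the paper's actual loss; `eikonal_residual` and the finite-difference gradient are hypothetical helpers, and we assume a flat Euclidean state space for simplicity.

```python
import numpy as np

def eikonal_residual(d_fn, states, goals, eps=1e-4):
    """Mean squared Eikonal residual (||grad_s d(s, g)|| - 1)^2,
    estimated with central finite differences.  Note: no trajectories
    are needed, only sampled (state, goal) pairs."""
    residuals = []
    for s, g in zip(states, goals):
        grad = np.zeros_like(s)
        for i in range(len(s)):
            e = np.zeros_like(s)
            e[i] = eps
            # central difference along coordinate i
            grad[i] = (d_fn(s + e, g) - d_fn(s - e, g)) / (2 * eps)
        residuals.append((np.linalg.norm(grad) - 1.0) ** 2)
    return float(np.mean(residuals))

# Sanity check: the Euclidean distance satisfies the Eikonal equation
# with unit speed exactly, so its residual should be near zero.
d = lambda s, g: np.linalg.norm(s - g)
states = [np.array([1.0, 2.0]), np.array([0.5, -1.0])]
goals = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
print(eikonal_residual(d, states, goals))  # near 0
```

In a learning setting, d_fn would be a quasimetric-parameterized network and this residual would be minimized over sampled state-goal pairs alongside the usual QRL objective; the finite-difference gradient here stands in for automatic differentiation.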