Offline Goal-Conditioned Reinforcement Learning seeks to train agents to reach specified goals from previously collected trajectories. Scaling it to long-horizon tasks remains challenging, notably due to compounding value-estimation errors. Principled geometric reasoning offers a potential solution to these issues. Following this insight, we introduce Projective Quasimetric Planning (ProQ), a compositional framework that learns an asymmetric distance and then repurposes it, first as a repulsive energy forcing a sparse set of keypoints to spread uniformly over the learned latent space, and second as a structured directional cost guiding the agent toward proximal sub-goals. In particular, ProQ couples this geometry with a Lagrangian out-of-distribution detector to ensure the learned keypoints stay within reachable areas. By unifying metric learning, keypoint coverage, and goal-conditioned control, our approach produces meaningful sub-goals and robustly drives long-horizon goal-reaching on diverse navigation benchmarks.
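The keypoint-coverage idea can be illustrated with a toy sketch: a set of points is spread over a bounded latent space by descending a repulsive energy built from an asymmetric distance. Everything here (`quasi_dist`, `repulsive_energy`, `spread_keypoints`, the box constraint) is a hypothetical stand-in for exposition, not ProQ's learned quasimetric, its energy, or its Lagrangian out-of-distribution detector.

```python
import numpy as np

def quasi_dist(x, y, alpha=0.5):
    """Toy quasimetric: Euclidean norm plus an asymmetric directional penalty.
    Illustrative stand-in for a learned asymmetric distance."""
    return np.linalg.norm(y - x) + alpha * max(0.0, float(np.sum(y - x)))

def repulsive_energy(K):
    """Sum of inverse pairwise quasimetric distances: lower means better spread."""
    n = len(K)
    return sum(1.0 / (quasi_dist(K[i], K[j]) + 1e-6)
               for i in range(n) for j in range(n) if i != j)

def spread_keypoints(K, steps=300, lr=0.01, eps=1e-4):
    """Normalized finite-difference gradient descent on the repulsive energy."""
    K = K.copy()
    for _ in range(steps):
        base = repulsive_energy(K)
        grad = np.zeros_like(K)
        for i in range(K.shape[0]):
            for d in range(K.shape[1]):
                K[i, d] += eps
                grad[i, d] = (repulsive_energy(K) - base) / eps
                K[i, d] -= eps
        grad /= np.linalg.norm(grad) + 1e-8  # normalize so steps stay stable
        K -= lr * grad
        # Crude stand-in for the reachability (OOD) constraint: stay in the unit box.
        K = np.clip(K, 0.0, 1.0)
    return K

rng = np.random.default_rng(0)
K0 = rng.uniform(0.45, 0.55, size=(6, 2))  # keypoints start tightly clustered
K1 = spread_keypoints(K0)
print(repulsive_energy(K1), "<", repulsive_energy(K0))
```

As the points repel each other they drift apart inside the box, and the energy of the final configuration is lower than that of the clustered initialization; ProQ plays the same game, but with a learned quasimetric and a learned reachability constraint in latent space.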