Offline goal-conditioned reinforcement learning (GCRL) learns goal-conditioned policies from static, pre-collected datasets. However, accurate value estimation remains a challenge due to limited coverage of the state-action space. Recent physics-informed approaches seek to address this by imposing physical and geometric constraints on the value function through regularizers defined over first-order partial differential equations (PDEs), such as the Eikonal equation. Yet these formulations are often ill-posed in complex, high-dimensional environments. In this work, we propose a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-Bellman (HJB) equation. By providing a physics-based inductive bias, our approach grounds the learning process in optimal control theory, explicitly regularizing and bounding updates during value iteration. Furthermore, we leverage the Feynman-Kac theorem to recast the PDE solution as an expectation, enabling tractable Monte Carlo estimation of the objective while avoiding the numerical instability of higher-order gradients. Experiments demonstrate that our method improves geometric consistency, making it broadly applicable to navigation as well as high-dimensional, complex manipulation tasks. Open-source code is available at https://github.com/HrishikeshVish/phys-fk-value-GCRL.
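To make the Feynman-Kac step concrete, the sketch below shows one way such a Monte Carlo regularizer could look in PyTorch: the value network's prediction at a state is pulled toward an empirical average over short Euler-Maruyama rollouts of an assumed drift-diffusion model, so no second-order derivatives of the network are ever formed. Everything here (the function name `feynman_kac_residual` and the `value_net`, `drift`, `sigma`, `running_cost`, and `lam` arguments) is a hypothetical illustration under the linear-PDE form of the Feynman-Kac theorem, not the paper's actual objective.

```python
# A minimal, illustrative sketch (not the paper's implementation) of how the
# Feynman-Kac theorem turns a PDE solution into an expectation that can be
# estimated by Monte Carlo rollouts, avoiding explicit higher-order gradients.
# All names below (value_net, drift, sigma, running_cost, lam) are assumptions.
import math
import torch

def feynman_kac_residual(value_net, s, drift, sigma, running_cost,
                         lam=1.0, horizon=16, dt=0.05, n_samples=32):
    """Penalize deviation of V(s) from its Feynman-Kac expectation.

    For a linear parabolic PDE, Feynman-Kac gives
        V(s) = E[ exp(-lam*T) V(S_T) + int_0^T exp(-lam*t) c(S_t) dt ],
    where S_t solves the SDE dS = drift(S) dt + sigma dW. Estimating the
    right-hand side by sampling avoids differentiating the network twice.
    """
    # Replicate each state n_samples times for the Monte Carlo rollouts.
    S = s.repeat_interleave(n_samples, dim=0)            # (B * n_samples, d)
    cost = torch.zeros(S.shape[0], device=s.device)
    discount = 1.0
    for _ in range(horizon):
        # Euler-Maruyama step of the assumed diffusion process.
        S = S + drift(S) * dt + sigma * math.sqrt(dt) * torch.randn_like(S)
        discount *= math.exp(-lam * dt)
        cost = cost + discount * running_cost(S) * dt
    # Discounted terminal value plus accumulated running cost, per rollout.
    target = discount * value_net(S).squeeze(-1) + cost
    target = target.view(s.shape[0], n_samples).mean(dim=1).detach()
    v = value_net(s).squeeze(-1)
    return ((v - target) ** 2).mean()   # squared Feynman-Kac residual
```

The `.detach()` on the rollout target mirrors the usual fixed-target trick in value learning: the regularizer only shapes `value_net` through its prediction at the query state, keeping the Monte Carlo estimate a fixed supervision signal within each update.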