The values of two-player general-sum differential games are viscosity solutions to Hamilton-Jacobi-Isaacs (HJI) equations. Value and policy approximations for such games suffer from the curse of dimensionality (CoD). Alleviating CoD with physics-informed neural networks (PINNs) encounters convergence issues when state constraints induce differentiable values with large Lipschitz constants. On top of these challenges, it is often necessary to learn generalizable values and policies across a parametric space of games, e.g., for game parameter inference when information is incomplete. To address these challenges, we propose in this paper a Pontryagin-mode neural operator that outperforms the current state-of-the-art hybrid PINN model on safety performance across games with parametric state constraints. Our key contribution is the introduction of a costate loss defined on the discrepancy between forward and backward costate rollouts, which are computationally cheap. We show that the costate dynamics, which can reflect state constraint violations, effectively enable the learning of differentiable values with large Lipschitz constants, without requiring the manually supervised data used by the hybrid PINN model. More importantly, we show that the close relationship between costates and policies makes the former critical for learning feedback control policies with generalizable safety performance.
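To make the costate-loss idea concrete, the following is a minimal sketch on a hypothetical toy problem, not the paper's actual operator or games: a scalar state with dynamics x_dot = u, running cost x^2 + u^2, and Hamiltonian H = lam*u + x^2 + u^2, so Pontryagin gives u* = -lam/2 and costate dynamics lam_dot = -2x. The loss penalizes the discrepancy between a model's predicted costates along a forward rollout and the backward-integrated Pontryagin costates. All function names and the zero terminal costate are illustrative assumptions.

```python
import numpy as np

def costate_loss(x0, lam_pred_fn, dt=0.01, steps=100):
    """Mean squared discrepancy between forward-rollout costates
    predicted by a model (lam_pred_fn) and backward-integrated
    Pontryagin costates, for the toy problem x_dot = u,
    lam_dot = -2x, u* = -lam/2 (all assumptions of this sketch)."""
    # Forward rollout of the state using the model's predicted costate.
    xs = [x0]
    lams_pred = []
    x = x0
    for _ in range(steps):
        lam = lam_pred_fn(x)
        lams_pred.append(lam)
        u = -lam / 2.0            # u* = argmin_u H
        x = x + dt * u            # Euler step of x_dot = u
        xs.append(x)
    # Backward integration of lam_dot = -2x with terminal lam(T) = 0
    # (zero terminal cost assumed for this sketch).
    lam_back = 0.0
    lams_back = [0.0] * steps
    for k in reversed(range(steps)):
        lam_back = lam_back + dt * 2.0 * xs[k + 1]  # reverse Euler
        lams_back[k] = lam_back
    # Costate loss: forward/backward rollouts are cheap Euler sweeps.
    diffs = np.array(lams_pred) - np.array(lams_back)
    return float(np.mean(diffs ** 2))

# Example: a crude costate guess lam(x) = 2x yields a nonzero loss
# that a learner would drive down during training.
loss = costate_loss(1.0, lambda x: 2.0 * x)
```

In the paper's setting the predicted costate would come from differentiating the learned value with respect to the state; here a hand-written guess stands in for that network.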