A Data-Driven Structural Decomposition of Dynamic Games via Best-Response Maps

Dynamic games are powerful tools for modeling multi-agent decision-making, yet computing Nash (or generalized Nash) equilibria remains a central challenge in such settings. The difficulty arises from tightly coupled optimality conditions, nested optimization structures, and poor numerical conditioning. Existing game-theoretic solvers address these challenges by solving the joint game directly, which typically requires explicit models of all agents' objective functions and constraints, while learning-based approaches often decouple the interaction through prediction or policy approximation, sacrificing equilibrium consistency. This paper introduces a conceptually novel formulation of dynamic games that restructures the equilibrium computation. Rather than solving a fully coupled game or decoupling agents through prediction or policy approximation, it proposes a data-driven structural reduction of the game that removes nested optimization layers and derivative coupling by embedding an offline-compiled best-response map as a feasibility constraint. Under standard regularity conditions, when the best-response operator is exact, any converged solution of the reduced problem corresponds to a local open-loop Nash (or generalized Nash) equilibrium of the original game; with a learned surrogate, the solution is approximately equilibrium-consistent up to the best-response approximation error. The formulation is supported by mathematical proofs and by a large-scale Monte Carlo study of a two-player open-loop dynamic game motivated by autonomous racing. The method is compared against state-of-the-art joint game solvers, with results reported on solution quality, computational cost, and constraint satisfaction.
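To make the idea of a best-response map used as a feasibility constraint concrete, the following is a minimal sketch on a hypothetical scalar two-player quadratic game. The cost functions, the closed-form map `br2`, the perturbed surrogate `br2_learned`, and the fixed-point loop in `solve_reduced` are all illustrative assumptions and are not taken from the paper. Player 1 minimizes its own cost with player 2's action held fixed, while player 2's action is tied to `u2 = br(u1)`; no derivative is taken through the map.

```python
# Hypothetical toy game (an assumption for illustration, not the paper's setup):
#   J1(u1, u2) = (u1 - 1)**2 + u1 * u2     (player 1 cost)
#   J2(u1, u2) = (u2 - 2)**2 - u1 * u2     (player 2 cost)
# The Nash equilibrium of this toy game is (u1, u2) = (0, 2).

def br2(u1):
    """Exact best response of player 2: argmin over u2 of J2(u1, u2)."""
    return 2.0 + 0.5 * u1

def grad_j1(u1, u2):
    """Partial derivative of J1 with respect to u1 only (u2 held fixed)."""
    return 2.0 * (u1 - 1.0) + u2

def solve_reduced(br, u1=5.0, tol=1e-10, max_iter=200):
    """Fixed-point solve of the reduced problem.

    Each pass enforces u2 = br(u1) as a constraint, then updates u1 to the
    minimizer of J1(., u2) with u2 fixed. The map br is never differentiated.
    """
    for _ in range(max_iter):
        u2 = br(u1)                 # feasibility constraint from the map
        u1_new = 1.0 - 0.5 * u2     # solves grad_j1(u1_new, u2) = 0
        if abs(u1_new - u1) < tol:
            return u1_new, br(u1_new)
        u1 = u1_new
    raise RuntimeError("fixed-point iteration did not converge")

def br2_learned(u1, eps=0.01):
    """Stand-in for a learned surrogate: the exact map plus a constant error."""
    return br2(u1) + eps

if __name__ == "__main__":
    u1, u2 = solve_reduced(br2)
    print("exact map:     u1 = %.6f, u2 = %.6f" % (u1, u2))
    v1, v2 = solve_reduced(br2_learned)
    print("surrogate map: u1 = %.6f, u2 = %.6f" % (v1, v2))
```

In this toy setting the exact map recovers the Nash point, and the constant surrogate error `eps` shifts the solution by an amount proportional to `eps`, which mirrors the abstract's statement that consistency holds up to the best-response approximation error. The paper's actual reduced problem is a constrained trajectory optimization and may be solved quite differently.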
