Existing traffic simulation frameworks for autonomous vehicles typically rely either on imitation learning or on game-theoretic approaches that solve for Nash or coarse correlated equilibria; the latter implicitly assume perfectly rational agents. Human drivers, however, exhibit bounded rationality: they make approximately optimal decisions under cognitive and perceptual constraints. We propose EvoQRE, a principled framework that models safety-critical traffic interactions as general-sum Markov games and solves them using Quantal Response Equilibrium (QRE) and evolutionary game dynamics. EvoQRE integrates a pre-trained generative world model with entropy-regularized replicator dynamics, capturing stochastic human behavior while maintaining equilibrium structure. On the theoretical side, we prove that, under weak monotonicity assumptions, the proposed dynamics converge to a Logit-QRE via two-timescale stochastic approximation, with an explicit convergence rate of O(log k / k^{1/3}). We further extend QRE to continuous action spaces using mixture-based and energy-based policy representations. Experiments on the Waymo Open Motion Dataset and the nuPlan benchmark demonstrate that EvoQRE achieves state-of-the-art realism, improved safety metrics, and controllable generation of diverse safety-critical scenarios through interpretable rationality parameters.
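To make the Logit-QRE concept and the role of a rationality parameter concrete, the following is a minimal sketch on a static two-player, two-action game. It is not the EvoQRE algorithm: it swaps the entropy-regularized replicator dynamics and Markov-game setting for a simple damped fixed-point iteration on a bimatrix game. The payoff values, the function name `logit_qre`, and the parameters `lam`, `step`, and `iters` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def logit_qre(A, B, lam=1.0, step=0.1, iters=5000):
    """Damped fixed-point iteration toward a logit-QRE of a bimatrix game.

    A[i, j] is the row player's payoff and B[i, j] the column player's
    payoff when row plays i and column plays j. `lam` is a rationality
    (inverse temperature) parameter: lam = 0 gives uniform play, larger
    values approach best responses. Convergence is not guaranteed for
    arbitrary games or large `lam`; this is an illustrative sketch only.
    """
    x = np.full(A.shape[0], 1.0 / A.shape[0])  # row player's mixed strategy
    y = np.full(A.shape[1], 1.0 / A.shape[1])  # column player's mixed strategy
    for _ in range(iters):
        # Logit (softmax) response to the opponent's current strategy.
        bx = softmax(lam * (A @ y))
        by = softmax(lam * (B.T @ x))
        # Damped update toward the quantal response.
        x = (1.0 - step) * x + step * bx
        y = (1.0 - step) * y + step * by
    return x, y


# Hypothetical merge conflict: action 0 = "go", action 1 = "yield".
# Both going is a collision (-10); payoffs are made up for illustration.
A = np.array([[-10.0, 2.0],
              [0.0, 1.0]])
B = A.T  # symmetric game: the column player faces the same trade-off

x_random, y_random = logit_qre(A, B, lam=0.0)
x_bounded, y_bounded = logit_qre(A, B, lam=1.0)
print("lam=0.0 row strategy:", x_random)
print("lam=1.0 row strategy:", x_bounded)
```

At `lam = 0` both drivers act uniformly at random; at `lam = 1` each driver shifts probability away from "go" because of the collision penalty, yet still plays it with nonzero probability. This is the kind of stochastic, boundedly rational behavior a rationality parameter is meant to control.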