Existing traffic simulation frameworks for autonomous vehicles typically rely on imitation learning or game-theoretic approaches that solve for Nash or coarse correlated equilibria, implicitly assuming perfectly rational agents. However, human drivers exhibit bounded rationality, making approximately optimal decisions under cognitive and perceptual constraints. We propose EvoQRE, a principled framework for modeling safety-critical traffic interactions as general-sum Markov games solved via Quantal Response Equilibrium (QRE) and evolutionary game dynamics. EvoQRE integrates a pre-trained generative world model with entropy-regularized replicator dynamics, capturing stochastic human behavior while maintaining equilibrium structure. We prove that, under weak monotonicity assumptions, the proposed dynamics converge to Logit-QRE via two-timescale stochastic approximation, with an explicit convergence rate of O(log k / k^{1/3}). We further extend QRE to continuous action spaces using mixture-based and energy-based policy representations. Experiments on the Waymo Open Motion Dataset and nuPlan benchmark demonstrate that EvoQRE achieves state-of-the-art realism, improved safety metrics, and controllable generation of diverse safety-critical scenarios through interpretable rationality parameters.
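To make the link between entropy-regularized replicator dynamics and Logit-QRE concrete, the following is a minimal sketch on a two-player matrix game. It is an illustration under stated assumptions, not the EvoQRE algorithm: there is no Markov game, no world model, and no two-timescale scheme. The function names (`regularized_replicator`, `qre_residual`), the payoff matrix, the temperature `tau`, and the step size `eta` are all hypothetical choices. The update is a standard discretization, pi_new(a) ∝ pi(a)^(1 - eta*tau) * exp(eta * u(a)), whose fixed points satisfy pi(a) ∝ exp(u(a) / tau), which is the logit response with rationality parameter 1/tau.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def regularized_replicator(A, B, tau=1.0, eta=0.1, steps=2000):
    """Discretized entropy-regularized replicator dynamics (illustrative).

    A[i, j] is the row player's payoff and B[i, j] the column player's
    payoff when row plays i and column plays j. Both players update
    simultaneously, starting from uniform strategies.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    x = np.full(A.shape[0], 1.0 / A.shape[0])  # row strategy
    y = np.full(A.shape[1], 1.0 / A.shape[1])  # column strategy
    for _ in range(steps):
        u_row = A @ y      # expected payoff of each row action
        u_col = B.T @ x    # expected payoff of each column action
        # pi_new ∝ pi^(1 - eta*tau) * exp(eta * u), computed in log space
        x_new = softmax((1.0 - eta * tau) * np.log(x) + eta * u_row)
        y_new = softmax((1.0 - eta * tau) * np.log(y) + eta * u_col)
        x, y = x_new, y_new
    return x, y


def qre_residual(A, B, x, y, tau=1.0):
    """Largest deviation of (x, y) from the logit-response fixed point."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    rx = np.max(np.abs(x - softmax(A @ y / tau)))
    ry = np.max(np.abs(y - softmax(B.T @ x / tau)))
    return max(rx, ry)


if __name__ == "__main__":
    # Hypothetical zero-sum payoffs, chosen only for illustration.
    A = np.array([[2.0, -1.0], [-1.0, 1.0]])
    x, y = regularized_replicator(A, -A, tau=1.0, eta=0.1)
    print("row strategy:", x)
    print("column strategy:", y)
    print("QRE residual:", qre_residual(A, -A, x, y, tau=1.0))
```

In this sketch `tau` acts as the inverse of the rationality parameter: larger values pull both strategies toward uniform play, while smaller values approach best responses. The step size and iteration count were picked for this small zero-sum example and are not meant to carry over to general-sum or continuous-action settings.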