As intelligent robots such as autonomous vehicles are increasingly deployed around people, it remains an open question to what extent these systems should rely on model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning. Existing dynamic game formulations assume that all agents are task-driven and behave optimally. In reality, however, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisily rational paradigm. In this work, we investigate a principled methodology for blending a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, a type of non-cooperative dynamic game with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multimodal reference policy. Our method incorporates, for each decision maker, a tunable parameter that modulates between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multimodal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies incorporate guidance from the reference policy and account for noisily rational human behavior more effectively than non-regularized baselines.
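To make the role of the per-agent tunable parameter concrete, the following is a minimal single-agent, single-step sketch, not the KLGame algorithm itself. It assumes a finite action set, a hypothetical per-action task cost, and a hypothetical weight `lam` on the KL term. Minimizing expected cost plus `lam` times KL(π‖π_ref) has the well-known closed form π(u) ∝ π_ref(u)·exp(−cost(u)/lam), so a small `lam` approaches task-driven (cost-minimizing) behavior and a large `lam` approaches the data-driven reference policy.

```python
import math


def kl_regularized_policy(costs, ref_probs, lam):
    """Illustrative sketch (hypothetical helper, not from the paper).

    Minimizes E_pi[cost] + lam * KL(pi || pi_ref) over a finite action set.
    Closed form: pi(u) is proportional to pi_ref(u) * exp(-cost(u) / lam).
    Assumes ref_probs is a valid distribution with at least one positive entry.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Log-weights; actions with zero reference probability get weight zero.
    logits = [
        math.log(p) - c / lam if p > 0 else -math.inf
        for c, p in zip(costs, ref_probs)
    ]
    # Subtract the max log-weight for numerical stability before exponentiating.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    costs = [0.0, 1.0, 2.0]      # hypothetical task costs per action
    ref = [0.1, 0.3, 0.6]        # hypothetical reference (data-driven) policy
    for lam in (1e-3, 1.0, 1e6):
        print(lam, kl_regularized_policy(costs, ref, lam))
```

The multi-agent, multi-stage feedback Nash computation described in the abstract is not shown here; the sketch only illustrates how a KL weight trades off task-driven and reference-driven behavior for one decision maker, and why actions the reference policy never takes stay at zero probability.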