As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic games with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors versus non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/.
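The KL-regularized game objective described above can be sketched as follows. The notation here (per-agent stage cost \(c^i_t\), blending weight \(\lambda^i\), and reference policy \(\pi^i_{\mathrm{ref}}\)) is assumed for illustration and may differ from the symbols used in the paper:

```latex
% Illustrative per-agent objective for a KL-regularized dynamic game:
% agent i minimizes its task cost plus a KL penalty pulling its policy
% \pi^i toward a data-driven reference \pi^i_{ref} (assumed notation).
J^i(\pi^i, \pi^{-i}) =
  \mathbb{E}\!\left[
    \sum_{t=0}^{T}
      c^i_t\bigl(x_t, u^1_t, \dots, u^N_t\bigr)
      + \lambda^i \, D_{\mathrm{KL}}\!\bigl(
          \pi^i_t(\cdot \mid x_t) \,\big\|\,
          \pi^i_{\mathrm{ref},t}(\cdot \mid x_t)
        \bigr)
  \right]
% \lambda^i \to 0: purely task-driven behavior (standard dynamic game);
% large \lambda^i: the policy is drawn toward the data-driven reference.
```

The tunable parameter \(\lambda^i\) plays the role of the per-agent knob mentioned in the abstract, interpolating between optimizing the task cost alone and imitating the reference policy.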